SCAN (Smart Content Aggregation and Navigation) is a personal Information Retrieval framework.
SCAN aims to solve the major problems of content organization and findability in the era of information overload.
SCAN aggregates content from different sources into a single document collection. This repository can keep records on thousands of documents, independently of their original locations and formats. Every document record contains a number of metadata properties (such as title, description, author and creation date), which can be either set automatically or edited manually.
Adding documents to the repository is an automated operation. A user only needs to point SCAN to a location, and the application will find and add every document there. Added document locations are monitored for changes (new, modified or deleted documents) to keep the repository up to date.
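The change monitoring described above can be pictured as a comparison of two snapshots of a location. The following is only a minimal sketch under stated assumptions, not SCAN's actual implementation: the class LocationSnapshotDiff and its methods are hypothetical names, and a snapshot is assumed to be a map from file path to last-modified time.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not part of SCAN): classifies documents as new,
// modified or deleted by comparing two snapshots of a monitored location.
class LocationSnapshotDiff {
    final Set<String> added = new TreeSet<>();
    final Set<String> modified = new TreeSet<>();
    final Set<String> deleted = new TreeSet<>();

    // Walks a directory recursively and records each regular file's
    // last-modified time in milliseconds.
    static Map<String, Long> snapshot(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Map<String, Long> result = new HashMap<>();
        for (Path file : files) {
            result.put(file.toString(), Files.getLastModifiedTime(file).toMillis());
        }
        return result;
    }

    // Compares an earlier snapshot with a later one.
    static LocationSnapshotDiff compare(Map<String, Long> previous, Map<String, Long> current) {
        LocationSnapshotDiff diff = new LocationSnapshotDiff();
        for (Map.Entry<String, Long> entry : current.entrySet()) {
            Long oldTime = previous.get(entry.getKey());
            if (oldTime == null) {
                diff.added.add(entry.getKey());        // not seen before
            } else if (!oldTime.equals(entry.getValue())) {
                diff.modified.add(entry.getKey());     // timestamp changed
            }
        }
        for (String path : previous.keySet()) {
            if (!current.containsKey(path)) {
                diff.deleted.add(path);                // no longer present
            }
        }
        return diff;
    }
}
```

In such a scheme, a monitor would call snapshot() periodically, pass the previous and current results to compare(), and then index, re-index or remove the documents listed in the three sets.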
Document content is indexed for search and text analysis. You can search the documents either with simple text queries or with special forms that build complex queries over document text and properties. Queries can be saved for repeated use.
The document collection is structured with a system of tags, similar to services such as del.icio.us or Flickr. Tags are keywords or labels attached to items to identify them for quick navigation and retrieval. Together, the tags form a taxonomy representing the semantics of the document collection. The taxonomy can be viewed as a "tags cloud" for navigating through the document repository.
SCAN's text analysis mechanism simplifies tagging. It analyzes a document's content and suggests the most relevant words as candidate tags, so manual tagging becomes as simple as selecting from the proposed candidates. It can also take over the tagging process entirely, either by automatically assigning tags to documents or by finding the documents relevant to a specific tag. Another application of text analysis is searching for documents similar to a specific one (search by pattern).
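To illustrate the idea of suggesting tags from document text, here is a minimal sketch that ranks words by plain term frequency after stopword filtering. This is an assumed, simplified technique: SCAN's real analysis (including its stemming rules) is not described here, and the class TagSuggester and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not SCAN's algorithm): proposes candidate tags by
// counting how often each non-stopword occurs in the text. No stemming.
class TagSuggester {
    static List<String> suggest(String text, Set<String> stopwords, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter or a digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            // Skip empty or very short tokens and stopwords.
            if (token.length() < 3 || stopwords.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<String> candidates = new ArrayList<>(counts.keySet());
        // Most frequent first; ties are broken alphabetically.
        candidates.sort((a, b) -> {
            int byCount = counts.get(b) - counts.get(a);
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(candidates.subList(0, Math.min(limit, candidates.size())));
    }
}
```

For the text "Lucene indexes documents. Lucene searches documents and the index." with the stopwords "and" and "the" and a limit of 2, this sketch returns "documents" and "lucene", each of which occurs twice.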
SCAN is component-based software that uses plugins for specific features. The basic SCAN platform can be easily extended with plugins for new document formats, document locations (RSS feeds, web sites, e-mail, etc.) and language analyzers. Whole new areas of functionality can be added with user interface extensions. One example of such an extension is the plugin for browsing the repository with a calendar (grouping the documents by their creation dates).
SCAN is a Java application, so it works on any Java-enabled platform. SCAN is free, open source software distributed under the Apache License, Version 2.0.
Here are some key features of "SCAN":
Supported document formats:
· Arbitrary XML
· Plain text
· OpenDocument Text *
· PDF *
· MS Word 97/2000/XP *
Supported document locations:
· Local directories (recursive)
· Syndication feeds (RSS/Atom) *
· Del.icio.us bookmarks *
· Plugins for other location types (e.g. web sites and mail boxes) are planned.
* - pluggable feature
Supported document languages:
Indexing and search:
· Automatic identification of a document format by file name pattern
· Monitoring the indexed locations to track new, modified and deleted documents and keep the collection up to date
· Analyzing content according to language-specific rules for stem extraction and stopword filtering
· Caching parsed documents
· Guessing basic metadata properties (title, description) from document content
· Built-in search capabilities based on the Apache Lucene search engine
· Basic full-text quick search (with the option of using special query syntax for experienced users)
· Advanced search on both document text and properties
· Ability to save advanced search queries for repeated use
· Search for documents similar to a specified one ("pattern search")
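The "pattern search" item above can be illustrated with cosine similarity over word counts, a common way to compare documents. This is a sketch under assumptions: it does not use Lucene, it is not necessarily how SCAN ranks similar documents, and the class PatternSearchSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch (not SCAN's implementation): scores how similar two
// texts are using cosine similarity between their word-count vectors.
class PatternSearchSketch {
    // Builds a word -> occurrence count map for the text.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Euclidean length of a count vector.
    static double norm(Map<String, Integer> counts) {
        double sum = 0.0;
        for (int value : counts.values()) {
            sum += (double) value * value;
        }
        return Math.sqrt(sum);
    }

    // Returns a value between 0.0 (no shared words) and 1.0 (same word distribution).
    static double similarity(String first, String second) {
        Map<String, Integer> a = termCounts(first);
        Map<String, Integer> b = termCounts(second);
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double lengths = norm(a) * norm(b);
        return lengths == 0.0 ? 0.0 : dot / lengths;
    }
}
```

A similar-document search built this way would score every document in the collection against the chosen one and list the highest-scoring documents first.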
Tagging and text analysis:
· Manual tag assignment and editing
· Text analysis functions for extracting and suggesting relevant tags
· Automated document tagging based on the text analysis mechanism
· Optional transparent auto-tagging of new or modified documents (at the indexing stage)
· Automated assignment of a tag to the relevant documents ("tag auto-population")
· Adjustable analysis/auto-tagging parameters
· Navigating the document collection with the "tags cloud"
· Finding groups of related tags and highlighting them dynamically
· Removing tags
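One plausible way to find related tags, as mentioned in the list above, is to count how often two tags are attached to the same document. The sketch below assumes this co-occurrence approach; the source does not say how SCAN actually groups tags, and the class RelatedTags and the minShared threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not SCAN's implementation): treats two tags as
// related when they are attached to the same documents often enough.
class RelatedTags {
    // For the given tag, counts how many documents it shares with each other tag.
    // Each element of tagSets is the set of tags attached to one document.
    static Map<String, Integer> coOccurrences(String tag, Collection<Set<String>> tagSets) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> tags : tagSets) {
            if (!tags.contains(tag)) {
                continue;
            }
            for (String other : tags) {
                if (!other.equals(tag)) {
                    counts.merge(other, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Returns, in alphabetical order, the tags sharing at least minShared documents.
    static List<String> related(String tag, Collection<Set<String>> tagSets, int minShared) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : coOccurrences(tag, tagSets).entrySet()) {
            if (entry.getValue() >= minShared) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

A tag cloud could use the result of related() to highlight the neighbours of the tag currently under the mouse pointer.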
User interface:
· Viewing the document list in two modes: List view (brief) and Table view (detailed)
· Customizable set of properties to display in Table view
· Sorting the document list by a selected document property
· Filtering the document list by a specified filter query
· In-place editing of document properties (in Table view)
· Opening documents with an external application defined for each document type
· Visual query editor for Advanced search
· Calendar plugin for browsing documents by creation date
· TagClusters plugin for visualizing tags as a graphical cluster map
Requirements:
· Sun Java 2 Runtime Environment (JRE) version 5.0 or higher
Installation and running:
Unpack the distribution in a directory of your choice.
Change to the SCAN installation directory and run the startup script ('scan.sh' or 'scan.bat', depending on your platform).
Note for *NIX users: You may have to set proper file permissions in order to run "scan.sh". Cd to the SCAN directory and execute:
chmod 755 ./scan.sh
Alternatively, you can run SCAN from a console:
cd < SCAN_INSTALL_DIR >
java -jar scan-launcher.jar
Double-clicking the "scan-launcher.jar" icon in your file manager should also run the application. If it doesn't work, check your system file associations.
Some SCAN features are available as separate plugins. To install a plugin, unpack it as a new subdirectory of the 'plugins/' directory.
What's New in This Release:
· This release fixes several problems: incorrect document dates breaking the application, validation of date values in Document properties, indexing failing with a "String index out of range" error, and function keys starting table editing.
· The application window now opens immediately on start.
· The exclusion filter regular expression was partially fixed.