DBIx::FullTextSearch is a Perl module for indexing documents with MySQL as storage.
DBIx::FullTextSearch uses a MySQL database backend to index files, web documents and database fields. Queries can mark words and phrases as required (must include), optional (can include) or excluded (cannot include). Boolean (AND/OR) queries, stop words and stemming are also supported.
use DBI;
use DBIx::FullTextSearch;
use DBIx::FullTextSearch::StopList;
# connect to database (regular DBI)
my $dbh = DBI->connect('dbi:mysql:database', 'user', 'passwd');
# create a new stoplist
my $sl = DBIx::FullTextSearch::StopList->create_default($dbh, 'sl_en', 'English');
# create a new index with default english stoplist and english stemmer
my $fts = DBIx::FullTextSearch->create($dbh, 'fts_web_1',
    frontend => 'string', backend => 'blob',
    stoplist => 'sl_en', stemmer => 'en-us');
# or open existing one
# my $fts = DBIx::FullTextSearch->open($dbh, 'fts_web_1');
# index documents
$fts->index_document('krtek', 'krtek leze pod zemi');
$fts->index_document('jezek', 'Jezek ma ostre bodliny.');
# search for matches
my @docs = $fts->contains('foo');
# '+' marks a word that must be present, '-' a word that must not be
@docs = $fts->econtains('+foo', '-Bar');
@docs = $fts->search('+foo -Bar');
# boolean query
@docs = $fts->search('foo AND (bar OR baz)');
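The must include, can include and cannot include rules can be illustrated without a database. The following is only a minimal sketch in plain Perl: the matches() helper is hypothetical and not part of DBIx::FullTextSearch, and the case folding and the handling of optional words are assumptions made for the example, not a description of how the module evaluates queries.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of DBIx::FullTextSearch: decide whether a
# document's text satisfies a list of terms with optional '+' / '-' prefixes.
sub matches {
    my ($text, @terms) = @_;

    # Set of lower-cased words that occur in the text.
    my %has = map { $_ => 1 } grep { length } split /\W+/, lc $text;

    my (@must, @can, @cannot);
    for my $term (@terms) {
        if    ($term =~ /^\+(.+)/) { push @must,   lc $1 }
        elsif ($term =~ /^-(.+)/)  { push @cannot, lc $1 }
        else                       { push @can,    lc $term }
    }

    return 0 if grep { !$has{$_} } @must;    # a required word is missing
    return 0 if grep { $has{$_} } @cannot;   # an excluded word is present

    # Assumption: when no word is required, at least one optional word
    # has to be present for the document to match.
    if (!@must && @can) {
        my $found = grep { $has{$_} } @can;
        return 0 unless $found;
    }
    return 1;
}

print matches('foo and baz', '+foo', '-Bar') ? "match\n" : "no match\n";
print matches('foo bar',     '+foo', '-Bar') ? "match\n" : "no match\n";
```

The first call reports a match, because "foo" is present and "bar" is absent. The second does not, because the excluded word appears in the text.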
DBIx::FullTextSearch is a flexible solution for indexing the contents of documents. It uses a MySQL database to store information about words and documents, and provides a Perl interface for indexing new documents, making changes and searching for matches. For DBIx::FullTextSearch, a document can be nearly anything: a Perl scalar, a file, a Web document or a database field.
The basic style of the interface is shown above. You need a MySQL database and DBI with the DBD::mysql driver. You then create a DBIx::FullTextSearch index, which is a set of tables that maintain all the necessary information. Once created, the index can be opened as many times as needed, either to update it (by adding documents) or to search it.
DBIx::FullTextSearch uses one basic table to store the parameters of the index. A second table stores the actual information about documents and words. Depending on the type of the index (specified when the index is created), there may be further tables that store additional information, such as the conversion from external string names (e.g. URLs) to the internal numeric form. For a user, these internal details and the internal behaviour of the index are not important. The important part is the API: the methods used to index documents and to ask questions about words in documents. However, some understanding of how it all works may be useful when you are deciding whether this module is for you and which type of index will best suit your needs.
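To make the table layout easier to picture, here is a toy in-memory model in plain Perl. It is a sketch only and does not reflect the module's actual schema: the names %doc_id, @doc_name, %postings, add_doc() and lookup() are all hypothetical, and the word splitting is deliberately naive. One structure maps external string names to internal numeric ids, and another records which documents contain each word.

```perl
use strict;
use warnings;

# Toy model, NOT the real DBIx::FullTextSearch tables.
my %doc_id;      # external string name -> internal numeric id
my @doc_name;    # internal numeric id  -> external string name
my %postings;    # word -> { document id => number of occurrences }

# Register a document under a string name and record its words.
sub add_doc {
    my ($name, $text) = @_;
    my $id = $doc_id{$name};
    if (!defined $id) {
        $id = scalar @doc_name;      # next free numeric id
        $doc_id{$name} = $id;
        push @doc_name, $name;
    }
    # Naive tokenizer: lower-case, split on non-word characters.
    for my $word (grep { length } split /\W+/, lc $text) {
        $postings{$word}{$id}++;
    }
    return $id;
}

# Return the external names of all documents that contain the word.
sub lookup {
    my ($word) = @_;
    my $hits = $postings{lc $word} || {};
    return map { $doc_name[$_] } sort { $a <=> $b } keys %$hits;
}

add_doc('krtek', 'krtek leze pod zemi');
add_doc('jezek', 'Jezek ma ostre bodliny.');

print join(', ', lookup('jezek')), "\n";
```

Searching works on the numeric ids and translates them back to names only at the end, which is roughly the role the description gives to the extra name-conversion table.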