Recoll Changelog

New in Recoll 1.21.2 (Oct 9, 2015)

  • Added GUI dialog to perform partial indexing.
  • Fixed advanced search in "Any Clause" mode: the directory filter did not filter the results but added an ORed clause instead.
  • Fixed bogus syntax errors about parentheses around phrases.
  • Fixed a few boundary conditions detected by VC++.
  • Misc other small fixes, see commit log.

New in Recoll 1.20.4 (Apr 2, 2015)

  • 1.20.4 has a fix to skip compressed file system images like xxx.img.gz by default. This should have been in 1.20.3.

New in Recoll 1.20.1 (Dec 24, 2014)

  • An Open With entry was added to the result list and result table popup menus. This lets you choose an alternative application to open a document. The list of applications is built from the information inside the /usr/share/applications desktop files.
  • A new way of specifying multiple terms to be searched inside a given field: it used to be that an entry lacking whitespace but splittable, like [term1,term2], was transformed into a phrase search, which made sense in some cases, but not so many. The code was changed so that [term1,term2] now means [term1 AND term2], and [term1/term2] means [term1 OR term2]. This is useful for field searches where you would previously be forced to repeat the field name for every term. [somefield:term1 somefield:term2] can now be expressed as [somefield:term1,term2].
  • (1.20.1) The Query Fragments tool was added to the GUI. This is a window with customizable buttons to add arbitrary query language fragments to the current search. The buttons and fragments are defined in an xml file inside the recoll configuration directory ~/.recoll/fragbuts.xml. This makes it easy to define "pre-cooked" filters for things that you need repeatedly. See the manual for more details.
  • We changed the way terms are generated from a compound string (e.g. an email address). Previously, for an address like jfd@recoll.org, only the simple terms and the terms anchored at the start were generated (jfd, recoll, org, jfd@recoll, jfd@recoll.org). The new text splitter generates all the other possible terms (here, recoll.org only), so that it is now possible to search for left-truncated versions of the compound, e.g., all emails from a given domain.
  • (1.20.1) New keyboard accelerators for the result table: Ctrl+r switches the focus from the search entry to the table, Ctrl+o opens the document for the current line, Ctrl+Shift+o opens document and closes recoll, Ctrl+d previews the document.
  • (1.20.1) A special term is now indexed for results from the web history: use "-rclbes:BGL" to exclude the web results, "rclbes:BGL" to restrict the results to the web ones. This is difficult to remember, but the Query Fragments feature means that you don't need to (this is in the sample Query Fragments file).
  • Recoll now indexes #hashtags as such.
  • It is now possible to configure the GUI in wide form factor by dragging the toolbars to one of the sides (their location is remembered between sessions), and moving the category filters to a menu (can be set in the "Preferences->GUI configuration" panel).
  • We added the indexedmimetypes and excludedmimetypes variables to the configuration GUI, which was also compacted a bit. A bunch of uninteresting variables were also removed.
  • When indexing, we no longer add the top container file name as a term for the contained sub-documents (if any). This made no sense in most cases, as it meant that you would get hits on all the sections from a chm or epub when the top file name matched the search, when you probably wanted only the parent document in this case.
  • However, the container file name was sometimes useful for filtering results, and it is still accessible, in a different way: the top container file name is added as a term to all the sub-documents, only for searching with a prefix. The field name is containerfilename, and no match on the sub-documents will occur if the field is not specified (this differs from the previous filename processing, where the name was indexed as a general term). containerfilename is also set on files without sub-documents (e.g. a pdf).
  • A new attribute, pfxonly, was created to support the above change. This can be set on any metadata field inside the [prefixes] section of the fields file. The affected field terms will be indexed only with a prefix, so they will cause a hit only for a field search (the general behaviour is that field terms are indexed both prefixed and not, so they can also cause a hit when searched as general terms).
  • A new [queryaliases] section was created in the fields file, for defining field name aliases to be used only at query time (to avoid unwanted collection of data on random fields during indexing). The section is empty by default, but 2 obvious aliases are commented: filename=fn and containerfilename=cfn. Setting them in your personal file may save you some typing if you search on file names.
  • You can now use both -e and -i for erasing then updating the index for the given file arguments with the same recollindex command.
  • We now allow access to the Xapian docid for Recoll documents in recollq and Python API search results. This allows writing scripts which combine Recoll and pure Xapian operations. A sample Python program to find document duplicates, using MD5 terms was added. See src/python/samples/docdups.py
  • The command used to identify the mime types of files when the internal method fails is file -i by default. It is now possible to customize this command by setting systemfilecommand in the configuration. A suggested value would be xdg-mime, which sometimes works better than file.
  • The result list has two new elements: a %P substitution for printing the parent folder name, and an F link target which will open the parent folder in a file manager window, for example as a link labelled "Open parent directory".
  • /media was added to the default skippedPaths list mostly as a reminder that blindly processing these with the general indexer is a bad idea (use separate indexes instead).
  • recollq and recoll -t get a new option -N to print field names between values when -F is used. In addition, -F "" is taken as a directive to print all fields.
  • Unicode hyphen (0x2010) is now translated to ASCII minus during indexing and searching. There is no good way to handle this character, given the various misuses of minus and hyphen. This choice was deemed "less bad" than the previous one.
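
The change to compound-term splitting described in the list above can be illustrated with a small Python sketch. This is not Recoll's text splitter: the function names, the separator set (@ and .) and the address alice@example.org are all assumptions made for the illustration.

```python
import re

# Assumption: only '@' and '.' act as separators inside a compound.
SEPARATORS = r"[@.]"


def split_compound(text):
    """Return the simple terms and the separators found between them."""
    return re.split(SEPARATORS, text), re.findall(SEPARATORS, text)


def span(terms, seps, start, end):
    """Rebuild the text covering terms[start:end], separators included."""
    out = terms[start]
    for i in range(start + 1, end):
        out += seps[i - 1] + terms[i]
    return out


def old_terms(text):
    """Simple terms plus the multi-term spans anchored at the start."""
    terms, seps = split_compound(text)
    result = set(terms)
    for end in range(2, len(terms) + 1):
        result.add(span(terms, seps, 0, end))
    return result


def new_terms(text):
    """Every contiguous span of simple terms, whatever its start."""
    terms, seps = split_compound(text)
    return {
        span(terms, seps, start, end)
        for start in range(len(terms))
        for end in range(start + 1, len(terms) + 1)
    }
```

With these definitions, new_terms("alice@example.org") - old_terms("alice@example.org") leaves only example.org, the left-truncated term that makes a search by domain possible.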

New in Recoll 1.19.14 (Jun 10, 2014)

  • 1.19.14 fixes relatively minor but annoying issues in indexing, plus a few other glitches:
      • The use of a separate readonly Database object for querying the index while indexing would trigger Xapian errors (bad block reads) and subsequent up-to-date check failures (leading to unnecessary reindexing). The jury is out as to the cause, but using the same object for reading and writing seems to eliminate the problem.
      • An unnecessary log message in the child process between forking and executing the filter could block on a mutex and lead to a 20 minute timeout for the affected parent process thread (this happened only in multithread mode).
      • A possible overflow of the filter stack was also fixed. This could only really happen in pathological situations (hand-crafted recursive zip file...).

New in Recoll 1.19.13 (May 7, 2014)

  • This hopefully fixes the last remaining bug in the multithreading code, which was causing quite rare, but annoying crashes. You definitely want to upgrade to this version if you are running recoll 1.19.

New in Recoll 1.19.11 (Nov 30, 2013)

  • Case/diacritics sensitivity is still off by default for this release. It can be turned on only by editing recoll.conf (see the manual). If you do so, you must then reset the index.

New in Recoll 1.19.9 (Nov 12, 2013)

  • This release fixes a number of significant bugs (query date condition handling, possible GUI crashes...).

New in Recoll 1.19.2 (May 14, 2013)

  • This release fixes a bug in path translations for additional indexes.

New in Recoll 1.18.1 (Nov 5, 2012)

  • This version brings optional case- and diacritics-sensitive searches, complex search history, and direct access to hit pages for PDF documents.

New in Recoll 1.17.3 (May 25, 2012)

  • Release 1.17.3 mostly fixes an indexing crash that sometimes occurred while processing email.

New in Recoll 1.17.2 (May 18, 2012)

  • Fixes a few bugs and adds a small feature for handling characters that should not be unaccented in your language (e.g. å in Swedish). See unac_except_transx in the manual configuration section. Also adds a new rcldia filter for Dia files.

New in Recoll 1.17.0 (Mar 26, 2012)

  • Release 1.17.0 brings a number of usability improvements: management of indexing operations from the GUI, filtering on file size, extended directory filtering, Ubuntu Unity Lens, thumbnails in result lists, Okular notes and Gnumeric filters, etc.

New in Recoll 1.16.2 (Nov 8, 2011)

  • It fixes a number of bugs in 1.16.1, ranging from small to annoying depending on usage.

New in Recoll 1.16.1 (Sep 29, 2011)

  • It fixes a GUI crash in 1.16.0 (see below) and a lyx filter issue.

New in Recoll 1.16.0 (Sep 21, 2011)

  • Images are displayed in preview. You can get at the fields and complete extracted text using the popup menu.
  • The preview window popup menu has a "save to file" entry to write a subdocument (e.g. a mail attachment) to a file.
  • The GUI advanced search panel allows specifying a field for each entry (e.g. author/recipient, etc.).
  • It is now possible to anchor searches to the beginning or end of the text or field, by using ^ and $ characters at the beginning or the end of a term or phrase. A maximum distance can be specified as a phrase slack either in the advanced search panel, or as a query language modifier, e.g. "^beginterm"o10 would search for beginterm within 10 terms of the beginning of the text. This feature was suggested to me (thanks Gökhan), for searching for a name at the beginning of a text (in the author list, as opposed to anywhere in the text). This is useful for example in the very common case where the metadata for the author list was not created. More details about this feature are to be found in the user manual.
  • It is possible to configure the result list snippet separator, given as an html fragment. This is an ellipsis by default (…).
  • We can now perform negative directory filtering (-dir:/some/dir), to return all results except those from the specified directory (recursive). Other attempts at still impossible negative searches (e.g. -mime:) now cause explicit error messages instead of lame results. The inverted directory filtering is accessible from the query language and by checking a checkbox in the advanced search panel.
  • Result table:
      • The detail area now has a popup menu similar to the one in the result list (open parent, save to disk etc.).
      • The result table header popup menu has an entry to save the table as a CSV file.
      • Estimated result counts are displayed in the status line.
      • Set row height according to default font size, and better adjust row height and vertical text position in cells.
  • It is now possible to set an increased weight for indexing some fields. The title field gets a boost by default. See the fields default file for details.
  • The query language allows setting weights on terms, e.g. "important"2.5 .
  • Improved preservation of indentation for text files displayed in the preview window.
  • Show hidden (dot) files in the indexing configuration GUI dialogs.
  • Added filters for .war (Konqueror web archive), .mhtm (other web archive format) and rar archives.
  • Improved handling for native cjk punctuation signs.
  • Updated the list of native apps in the default mimeview (e.g. xv->gwenview, rox->dolphin, etc.).
  • Added -f option to recollindex to ignore skippedPaths/Names when used with -i. Allows the use of a purely external file selection mechanism.
  • The performance of email indexing has been slightly improved (less CPU usage).
  • Real time indexer: several configuration parameters allow adjusting the timing of indexing actions:
      • monauxinterval: the interval between auxiliary databases rebuilds (stemdb, aspell).
      • monixinterval: the waiting period during which indexing events are accumulated prior to actual indexing (saves work on duplicate events).
      • mondelaypatterns: a list of file patterns for which indexing should be delayed longer (quick changing files like logs that should be reindexed much slower than they change).
      • See the default configuration file for more detail.
  • Fixed bugs:
      • UTF-8 paths inside ZIP archives were mishandled. Also fixes a problem with colons inside archive member paths.
      • Fixed GUI result list doc parent operations (open/preview) which were broken in 1.15.
      • Fixed a case where indexing could hang or crash after an error occurred while indexing an archive member (which should have affected only the relevant document).
      • Real time indexer: uncontrolled concurrent access to the global configuration could cause a startup crash (mostly with big file trees because of timing issues).
      • Fixed sorting by document and file size in the result table.
      • Email messages for which there would be an error indexing an attachment would not be indexed at all.
      • Text files bigger than 2 GB could not be indexed.
      • Fixed the handling of compressed man pages.
      • Memory usage could grow almost unbounded while deleting documents, because idxflushmb was not used for document deletions.
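
The idea behind monixinterval (accumulate indexing events for a waiting period so that duplicate events cost nothing extra) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the real time indexer's code: the EventAccumulator class, its methods and the explicit "now" timestamps are hypothetical.

```python
class EventAccumulator:
    """Collect changed paths and release them once the waiting period is over."""

    def __init__(self, interval):
        self.interval = interval      # waiting period, in seconds
        self.pending = set()          # a set silently drops duplicate events
        self.first_event_time = None  # time of the oldest unprocessed event

    def add(self, path, now):
        """Record a change event for path, seen at time now."""
        if not self.pending:
            self.first_event_time = now
        self.pending.add(path)

    def flush_if_due(self, now):
        """Return the sorted batch to index, or an empty list if it is too early."""
        if not self.pending or now - self.first_event_time < self.interval:
            return []
        batch = sorted(self.pending)
        self.pending.clear()
        self.first_event_time = None
        return batch
```

Passing the time explicitly keeps the sketch deterministic. A file that changes several times within the waiting period appears only once in the batch returned by flush_if_due.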

New in Recoll 1.15.9 (May 30, 2011)

  • Fixes an architecture-dependent startup crash in 1.15.8. No need to upgrade if you are not experiencing it.

New in Recoll 1.15.8 (May 4, 2011)

  • More bug fixes, negative directory filtering, some web archive formats.

New in Recoll 1.15.5 (Mar 7, 2011)

  • Fixes a number of annoying GUI crashes in 1.15.1 and a few minor indexing issues.

New in Recoll 1.15.2 (Feb 15, 2011)

  • 1.15.2 fixes the GUI startup crash described below, and an issue with very long path elements, which manifested itself mainly while indexing the Beagle queue.

New in Recoll 1.15.1 (Feb 4, 2011)

  • The GUI has a new display for the result list inside a spreadsheet-like table, with configurable columns and sort by any column. The table header right-click menu has the reset sort function and column adding/removing. The "classical" look is still there, and you can dynamically switch between list and table by clicking the table-like icon in the toolbar.
  • The old sort tool is gone. There are now vertical arrows in the tool bar to directly sort by date ascending or descending, which was its only significant use.
  • We added duplication indicators to the result list when results are collapsed because they have bit-for-bit identical contents. The indicator is the collapse count in parentheses, displayed just after the relevancy percentage.
  • There is now a menu entry to clear search history.
  • File name search: it used to be that multiple fragments were OR'ed together to perform the search. They are now AND'ed, which makes more sense in many cases (remembering several fragments of the file name but not the order), but means that a search for *.doc *.odt will always fail. Use ext:doc OR ext:odt instead.
  • Autophrase now works with the query language where it makes sense.
  • Support for lyrics in midi karaoke files. Works better with the Python chardet character encoding identification module.
  • Support newer Purple/Pidgin logs using an html format.
  • Support Thunderbird's extreme brokenness in handling the mbox format: naked "^From $" separators are now accepted with mhmboxquirks = tbird.
  • A change of method for filtering on directory location makes it much more efficient and faster.
  • The utf-8 file name is now a stored field by default.
  • We now catch all exceptions in Python filters to avoid crash reports from the system on benign filter failures.
  • Indexing now creates a lock/pid file inside the configuration directory.
  • Fixed parallel build issue on FreeBSD.
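
The effect of the switch from OR'ed to AND'ed file name fragments noted in the list above can be seen in a short Python sketch. It uses shell-style matching from the standard fnmatch module as a stand-in for Recoll's file name matching, which is an assumption, and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase


def match_or(name, fragments):
    """Old behaviour: one matching fragment is enough."""
    return any(fnmatchcase(name, fragment) for fragment in fragments)


def match_and(name, fragments):
    """New behaviour: every fragment has to match the same file name."""
    return all(fnmatchcase(name, fragment) for fragment in fragments)
```

Under match_and, the fragments *report* and *2010* select annual_report_2010.doc whatever their order, while *.doc and *.odt can never both match a single name, which is why the OR has to be expressed differently.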

New in Recoll 1.14.3 (Nov 25, 2010)

  • New GNU info filter.
  • Improved Thunderbird mail indexing.
  • Other small bug fixes.

New in Recoll 1.14.0 (Sep 20, 2010)

  • Date searches and filtering, arbitrary email header indexing, new audio tag extractor based on the Mutagen Python library, and miscellaneous other improvements.

New in Recoll 1.13.04 (Apr 16, 2010)

  • It fixes a nasty bug (broken stemming) in 1.13.02.

New in Recoll 1.13.0 (Jan 6, 2010)

  • Recoll has a new class of persistent external filters with the capability to process several documents, or multi-document files, in the same instance. Benefits: much faster image tag indexing, and new file formats. Except for the Perl image tag filter (because of ExifTool), the new filters are written in Python.
  • New file formats: chm (Microsoft help), zip archives, .ics calendar files. Individual pages in chm files are indexed and can be previewed. Zip is quite convenient for maildir archives (for example).
  • Recoll can now use the output of the Beagle Firefox plugin to index visited web pages and bookmarks. This is only usable if Beagle itself is not running, else Recoll and Beagle will be fighting for the same queue.
  • Big text files (like application logs) can now be paged for indexing, avoiding excess memory usage during indexing and improving the usability at query time. They can also be altogether skipped by setting a maximum size configuration parameter. These parameters have default values (1 MB and 20 MB) which change Recoll behaviour compared to previous versions. You can set textfilepagekbs and textfilemaxmbs to -1 in the configuration to restore the old behaviour.
  • A cache was implemented for mbox message header offsets. This speeds up message previews for big mbox files.
  • Miscellaneous usability improvements:
  • Allow using page-up/down and shift-home to scroll the result list while the focus is in the search entry.
  • Make 'Use desktop preferences' the default for new Recoll installations, and make this choice more prominent in the external viewer dialog.
  • ^P starts the print dialog on a preview window.
  • If a search has no result, alternate spellings are suggested. This feature is still a bit raw and will be improved.
  • If the text of a document is empty, preview will switch to displaying the document fields.
  • New entry in the result list contextual menu for opening the parent document of a result list hit with its native application. Useful for example for pages inside chm files.
  • Indentation is now preserved when displaying text documents inside the preview window. This is particularly welcome for program source files.
  • Allow substituting arbitrary fields in the result paragraph, using a %(fieldname) syntax
  • The real-time indexing monitor will now accumulate modifications for 30 seconds before indexing.
  • The indexer can now split camelCase words, allowing search on component terms. This is not enabled by default as it can confuse phrase searches (e.g. "MySQL manual" is matched by phrase queries for "my sql manual" and "MySQL manual" but not "mysql manual"). Use "configure --enable-camelcase" to activate it.
  • The ipath is now printed by default after the url in the default result list format.
  • recoll_noindex and skippedNames can now be changed at any point in the tree (only for topdirs previously).
  • Allow using location/application sensitivity in external viewer choice. This uses several new functions:
      • Allow the substitution of arbitrary document fields inside external viewer command line arguments.
      • Allow field values to be set on all documents in a file system subtree. For example, you can set an application tag (e.g. rclaptg = gnus) on all mailbox files under a specific directory.
      • New syntax in mimeview for including the rclaptg field in viewer choice (mimetype|tagvalue = ...).
  • Allow specifying a default character set for mail messages. This is mainly useful for readpst dumps. All reasonable non-ascii messages specify their character set.
  • Added a --without-gui configure option. Removes all X11 and Qt dependencies and only compiles the command-line interface.
  • Improved the kio_recoll build. There is no need to run configure manually in the main directory any more. Ubuntu packages for kio_recoll are now built on the recoll-backports PPA on launchpad.net.
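
The %(fieldname) substitution mentioned in the list above for the result paragraph can be pictured with a minimal Python sketch. It only illustrates the syntax: the function name is hypothetical, and replacing an unknown field with an empty string is an assumption, not documented Recoll behaviour.

```python
import re

# A field reference looks like %(fieldname).
FIELD_REF = re.compile(r"%\((\w+)\)")


def substitute_fields(template, fields):
    """Replace each %(name) with fields[name], or with nothing if absent."""
    return FIELD_REF.sub(lambda match: fields.get(match.group(1), ""), template)
```

For example, substitute_fields("%(author): %(title)", {"author": "Doe", "title": "Notes"}) returns "Doe: Notes".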

New in Recoll 1.12.3 (Oct 29, 2009)

  • Bug fixes

New in Recoll 1.12.2 (Oct 22, 2009)

  • Bug fixes

New in Recoll 1.12.1 (Jul 22, 2009)

  • Fixed compilation errors for new gcc and gnu libc.
  • Use groff html output in rclman to get rid of control characters in output (improve manual pages indexing). Fix 8bit character issues in file names in rcllyx.
  • Fixed command line arguments processing problem with "recoll -q"

New in Recoll 1.12.0 (Feb 11, 2009)

  • New KDE KIO slave module, collapsing of identical results, context-sensitive F1 help, saving email attachments and other embedded documents to files, and other small improvements and bug fixes.

New in Recoll 1.11.0 (Oct 25, 2008)

  • Easy filtering of results by document type, nicer previews which use html when possible, Python programming interface for indexing and searching, better support for the Xesam user query language, new filter framework, better support for arbitrary field indexing and searching.

New in Recoll 1.10.5 (Sep 2, 2008)

  • This release brings OpenXML filters (for formats such as docx) and improved identification of email attachments.
  • A few bugs were fixed: a rare problem in processing Thunderbird mailbox files, an indexer crash on extremely long file names (longer than 240), and incorrect preview highlighting for Near/Phrase searches.