LanguageTool Changelog

New in LanguageTool 4.5 (Mar 31, 2019)

  • Improved error detection for Catalan, Dutch, English, Galician, German, Portuguese, Russian, and Ukrainian.

New in LanguageTool 4.3 (Sep 26, 2018)

  • This version comes with improved error detection for English, German, Dutch, Portuguese, and other languages.

New in LanguageTool 2.9 (Mar 30, 2015)

  • Catalan:
  • updated POS tag dictionary
  • added new rules
  • fixed false alarms
  • English:
  • Added a few rules and fixed a few false alarms
  • Added many new style rules contributed by Heikki Lehvaslaiho. As these may cause false alarms, they are not activated by default. You can activate them by turning on all rules in the new 'Plain English' category.
  • Esperanto:
  • added a few new rules
  • French:
  • updated POS tag dictionary and Hunspell dictionary to Dicollecte-5.3
  • German:
  • added a few new rules and fixed false alarms
  • Added a new rule that checks for subject-verb agreement. For now, only cases with 'ist', 'sind', 'war', and 'waren' are supported. Examples of errors that are detected: 'Der Hund sind schön.', 'Die Autos ist schnell.' To make this rule work, phrases are now unified in disambiguation.xml: for example, 'Mann' in the phrase 'ein Mann' will only retain its nominative reading (SUB:NOM:SIN:MAS), whereas it used to have accusative and dative readings as well (SUB:AKK:SIN:MAS, SUB:DAT:SIN:MAS). (https://github.com/languagetool-org/languagetool/issues/233)
  • Italian:
  • improved a few rules
  • Polish:
  • added several new rules
  • Portuguese:
  • added/improved several rules
  • 3695 compound words (pre-reform) - the largest free database
  • Russian:
  • added and improved rules
  • Ukrainian:
  • big dictionary update
  • new grammar rules
  • new simple replace rule for soft suggestions
  • disambiguator improvements
  • compound tagging and spelling improvements
  • initials tagging improvements
  • sentence and word tokenizing improvements
  • improved handling of stress symbol and soft hyphen
  • Bitext rules:
  • added a simple rule for checking whether translations end with the same punctuation mark as the original (this includes only .?! characters).
  • it is now possible to add external bitext rule files on the command line, by using -bitextrule option. The file path has to be absolute. Note: this allows using bitext rules also for languages that have no bitext rules included by default.
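The punctuation check for bitext described above can be pictured with a small Python sketch. It is not LanguageTool's implementation: the function name same_final_punctuation is made up for illustration, and it only looks at the three characters the entry names.

```python
FINAL_MARKS = ".?!"  # the only characters the changelog entry mentions


def _final_mark(sentence: str) -> str:
    """Return the trailing mark if it is one of . ? !, otherwise an empty string."""
    last = sentence.rstrip()[-1:]
    return last if last and last in FINAL_MARKS else ""


def same_final_punctuation(source: str, translation: str) -> bool:
    """True if source and translation end with the same mark (or both with none)."""
    return _final_mark(source) == _final_mark(translation)


print(same_final_punctuation("How are you?", "Wie geht es dir?"))
print(same_final_punctuation("How are you?", "Wie geht es dir."))
```

The first call compares two question marks; the second compares a question mark with a full stop, which is the kind of mismatch such a rule would report.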
  • Spelling:
  • The new files at /hunspell/spelling.txt can be used to add accepted words to the spell checker that are also considered when creating suggestions for misspelled words. This is similar to the /hunspell/ignore.txt files, which list accepted words which are *not* used when creating suggestions for misspelled words.
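The difference between spelling.txt and ignore.txt can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Hunspell or Morfologik logic: check_word is a hypothetical helper, and difflib merely stands in for a real suggestion algorithm.

```python
import difflib


def check_word(word, dictionary, spelling_txt, ignore_txt):
    """Return (accepted, suggestions) for a single word."""
    accepted = set(dictionary) | set(spelling_txt) | set(ignore_txt)
    if word in accepted:
        return True, []
    # Only the dictionary and spelling.txt feed suggestions; ignore.txt does not.
    candidates = sorted(set(dictionary) | set(spelling_txt))
    return False, difflib.get_close_matches(word, candidates, n=3)


dictionary = ["house", "mouse"]
spelling_txt = ["LanguageTool"]   # accepted and offered as a suggestion
ignore_txt = ["foobarbaz"]        # accepted but never suggested

print(check_word("LanguageTol", dictionary, spelling_txt, ignore_txt))
print(check_word("foobarba", dictionary, spelling_txt, ignore_txt))
```

Both example words are rejected, but only the first can receive a suggestion, because its nearest neighbour comes from spelling.txt rather than ignore.txt.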
  • API:
  • JLanguageTool.activateDefaultPatternRules() and JLanguageTool.activateDefaultFalseFriendRules() have been removed - all pattern rules and false friend rules (if a second language is specified) are now activated automatically when the constructor of JLanguageTool is called. Should you need a checker without the XML-based pattern rules, extend your language class (e.g. 'English') with one that overrides the getPatternRules() method and returns an empty list there.
  • ManualTagger.lookup() has been replaced by ManualTagger.tag() after having been deprecated in the previous release
  • All static methods and fields from class 'Language' have been moved to the new class 'Languages'. For now, the methods/fields in class Language still exist but have been deprecated.
  • LanguageIdentifierTools has been removed. Use LanguageIdentifier instead.
  • Removed (Default)ResourceDataBroker.setResourceDir() and setRulesDir() as these can be set with the constructor
  • Cleaned up class Contributor, e.g. removing getRemark()
  • Category.setDefaultOff() has been removed, this can be set via constructor now
  • Renamed classes: * o.lt.rules.patterns.Element => o.lt.rules.patterns.PatternToken * o.lt.rules.patterns.ElementMatcher => o.lt.rules.patterns.PatternTokenMatcher
  • Other small API cleanups that shouldn't affect the common use cases, e.g. IncorrectExample.getCorrections() returns an unmodifiable list now, removal of deprecated methods.
  • Embedded server:
  • XML escaping has been fixed; the bug could cause invalid XML documents to be returned
  • new config file option 'maxWorkQueueSize' that lets you set the maximum size of the request queue - if it gets larger than this, requests will be rejected (503 Service unavailable)
  • The server now responds with more specific HTTP status codes to these error conditions: * 413 Request Entity Too Large - if text exceeds maximum text size * 503 Service Unavailable - if check exceeds maximum check time
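As a rough illustration of the server behaviour listed above, the following Python sketch maps the three conditions to status codes. The function and parameter names are assumptions made for this example (only 'maxWorkQueueSize' is named in the changelog), and the real server is written in Java and decides these things at different points in the request life cycle.

```python
def status_for_request(text_length, max_text_length,
                       queue_size, max_work_queue_size,
                       check_seconds=0.0, max_check_seconds=float("inf")):
    """Return the HTTP status a request would get under the limits described above."""
    if text_length > max_text_length:
        return 413  # Request Entity Too Large: text exceeds the maximum text size
    if queue_size > max_work_queue_size:
        return 503  # Service Unavailable: request queue is larger than allowed
    if check_seconds > max_check_seconds:
        return 503  # Service Unavailable: check exceeded the maximum check time
    return 200


print(status_for_request(text_length=50_000, max_text_length=20_000,
                         queue_size=0, max_work_queue_size=100))
```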
  • GUI:
  • The stand-alone GUI can now take a plain text file as an argument, this file will then be loaded on startup (Github issue #232).
  • Command-line:
  • It is now possible to add an external rule file when calling LanguageTool from the command line. Use --rulefile to add a file. If the file name has a format that contains a language name, it will be used alongside other rules; otherwise, it will replace the rules. You can also load an external file with false friends by using the option --falsefriends. The file name should be an absolute file path, and false friend files are always added to the ones that are loaded for the language. (Github issue #192)
  • Rule syntax:
  • A rule may now have a single example sentence as long as it has a 'correction' attribute - this can save some redundancy if the only correct sentence is the same as the incorrect sentence with the correction applied. Before, a rule needed at least two example sentences.
  • 'example' element: type="incorrect" is now optional if there's a 'correction' attribute. The 'correction' attribute implies that the sentence is incorrect.
  • 'example' element: type="correct" is now optional. No 'type' attribute and no 'correction' attribute implies that the sentence is correct.
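The three rule syntax notes above boil down to a small decision procedure for 'example' elements. The Python sketch below spells it out with the standard library XML parser; example_kind is a hypothetical helper, the sample sentences are invented, and letting an explicit 'type' attribute take precedence is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def example_kind(example: ET.Element) -> str:
    """Classify an <example> element as 'correct' or 'incorrect'."""
    explicit = example.get("type")
    if explicit is not None:
        return explicit  # assumption: an explicit type always wins
    # No type: a 'correction' attribute implies an incorrect sentence.
    return "incorrect" if example.get("correction") is not None else "correct"


rule = ET.fromstring("""
<rule>
  <example correction="their">This is <marker>there</marker> car.</example>
  <example>This is their car.</example>
  <example type="incorrect">This is there car.</example>
</rule>
""")

kinds = [example_kind(e) for e in rule.iter("example")]
print(kinds)
```

The first example carries only a 'correction' attribute, the second carries nothing, and the third uses the older explicit form.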
  • Internal:
  • We have switched from Apache Tika to language-detector (https://github.com/optimaize/language-detector) for automatically identifying the text language. It should be faster and results should be more reliable. Detection of Asturian and Galician had to be disabled because the detection quality was too low and also affected detection of Spanish.
  • Fixed a regression that made it impossible to load external rule files in the GUI.

New in LanguageTool 2.8 (Dec 30, 2014)

  • Asturian:
  • removed dependency on Hunspell, now uses Morfologik for spell checking
  • Breton:
  • added and improved a few rules
  • Catalan:
  • updated dictionary
  • added and improved rules
  • fixed false alarms
  • Dutch:
  • added and improved many rules
  • English:
  • some new rules (thanks to Nick Hough)
  • updated the tagger and synthesizer dictionaries, fixing issue #202
  • new filter to be used for matching the part-of-speech of parts of words, e.g.: in.* This will only keep matches for words that start with 'in' and where the part after the 'in' is an adjective (POS tag 'JJ'). The 'no:1' is the token position, i.e. here the first (and only) matching is referred to.
  • French:
  • added and improved a few rules
  • German:
  • added and improved a few rules
  • Polish:
  • added and improved several rules
  • added and improved false friends with English
  • Portuguese:
  • added/improved several rules
  • Spanish:
  • removed dependency on Hunspell, now uses Morfologik for spell checking
  • Reformatted rules file
  • Added more rules
  • Tagalog:
  • removed dependency on Hunspell, now uses Morfologik for spell checking
  • the dash character ("") is a delimiter now when tokenizing the text
  • Russian:
  • added and improved rules
  • added a few false friend rules (Russian/English)
  • Ukrainian:
  • many new rules (including agreement with nouns, time expressions etc)
  • rule coverage improvement
  • dictionary update (big improvements for proper nouns and vocative case)
  • new tag and rule to warn about alternative spelling
  • added word frequency information to improve spelling suggestions
  • some new disambiguator rules
  • Rule Syntax:
  • ... can now be added to a rulegroup to affect all the rules of that group
  • If you develop your own rules that are not part of LT you can now add external="yes" to your categories to prevent the rule link to community.languagetool.org from appearing in our standalone GUI (the link would not work for rules that are not part of the main distribution of LT). (Github issue #223)
  • If a rule group specifies default="off", the rules inside the rule group may not also specify default="on"/"off".
  • API:
  • Removed classes and methods that had been deprecated since 2.7 or longer
  • Embedded server:
  • The config file options 'requestLimit' and 'requestLimitPeriodInSeconds' can now also be used for the HTTP server (not just for the HTTPS server)
  • New config file option 'trustXForwardForHeader': set this to 'true' if you run the server behind a reverse proxy and want the request limit to work on the original IP addresses provided by the 'X-Forwarded-For' HTTP header, usually set by the proxy. If you run behind a proxy but don't set this property to true, one user can use up all the requests, so other users will also get an error message because of the request limit.
  • Fix response of After the Deadline mode: ... was sometimes empty, confusing the text check in WordPress
  • Bitext rules were not disabled properly, even if they were specified with a proper parameter for the server; now it's fixed
  • Fixed problem with improper positions for some bitext rules (issue #218)
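Why the request limit needs the forwarded header behind a reverse proxy can be shown with a small Python model. It is only a sketch: the class RequestLimiter and its method names are invented, the parameters merely mirror the 'requestLimit', 'requestLimitPeriodInSeconds' and 'trustXForwardForHeader' options, and taking the first entry of X-Forwarded-For as the client address is the usual convention rather than something the changelog states.

```python
import time
from collections import defaultdict, deque


class RequestLimiter:
    """Sliding-window limit of `request_limit` requests per `period_seconds` and IP."""

    def __init__(self, request_limit, period_seconds, trust_x_forward_for_header=False):
        self.limit = request_limit
        self.period = period_seconds
        self.trust_header = trust_x_forward_for_header
        self._hits = defaultdict(deque)  # IP address -> timestamps of recent requests

    def client_ip(self, remote_addr, headers):
        """Use the forwarded address only if the operator chose to trust it."""
        if self.trust_header:
            forwarded = headers.get("X-Forwarded-For")
            if forwarded:
                return forwarded.split(",")[0].strip()
        return remote_addr

    def allow(self, remote_addr, headers, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[self.client_ip(remote_addr, headers)]
        while hits and now - hits[0] >= self.period:
            hits.popleft()  # drop requests that fell out of the window
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Without the trust flag every request appears to come from the proxy address, so one busy user exhausts the budget for everyone; with it, each forwarded address gets its own window.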
  • GUI:
  • A new 'errorColors' setting has been added to the languagetool.cfg configuration file. It can be used to set the background color of errors. For example, errorColors=typographical:#b8b8ff, style:#ffb8b8 will show 'typographical' errors with a blue background and 'style' errors with a red background in the upper part of the LT window. 'typographical' and 'style' are the types that are set in grammar.xml as "type=...". There's no user interface yet to configure these colors. Note that you should only edit the languagetool.cfg file when LT is not running.
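The value format of the 'errorColors' setting is simple enough to parse by hand. The Python sketch below shows one way to read it; parse_error_colors is a hypothetical helper and says nothing about how the GUI itself reads languagetool.cfg.

```python
def parse_error_colors(value: str) -> dict:
    """Parse 'type:#rrggbb, type:#rrggbb' into a {type: color} mapping."""
    colors = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        error_type, _, color = entry.partition(":")
        colors[error_type.strip()] = color.strip()
    return colors


print(parse_error_colors("typographical:#b8b8ff, style:#ffb8b8"))
```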
  • Internal:
  • Bugfix: rules inside a rule group had not been activated if a previous rule from the same rulegroup used default="off"
  • Words are not ignored anymore by the spell checker just because they occur in a rule's suggestion. If you want the spell checker to ignore words globally, add them to hunspell/ignore.txt. To ignore them depending on the context, add an 'ignore_spelling' rule to disambiguation.xml.
  • A file 'hunspell/prohibit.txt' can now be used to mark words as spelling errors even if the spell checker would normally accept them. This is useful to improve the LanguageTool spell checker without waiting for the upstream checker to be updated. The 'prohibit.txt' file is the opposite of 'ignore.txt', which causes the spell checker to ignore words.
  • The part-of-speech tagger for most languages can now be extended by adding entries to the file org/languagetool/resource/XX/added.txt (XX being the language code). The format is "fullform baseform postag", three columns separated by tabs. This makes it easier for users (and developers) to extend the POS tagger, as they don't need to export, modify, and recreate the binary dictionary for every change.
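The three-column added.txt format described above can be read with a few lines of Python. This is a sketch only: parse_added_entries is a made-up helper, and the two sample entries (with Penn Treebank style tags) are invented for illustration rather than taken from a real added.txt file.

```python
def parse_added_entries(text):
    """Parse 'fullform<TAB>baseform<TAB>postag' lines into tuples."""
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        columns = line.split("\t")
        if len(columns) != 3:
            raise ValueError(f"line {number}: expected 3 tab-separated columns")
        entries.append(tuple(column.strip() for column in columns))
    return entries


sample = "googled\tgoogle\tVBD\n\nselfies\tselfie\tNNS\n"
print(parse_added_entries(sample))
```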

New in LanguageTool 2.7 (Oct 14, 2014)

  • Breton:
  • added and improved rules
  • New rule that checks if a weekday matches a date, e.g. detects "Gwener 28 a viz Eost 2014", as that date isn't a Friday.
  • Catalan:
  • added and improved rules
  • fixed false alarms
  • Dutch:
  • added and improved many rules
  • switched to Morfologik-based spell checker
  • English:
  • Do you want to be part of the team that develops the world's most powerful Open Source proofreading tool? We're looking for a maintainer for the English rules in LanguageTool. See http://wiki.languagetool.org/tasks-for-language-maintainers for details.
  • All English dictionaries have been extended to contain word frequency classes to improve the spell checker suggestions (the frequency data is taken from https://github.com/mozilla-b2g/gaia/tree/master/apps/keyboard/js/imes/latin/dictionaries, as for other languages that already use this feature).
  • Better suggestions for English learners: irregular verbs, nouns, and adjectives now usually have a better suggestion. For example, 'thinked' suggests 'thought', 'womans' suggests 'women'.
  • More misspellings provide suggestions now, e.g. 'garentee' (guarantee), 'greatful' (grateful). This may cause a performance decrease of ~ 10% (more for texts with a lot of unknown words).
  • New rule that checks if a weekday matches a date, e.g. detects "Monday, 7 October 2014", as that date isn't a Monday. This rule will only work if it detects the date format in use. So far, these formats are supported: * "Monday, 7 October 2014" * "Monday, 7 Oct 2014" * "Monday, October 7, 2014" * "Monday, Oct 7, 2014" * (this also works with abbreviated week days like Mo or Mon for Monday)
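The idea behind the weekday rule can be sketched in Python with the standard datetime module. The sketch handles only the English formats listed above, resolves abbreviations by unique prefix, and uses invented names throughout; the actual rule is implemented in Java and XML.

```python
import re
from datetime import date

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]  # index matches date.weekday()
MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

DAY_FIRST = re.compile(r"(\w+), (\d{1,2}) (\w+) (\d{4})")     # Monday, 7 October 2014
MONTH_FIRST = re.compile(r"(\w+), (\w+) (\d{1,2}), (\d{4})")  # Monday, October 7, 2014


def _index_by_prefix(names, text):
    """Index of the only name starting with `text` (so 'Oct' or 'Mo' work), else None."""
    hits = [i for i, name in enumerate(names) if name.startswith(text.lower())]
    return hits[0] if len(hits) == 1 else None


def weekday_matches_date(phrase):
    """True or False for a recognised date phrase, None if the format is unknown."""
    match = DAY_FIRST.fullmatch(phrase)
    if match:
        weekday, day, month, year = match.groups()
    else:
        match = MONTH_FIRST.fullmatch(phrase)
        if not match:
            return None
        weekday, month, day, year = match.groups()
    weekday_index = _index_by_prefix(WEEKDAYS, weekday)
    month_index = _index_by_prefix(MONTHS, month)
    if weekday_index is None or month_index is None:
        return None
    try:
        actual = date(int(year), month_index + 1, int(day)).weekday()
    except ValueError:
        return None  # e.g. 31 February
    return actual == weekday_index


print(weekday_matches_date("Monday, 7 October 2014"))
```

Returning None for unknown formats mirrors the note that the rule only works when it detects the date format in use.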
  • Esperanto:
  • New rule that checks if a weekday matches a date, e.g. detects "Vendredon la 28-an de Aŭgusto 2014", as that date isn't a Friday.
  • French:
  • updated POS tag dictionary and Hunspell dictionary to Dicollecte-5.2
  • added a synthesizer - the agreement rule can now make suggestions for some errors
  • added/improved several rules
  • New rule that checks if a weekday matches a date, e.g. detects "vendredi 28/08/2014", as that date isn't a Friday.
  • German:
  • Fixed a rare NullPointerException and an ArrayIndexOutOfBoundsException
  • Fixed several false alarms
  • Added and improved rules
  • New rule that checks for sentences without a verb (turned off by default due to the risk of false alarms)
  • New rule that checks if a weekday matches a date, e.g. detects "Dienstag, 29.9.2014", as that date isn't a Tuesday.
  • Performance improvements for spell check suggestions
  • Persian:
  • added initial support for Persian (Farsi)
  • Polish:
  • added and improved some rules
  • new rule that checks if a weekday matches a date
  • Portuguese:
  • added/improved several rules
  • added many dozens of compound words
  • Russian:
  • added new rules
  • fix SourceForge feature request #38 (check for different quotation marks)
  • added a few false friend rules (Russian/English)
  • new rule that checks if a weekday matches a date, e.g. detects "понедельник, 30 сентября 2014 г.", as that date isn't a Monday.
  • expanded Russian compound rule with new words from postag dictionary
  • Spanish:
  • Added new POS category Z (for spelled numbers, e.g. 'uno', 'dos', ...)
  • Spelled numbers can now be detected and managed both in disambiguation and rules.
  • Fixed some incorrect lemmas in POS dictionary.
  • Added Hybrid chunker-disambiguator.
  • Tamil:
  • Added initial support for Tamil. If the font for Tamil is not properly displayed on your computer and you're using Windows, you might need to apply the workaround described here: https://bugs.openjdk.java.net/browse/JDK-8008572
  • Ukrainian:
  • big update for POS dictionary (fixes and new words)
  • some POS tags renamed for consistency; new tags for abbreviations and rare words
  • many new rules and fixes for existing rules
  • new rule that checks if a weekday matches a date, e.g. detects "понеділок, 7 жов 2014", as that date isn't a Monday
  • token normalization performance improvement
  • LibreOffice integration:
  • Don't get confused by footnotes in LibreOffice 4.3 and later (it now provides us with the footnote positions as meta data, so we can ignore them).
  • API:
  • Major performance improvements for the multi-thread use case, where JLanguageTool gets created per thread, but the language object (e.g. 'German') gets created only once. Overhead for creating JLanguageTool should now be much lower.
  • Removed several classes and methods that had been deprecated since version 2.6
  • Removed DutchSpellerRule - use MorfologikDutchSpellerRule instead
  • The signature of Language.getRelevantRules() has changed
  • The JLanguageTool and MultiThreadedJLanguageTool constructors don't declare to throw an IOException anymore
  • WhitespaceRule has been renamed to MultipleWhitespaceRule (WhitespaceRule still exists but has been deprecated)
  • Deprecated some methods whose visibility will be decreased (e.g. from public to protected)
  • MorfologikSpellerRule.getRuleMatch(String, int) has been renamed to MorfologikSpellerRule.getRuleMatches(String, int)
  • The RuleMatch constructor now throws an exception if toPosition is not larger than fromPosition
  • Introduced a new abstract class TextLevelRule that extends Rule and that can be used for rules that cover more than single sentences.
  • Command line:
  • Enabling and disabling specific rules at the same time is now allowed. In order to test only some rules (disabling all the rest), which previously was done with "--enable LIST_OF_RULES", now use "--enabledonly --enable LIST_OF_RULES" (or "-eo -e LIST_OF_RULES").
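One way to read the interaction of the enable, disable and "enabled only" options is the set arithmetic below. It is a Python sketch under assumptions: active_rules and the rule IDs are invented, and letting a disable win over an enable for the same rule is a guess, not something the changelog specifies.

```python
def active_rules(all_rules, default_off, enabled=(), disabled=(), enabled_only=False):
    """Return the sorted IDs of the rules that would run for the given options."""
    if enabled_only:
        # "--enabledonly --enable LIST": start from nothing but the listed rules.
        active = set(enabled)
    else:
        # Otherwise start from the defaults and add the explicitly enabled rules.
        active = (set(all_rules) - set(default_off)) | set(enabled)
    # Assumption of this sketch: an explicit disable beats an explicit enable.
    return sorted((active - set(disabled)) & set(all_rules))


rules = ["RULE_A", "RULE_B", "RULE_C", "RULE_D"]  # hypothetical rule IDs
print(active_rules(rules, default_off=["RULE_D"], enabled=["RULE_D"], disabled=["RULE_A"]))
print(active_rules(rules, default_off=["RULE_D"], enabled=["RULE_B"], enabled_only=True))
```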
  • Embedded server:
  • Two new options can be set in the properties file to make LanguageTool return the same XML format as After the Deadline (AtD). This way it can be used as a drop-in replacement for AtD: * mode - 'LanguageTool' or 'AfterTheDeadline' * afterTheDeadlineLanguage - code of default language if mode is set to 'AfterTheDeadline' NOTE: the 'AfterTheDeadline' mode should be considered experimental for now.
  • The new option 'maxCheckThreads' allows setting the maximum number of threads working on requests in parallel. The default is 10, as it used to be.
  • Internals:
  • New abstract rule AbstractDateCheckFilter that allows checking whether a weekday and a date match. For example, "Tuesday, September 29, 2014" could be detected, as September 29, 2014 is not actually a Tuesday. This uses the new experimental RuleFilter interface that can be called from XML with the new 'filter' element. 'filter' takes these attributes: * 'class': the fully-qualified name of a Java class that implements RuleFilter, e.g. "org.languagetool.rules.de.DateCheckFilter" * 'args': a string like "year:\1 month:\2 day:\3 weekDay:\4", i.e. a space-separated list of key/value pairs, where \x gets resolved to the pattern's token value (as in the 'message' element)
  • The compound rule now ignores tokens that have been immunized in the disambiguation.xml
  • The "filter" action in the disambiguator is now applied only to POS tags that match the POS tag given. If they don't match, the rule is not applied.
  • If you're extending the XML rules as described at http://wiki.languagetool.org/tips-and-tricks#toc2, the external rule and disambiguation files can now be hosted on a password-protected server by specifying a URL like this: http://user:[email protected]/path/user-rules.xml
  • The em dash ("—") is now a tokenizing character for all languages
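The 'args' format of the 'filter' element described above (space-separated key/value pairs with \x back-references) can be resolved as in the following Python sketch. resolve_filter_args is a hypothetical helper and the token list is invented; the real resolution happens inside LanguageTool's Java code.

```python
import re


def resolve_filter_args(args, tokens):
    """Resolve 'key:\\x' pairs against the matched tokens (\\1 is the first token)."""
    resolved = {}
    for pair in args.split():
        key, _, value = pair.partition(":")
        # Replace every \x with the x-th matched token, counting from 1.
        resolved[key] = re.sub(r"\\(\d+)",
                               lambda m: tokens[int(m.group(1)) - 1],
                               value)
    return resolved


matched_tokens = ["2014", "9", "29", "Tuesday"]  # invented pattern match
print(resolve_filter_args(r"year:\1 month:\2 day:\3 weekDay:\4", matched_tokens))
```

A filter class would then receive the resolved mapping and decide whether the match should be kept.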
  • New feature:
  • Use of language models
  • LanguageTool can now make use of ngram data. ngram data is information about how often phrases occur in a language. Currently, this uses phrases of length 3.
  • The data is used by an English rule to find homophone errors, like mixing up coarse/course or flair/flare. LanguageTool had some rules of this kind before, but the new rule now supports about 900 of such word pairs/sets.
  • The data needed for this is huge (7GB for English) and thus not part of LanguageTool.
  • The data (English only for now) and more documentation is available at http://wiki.languagetool.org/finding-errors-using-big-data
  • Using ngrams makes LanguageTool slightly slower when the data is stored on an SSD.
  • If not stored on an SSD, the performance might drastically decrease.
  • Use the new --languagemodel option with the command line client to activate the rule that uses the data. That option is not yet available for the stand-alone GUI.
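How ngram counts can help with homophones is easy to show on a toy scale. The Python sketch below compares trigram counts for the members of a confusion set; the counts are invented, best_candidate is a hypothetical name, and the real rule works on a much larger data set with its own scoring.

```python
def best_candidate(left, word, right, confusion_set, trigram_counts):
    """Return the confusion-set member that is most frequent between left and right."""
    def count(candidate):
        return trigram_counts.get((left, candidate, right), 0)

    best = max(confusion_set, key=count)
    # Only propose a change if the alternative is strictly more frequent.
    return best if count(best) > count(word) else word


# Invented counts, for illustration only.
counts = {
    ("of", "course", "not"): 1200,
    ("of", "coarse", "not"): 3,
}

print(best_candidate("of", "coarse", "not", ["coarse", "course"], counts))
```

With phrases of length 3, as mentioned above, only the immediate left and right neighbours of the word are taken into account here.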

New in LanguageTool 2.4.1 (Jan 10, 2014)

  • Updated Morfologik libraries to 1.8.3 to fix slow suggestions in the spell checker, which affected at least en-US

New in LanguageTool 2.4 (Jan 3, 2014)

  • Breton:
  • SRX sentence tokenization
  • added/improved a few rules
  • fixed some false alarms
  • fixed incorrect suggestions thanks to added tests on corrections
  • Catalan:
  • added/improved several rules
  • fixed false alarms
  • made additions and fixes to the tagger dictionary
  • removed some words from synthesis dictionary (see filterarchaic.txt)
  • added frequency data to the tagger dictionary; frequency wordlist comes from the Gaia project, with Apache License, version 2.0 (https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries).
  • English:
  • added/improved a few rules
  • fixed some false alarms
  • French:
  • added/improved several rules
  • fixed some false alarms
  • German:
  • added/improved several rules
  • added a synthesizer - the agreement rule can now make suggestions for some errors (not all suggestions are correct, though)
  • Polish:
  • added/improved several rules, especially for hyphen and dash usage
  • added frequency information for spellchecking dictionary; frequency wordlist comes from the Gaia project, with Apache License, version 2.0 (https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries).
  • fixed some false alarms
  • Portuguese:
  • added/improved several rules (it now includes gender rules "a"/"o")
  • it now has 3400+ compound words
  • the JAR file has been renamed to languagetool.jar, from formerly languagetool-standalone.jar to avoid confusion about what 'standalone' means in this context (github issue #29)
  • for languages with many rules (like French or German) performance on long texts has been increased by about 20-30%
  • fix for thread-safety (could cause hang in MultiWordChunker)
  • fixed a bug where chunk annotations were not tested in groups
  • fix: \1 and had not been evaluated in ...
  • fixed a bug in the unification mechanism that discarded some of the matching interpretations prematurely
  • added support for chunk annotations in the disambiguator, and fixed one bug in filtering tokens with chunk annotations
  • updated Morfologik libraries to 1.8.2 (bug fixes, stricter input sanity checking, add frequency data to dictionaries)
  • added the option of including frequency data to tagging or spelling dictionaries. The expected format of the frequency wordlists is the one in the Gaia project, with Apache License, version 2.0 (https://github.com/mozilla-b2g/gaia/tree/master/keyboard/dictionaries)
  • new command line tools to export and create binary dictionaries:
  • org.languagetool.dev.DictionaryExporter
  • org.languagetool.dev.POSDictionaryBuilder
  • LibreOffice/OpenOffice integration:
  • added a workaround for incorrect sentence detection for the case that a footnote appeared after a sentence full stop (Sourceforge bug #191)
  • standalone GUI:
  • The dialog opened by the "More..." item in the context menu of an error will now also display correct and incorrect example sentences
  • API:
  • SentenceTokenizer is now an interface, the implementation has been moved to RegexSentenceTokenizer, but this is deprecated and SRXSentenceTokenizer should be used instead
  • Some methods from org.languagetool.tools.StringTools have been moved to the org.languagetool.gui.Tools class in the languagetool-gui-commons project
  • LanguageToolListener.languageToolEventOccured() has been renamed to LanguageToolListener.languageToolEventOccurred()
  • org.languagetool.tools.SymbolLocator isn't public anymore (shouldn't affect anybody)
  • removed DanishSentenceTokenizer which had been deprecated for three years
  • Rule.getCorrectExamples() and Rule.getIncorrectExamples() don't return null anymore but an empty list if there are no examples. Consequently, setCorrectExamples() and setIncorrectExamples() don't accept null anymore.
  • Rule.getId() may return any string now, not just ASCII-only strings (actually this has been the case before, as the ASCII-only restriction was never enforced and only mentioned in the javadoc)
  • languagetool-wikipedia: the command line options for checking a Wikipedia dump have been simplified. The command can now be called like this: java -jar languagetool-wikipedia.jar check-data -l en -f enwiki-20130621-pages-articles.xml Call just "java -jar languagetool-wikipedia.jar check-data" to get a usage message. More than one file can be specified with the -f option. In addition to Wikipedia XML dumps, CSV files from Tatoeba (http://tatoeba.org) are now also supported; they need to be filtered first to contain only the relevant language.

New in LanguageTool 2.3 (Oct 4, 2013)

  • Breton:
  • added/improved a few rules
  • fixed false alarms
  • updated POS dictionary from Apertium (svn r47282)
  • Catalan:
  • added support for language code ca-ES-valencia (Catalan Valencian), to be used in LibreOffice 4.2.0
  • added a simple replace rule with hundreds of replacement suggestions
  • added/improved several rules
  • fixed false alarms
  • Chinese:
  • added a workaround for a StringIndexOutOfBoundsException (http://sourceforge.net/p/languagetool/bugs/186/)
  • English:
  • added replacement patterns for the spelling checker to make suggestions better (now offers 'taught' for 'teached')
  • added/improved a few rules
  • French:
  • added/improved a few rules
  • fixed false alarms
  • updated POS tag dictionary and Hunspell dictionary to Dicollecte-4.12
  • German:
  • added/improved several rules
  • Portuguese:
  • added/improved a few rules
  • it now has 3300+ compound words
  • Ukrainian:
  • added/improved several rules
  • the source code has been moved to github: https://github.com/languagetool-org/languagetool
  • LanguageTool requires Java 7 now
  • LanguageTool makes use of multiple threads now for text checking on modern hardware, improving performance (this affects the stand-alone version, the command line version and the LibreOffice/OpenOffice extension)
  • Rule syntax:
  • preliminary support for new min/max attributes that allow matching an element that appears the given number of times. For example: * <token min="0">foo</token> will match nothing or "foo", i.e. "foo" is optional * <token max="2">foo</token> will match "foo" or "foo foo" * <token min="0" max="2">foo</token> will match nothing, "foo", or "foo foo" Use max="-1" to allow unlimited occurrences. For min, only 0 or 1 is supported (1 is the default).
  • support for OR-statements. For example: a Internally and in run-time, a rule containing OR-statements is converted into several rules without OR-statements.
  • English now has a chunker to detect, amongst others, singular and plural noun chunks. This is documented at http://wiki.languagetool.org/using-chunks
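The min/max repetition described in the rule syntax notes above can be modelled in a few lines of Python. This is a sketch of the matching idea only: possible_ends is an invented name, tokens are plain strings, and LanguageTool's pattern matcher is considerably more involved.

```python
def possible_ends(tokens, start, word, min_count=1, max_count=1):
    """Return every index at which a run of `word` beginning at `start` may end.

    min_count may be 0 or 1; max_count of -1 means unlimited occurrences.
    """
    limit = len(tokens) - start if max_count == -1 else max_count
    run = 0
    while run < limit and start + run < len(tokens) and tokens[start + run] == word:
        run += 1
    # With min_count 0 the empty match (end == start) is allowed as well.
    return [start + n for n in range(min_count, run + 1)]


print(possible_ends(["foo", "foo", "bar"], 0, "foo", min_count=0, max_count=2))
print(possible_ends(["bar"], 0, "foo"))
```

An empty result means the element cannot match at that position, while an end index equal to the start index stands for the "matches nothing" case of an optional element.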
  • standalone version:
  • The standalone version now underlines errors with a red (spelling errors) or blue (other errors) line (Panagiotis Minos)
  • Remember the language selection for the next start
  • Improved window and dialog placement in a multi-monitor setup
  • embedded server: uses default port (8081) again if started without arguments
  • updated the morfologik-stemming library to version 1.7.1 to enable better suggestions, including proper handling of diacritics and replacement patterns (equivalents of MAP and REP features in hunspell dictionaries)
  • OpenOffice/LibreOffice integration:
  • fix: the "About" dialog didn't work in Apache OpenOffice 4.0
  • fix: country specific rules (like for British English) didn't work
  • API:
  • In class Language, getCountryVariants() has been renamed to getCountries(), and a new method getVariant has been added.
  • Some methods have been deprecated
  • Some methods have been moved from the Tools class (languagetool-core) to the new CommandLineTools class (languagetool-commandline)
  • AbstractRuleDisambiguator has been renamed to XmlRuleDisambiguator and is not abstract anymore. The RuleDisambiguator classes have been removed; XmlRuleDisambiguator can be used directly instead.
  • A new method JLanguageTool.check(AnnotatedText) has been introduced that allows you to check text with markup. Use AnnotatedTextBuilder to build up the input.
  • Thread-safety has been improved. The recommended use case is now to create a new JLanguageTool object for each thread, but to create the language only once (e.g. new English()) and use that for all JLanguageTool instances. This changed the API of some public classes, but for the standard use case of checking texts with the JLanguageTool object it shouldn't make a difference. (patch by Stefan Lotties)
  • JLanguageTool.loadFalseFriendRules() now behaves like JLanguageTool.loadPatternRules(): it looks in the class path first, and then, if the given file is not found there, in the filesystem
  • Introduced the Chunker interface that can assign chunks (also known as phrases) to tokens. For example, for noun phrases like "a fast computer" the chunker could assign an 'NP-singular' (noun phrase, singular) chunk to each of the tokens in that phrase. In the grammar.xml, such a token can then be matched with this syntax:
  • The new class MultiThreadedJLanguageTool makes use of as many threads as the computer has processors. In our tests this has improved text checking time by about 70% on an Intel i7 processor when used on 30KB text.
  • AnalyzedTokenReadings now implements Iterable so it can be used in foreach loops
  • AnalyzedGermanTokenReadings has been removed, AnalyzedTokenReadings can be used instead
  • Embedded HTTP server: the server now uses 10 threads instead of 1 (thanks to Panagiotis Minos)
  • text extraction from Wikipedia dumps has been improved

New in LanguageTool 2.2 (Jul 1, 2013)

  • Many error detection rules have been updated, especially for French, Catalan, German, Portuguese, Russian, Esperanto, and Breton.
  • Several small bugs have been fixed.

New in LanguageTool 2.1 (Apr 2, 2013)

  • This version adds many updates for the error detection rules for English, French, German, Portuguese, Catalan, Polish, Russian, Breton, Esperanto, and Italian.
  • LanguageTool is now modular, for easier use by Java developers.
  • Instead of one big JAR, there are now several small ones (soon to be found at Maven Central).
  • Several bugs have been fixed.

New in LanguageTool 2.0 (Jan 3, 2013)

  • Many updates for the error detection rules for English, Spanish, French, German, Portuguese, Russian, Breton, Catalan, Esperanto, and Ukrainian have been added.
  • The embedded HTTP server can now be started from the context menu if LanguageTool is running in the system tray.
  • Some small bugs have been fixed.

New in LanguageTool 1.9 (Oct 1, 2012)

  • Many new error detection rules have been added and existing rules have been updated. The most affected languages are Danish, German, English, Catalan, Russian, Chinese, French, Breton, Portuguese, and Esperanto.
  • There is initial support for Japanese, with about 20 rules. Several bugs have been fixed.

New in LanguageTool 1.8 (Jul 2, 2012)

  • Spell checking is now included in the LanguageTool stand-alone version (i.e. not used in LibreOffice/OpenOffice).
  • Many error detection rules have been improved and new rules have been added, especially for German, English, Catalan, Italian, French, Breton, Polish, and Esperanto.
  • Initial support for Greek and Portuguese with a few rules has been added. LanguageTool now supports language variants like British English, American English, Swiss German, etc.
  • Several bugs have been fixed.

New in LanguageTool 1.6 (Jan 1, 2012)

  • Error detection rules have been updated for several languages (Chinese, French, Breton, and others).
  • The Java packages have been renamed from de.danielnaber.languagetool.* to org.languagetool.*.
  • Some small bugs have been fixed.

New in LanguageTool 1.4 (Jun 27, 2011)

  • LanguageTool now requires Java 6.0 or later
  • English: added a few new rules
  • German: added a few new rules
  • French:
  • Updated dictionary to use Dicollecte lexique 4.1
  • Added a few new rules
  • Esperanto:
  • Updated list of transitive verbs
  • SRX sentence tokenization rules
  • Word tokenizer now properly handles words with apostrophe
  • Added a few new rules and fixed false positives
  • Khmer: Added support for Khmer (thanks to Nathan Wells)
  • Russian: added a few new rules
  • GUI:
  • Pressing Ctrl-Return will check the current text
  • Fixed pre-selection of user's language
  • Made screen messages and buttons in Language Module Manager translatable, thanks to Ilona Kuzmickaja.
  • API:
  • enabled bilingual mode for the HTTP API; if you use the srctext parameter, LT will automatically check in bilingual mode, assuming that mothertongue specifies the source language and lang the target language
  • renamed RuSimpleReplaceRule to RussianSimpleReplaceRule
  • renamed SlovakVes to SlovakVesRule
  • renamed JLanguageTool.paragraphHandling to JLanguageTool.ParagraphHandling
  • removed deprecated GermanSentenceTokenizer
  • Internal changes:
  • Java rules are no longer loaded dynamically from the classpath. Instead, every language needs to implement the getRelevantRules() method, which returns the rule classes relevant for that language.
  • Removed jaminid dependency for HTTP Server, thanks to Ankit.
  • The HTTP server doesn't block anymore when checking a long text. It now has a longer startup time per call though (50-150ms).
  • Developers using LanguageTool as an API can now configure from which addresses the embedded HTTP server will accept requests
  • Rule development:
  • Extended testrules.sh and testrules.bat so that they can take a language code as argument to only check that language. For example: sh testrules.sh en. Thanks to Michael Bryant.
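
The bilingual HTTP mode described under "API" above can be illustrated by building a request URL. This is a minimal sketch under stated assumptions: the parameter names srctext, mothertongue, and lang come from the note above, while the text parameter for the target text, the address http://localhost:8081/, and the class BilingualCheckUrl are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class BilingualCheckUrl {

    // Percent-encode one query value as UTF-8.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // Build a query URL for a bilingual check.
    // "mothertongue", "lang", and "srctext" are named in the changelog;
    // "text" (the target text) and the base URL are assumptions.
    static String build(String baseUrl, String sourceLang, String targetLang,
                        String sourceText, String targetText) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("mothertongue", sourceLang); // source language
        params.put("lang", targetLang);         // target language
        params.put("srctext", sourceText);      // presence switches on bilingual mode
        params.put("text", targetText);         // assumed name for the text to check
        StringBuilder url = new StringBuilder(baseUrl).append('?');
        boolean first = true;
        for (Map.Entry<String, String> p : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(p.getKey()).append('=').append(encode(p.getValue()));
            first = false;
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8081/", "en", "pl",
                "This is a test.", "To jest test."));
    }
}
```

The sketch only assembles the URL and does not send a request, so it runs without a LanguageTool server. The exact parameter names accepted by a given release should be checked against that release's documentation.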

New in LanguageTool 1.3.1 (Mar 30, 2011)

  • This version fixes a bug that might have crashed LanguageTool in some rare cases.

New in LanguageTool 1.3 (Mar 28, 2011)

  • Updated rules for Dutch, thanks to Ruud Baars
  • Updated rules and disambiguator for French
  • Updated the French dictionary to use Dicollecte lexique 4.0.1
  • Updated rules for Russian
  • Added disambiguation rules for Russian
  • Added disambiguation support and disambiguation rules for Spanish, significantly reducing false alarms.
  • Forked the Spanish word tokenizer, fixing recognition of words that follow a dash.
  • New and updated Spanish rules, with multiple suggestions where applicable.
  • Updated Spanish rule descriptions so that apparently duplicated rules no longer appear in the configuration windows.
  • Revamped short descriptions to make them more informative, improving the Spanish OpenOffice experience.
  • New Spanish rule grouping with new categories, improving the configuration experience.
  • Updated rules and tagger for Esperanto
  • Updated rules and dictionaries for Polish (to Morfologik 1.8)
  • Small cleanup of rules for English, some new rules
  • Added one exception to Dutch SRX rules to avoid false alarms
  • Bugfix: made loading of external rule files in the GUI work
  • Bugfix: fixed a bug in the UNPAIRED_BRACKETS rule that caused false alarms in languages that use the same characters to end different pairs of quotation marks (especially Dutch)
  • Bugfix: fixed the sf bug #3006662 for English (false alarm for "...those ideas came.")
  • Bugfix: fixed the sf bug #3200352 for English (the rule for "can not" is now switched off by default)
  • Bugfix: fixed the sf bug #3163988 for Polish ("PatternSyntaxException: Unmatched closing ')' near index 33 ...")
  • Updated OpenOffice.org help pages for English and Russian

New in LanguageTool 1.2 (Jan 3, 2011)

  • New and updated rules for Romanian, Dutch, Polish, German, Russian, Spanish, French and Danish.
  • Fixed false positives in French rules.
  • Updated the French dictionary to use Dicollecte lexique 3.9.1 provided by Olivier R.
  • Updated the Polish dictionary.
  • Added new scripts testwikipedia.sh and testwikipedia.bat to the distribution. These let you check a local Wikipedia XML dump. This helps to test new rules, especially to avoid false alarms. Get the Wikipedia dumps from http://download.wikimedia.org/backup-index.html. The file you probably want to use is pages-articles.xml.bz2.
  • SRX sentence tokenization rules for German, thanks to Jarek Lipski.
  • SRX sentence tokenization rules for Danish, thanks to Esben Aaberg.
  • Added support for Esperanto.
  • Now it's possible to enable and disable rules in bitext mode from the command line.
  • In bitext mode, the suggestions can now be automatically applied to target text.
  • Fixed API output for bitext mode.
  • Fixed a bug with mark_to and mark_from attributes being ignored for bitext XML rules.
  • Fixed a glitch in API output: it is now always correctly encoded as UTF-8, as the XML header declares.
  • Fixed sf bug #3076989: incorrect line numbers for files larger than 64000 bytes.
  • Minor performance fixes.

New in LanguageTool 1.1 (Sep 27, 2010)

  • Many error detection rules have been updated, such as those for French, Dutch, German, English, and Spanish.
  • Preliminary support for checking bilingual texts has been added.
  • Initial support for Malayalam and Belarusian is now available.
  • Several bugs have been fixed.

New in LanguageTool 0.9.9 (May 23, 2009)

  • Fixed a NullPointerException that could appear when using paragraph-level rules (#2787814)
  • Initial support for Icelandic
  • More Dutch, English, Polish, Romanian and Slovenian grammar rules
  • Fixed some bugs with pairing brackets
  • Added sentence tokenizer that uses SRX format for specifying end-of-sentence breaks

New in LanguageTool 0.9.6 (Jan 30, 2009)

  • Support for the new Proofreading API in OpenOffice 3.0.1 has been added.
  • There are new features in the rule disambiguator, such as unification, filtering, deleting, and adding interpretations.
  • Several rules have been fixed or improved.
  • This version only works with OpenOffice 3.0.1, not with 3.0.0.
  • Also, any previously installed release of LanguageTool must be uninstalled from OpenOffice.org before upgrading OpenOffice.org.

New in LanguageTool 0.9.5 (Nov 3, 2008)

  • The rules for English and Polish have been updated.
  • Crashes with the OpenOffice.org integration have been fixed.
  • False alarms in the German agreement rule have been reduced.

New in LanguageTool 0.9.4 (Oct 1, 2008)

  • A bug with OpenOffice.org 3.0 integration that could lead to crashes has been fixed.
  • The LanguageTool button in OpenOffice.org has been moved to a better place.
  • The error detection rules for French, Swedish, and Russian have been updated.
  • Note that this release only works with OpenOffice.org 3.0rc3 or later.