Products.BigramSplitter 1.0

Supports non-English languages, especially south east Asian languages

What's new in Products.BigramSplitter 1.0:

  • Added an uninstall script
GPL (GNU General Public License) 
Products.BigramSplitter is an add-on search product for Plone 3.x.

Specification: Text is normalized with Python's unicodedata module. Full-width digits and Latin letters are converted to their half-width equivalents, and half-width Katakana is converted to its full-width equivalent. As a result, all of these character variants are recognized as the same characters.
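
The normalization described above can be sketched as follows. This is a minimal illustration, assuming the NFKC normalization form, which folds full-width alphanumerics to half-width and half-width Katakana to full-width. The function name normalize_text is hypothetical and is not taken from the product's source.

```python
import unicodedata


def normalize_text(text):
    """Fold width variants with NFKC (an assumed choice of normalization form)."""
    return unicodedata.normalize("NFKC", text)


# Full-width Latin letters and digit become their half-width equivalents.
print(normalize_text("Ｐｌｏｎｅ３"))
# Half-width Katakana becomes full-width Katakana.
print(normalize_text("ｶﾀｶﾅ"))
```

Normalizing both the indexed text and the search query this way lets a query typed in one width match content written in the other.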

Language Specifications:

 * Chinese

   * No spaces between words
   * Uses only Kanji (Chinese) characters
   * Processed with a bigram (2-gram) model

 * Japanese

   * No spaces between words
   * A combination of Kanji (Chinese), Katakana, and Hiragana characters

 * Korean

   * Words are separated by spaces, but a word may include a particle
   * A combination of the Korean alphabet and Kanji (Chinese) characters
   * Korean alphabet and Kanji (Chinese) characters are distinguished from each other and processed with a bigram (2-gram) model

 * Thai

   * No spaces between words
   * Very difficult to process on a computer
   * Vowels and consonants are registered in Unicode as separate characters, which makes it difficult to recognize them as one word
   * However, Thai text may be handled with a bigram (2-gram) model

 * Other languages (including English)

   * Words are separated by spaces
   * Each word is indexed as is
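
The per-language rules above can be sketched roughly as follows. This is a simplified illustration, not the product's actual splitter: the Unicode ranges, the names split_terms and bigrams, and the use of NFKC are all assumptions made for the example. Runs of Kanji and kana, Hangul, and Thai characters are each cut into overlapping two-character terms, and everything else is split on whitespace.

```python
import re
import unicodedata

# Illustrative Unicode ranges (assumptions, not the product's actual tables).
_BIGRAM_CLASSES = [
    "\u3040-\u30ff\u4e00-\u9fff",  # Hiragana, Katakana, CJK ideographs
    "\uac00-\ud7a3",               # Hangul syllables
    "\u0e00-\u0e7f",               # Thai
]
_ALL_BIGRAM_CHARS = "".join(_BIGRAM_CLASSES)

# One capture group per bigram class, plus a final group for ordinary words.
_TOKEN_RE = re.compile(
    "|".join("([%s]+)" % c for c in _BIGRAM_CLASSES)
    + "|([^\\s%s]+)" % _ALL_BIGRAM_CHARS
)
_WORD_GROUP = len(_BIGRAM_CLASSES) + 1


def bigrams(run):
    """Return overlapping two-character terms; a single character stays as is."""
    if len(run) < 2:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


def split_terms(text):
    """Normalize, then emit bigrams for CJK/Hangul/Thai runs and whole words otherwise."""
    text = unicodedata.normalize("NFKC", text)
    terms = []
    for match in _TOKEN_RE.finditer(text):
        if match.lastindex == _WORD_GROUP:
            terms.append(match.group())       # space-separated word
        else:
            terms.extend(bigrams(match.group()))  # script run without spaces
    return terms


print(split_terms("日本語の検索 with Plone"))
```

Because Hangul and Kanji sit in separate character classes, a mixed Korean string is cut at the script boundary before bigrams are formed, which mirrors the Korean rule in the list. Punctuation handling is left out of this sketch.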


 * Source Code

 Since no documentation is available on how to develop a word splitter, we referred to the source code of other splitters. We still have a number of open questions. If you have any further information, please feel free to let us know.

 * Hotfix to Plone 3.0 source code

 Because the Plone 3.x catalog settings file, catalog.xml, has no mechanism for overwriting an existing index, we developed a hotfix that adds an XML attribute. We took this approach because we believe Plone 3's XML definition mechanism is simple and clear. We appreciate any comments.


Use zc.buildout

 * Add Products.BigramSplitter to the list of eggs to install, e.g.:

 eggs =
     Products.BigramSplitter

 * Tell the plone.recipe.zope2instance recipe to install a ZCML slug:

 recipe = plone.recipe.zope2instance
 zcml =
     Products.BigramSplitter

 * Re-run buildout, e.g. with:

 $ ./bin/buildout

 * Restart Zope
 * In Plone, open Site Setup, go to Add-on Products, and install Products.BigramSplitter with the quick installer

Last updated on December 7th, 2010

