NoAho 0.9.02

Non-Overlapping Aho-Corasick Trie

LICENSE TYPE:
MIT/X Consortium License 
DEVELOPED BY:
Jeff Donner
HOMEPAGE:
github.com
NoAho provides fast, non-overlapping simultaneous multiple keyword search.

Features:
- 'short' and 'long' (longest matching key) searches, each available as a one-off search or as iteration over all non-overlapping keyword matches in a text.
- Works with both unicode and str in Python 2, and unicode in Python 3 (it's all UCS4 under the hood).
- Allows you to associate an arbitrary Python object payload with each keyword, and supports dict operations len(), [], and 'in' for the keywords (though no del or traversal).
- Does the 'compilation' (generation of Aho-Corasick failure links) of the trie on-demand; you can mix adding keywords and searching text freely.
- Can be used commercially; it is under the minimal MIT license.
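
To make the 'short' versus 'long' distinction and the dict-style payload access concrete, here is a minimal pure-Python sketch. It is not NoAho's code or API: the class name KeywordTrie and its add() and find() methods are hypothetical, and the search is a naive scan from each start position rather than Aho-Corasick with failure links. It only illustrates the behaviour described in the list above.

```python
class KeywordTrie:
    """Toy keyword map; hypothetical, not NoAho's actual class or API."""

    def __init__(self):
        self._root = {}      # nested dicts: one hash table per trie node
        self._payloads = {}  # keyword -> arbitrary payload object

    def add(self, key, payload=None):
        node = self._root
        for ch in key:
            node = node.setdefault(ch, {})
        node[None] = True    # the None key marks "a keyword ends here"
        self._payloads[key] = payload

    # Dict-style operations on the keywords: len(), 'in', and [].
    def __len__(self):
        return len(self._payloads)

    def __contains__(self, key):
        return key in self._payloads

    def __getitem__(self, key):
        return self._payloads[key]

    def _match_at(self, text, start, longest):
        """Return the end index of a keyword starting at `start`, or None."""
        node, end = self._root, None
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if None in node:
                end = i + 1
                if not longest:  # 'short' search: stop at the first keyword end
                    break
        return end

    def find(self, text, longest=True):
        """Return (start, end, payload) for the first match, or None."""
        for start in range(len(text)):
            end = self._match_at(text, start, longest)
            if end is not None:
                return start, end, self._payloads[text[start:end]]
        return None


trie = KeywordTrie()
trie.add("key", 1)
trie.add("keyword", 2)
long_hit = trie.find("a keyword here", longest=True)
short_hit = trie.find("a keyword here", longest=False)
```

With the keywords "key" and "keyword", the 'long' search reports the span of "keyword" while the 'short' search stops at "key". A real Aho-Corasick implementation avoids rescanning the text from every start position, which this sketch does not attempt.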

Anti-Features:
- Will not find overlapped keywords (e.g. given keywords "abcde" and "defgh", it will not find "defgh" in "abcdefgh", though it would find both in "abcdedefgh"), unless you move along the string manually, one character at a time, which would defeat the purpose. The 'Acora' package is an alternative for this use.
- Without overlap, find[all]_short is kind of useless.
- Lacks key iteration and deletion from the mapping (dict) protocol
- Memory leaks untested (should be OK, but ...)
- No test case for unicode in Python 2 (tested manually, however)
- Unicode characters are represented as UCS4, and each character has its own hash table, so it is relatively memory-heavy.
- Requires a C++ compiler.
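
The overlap limitation in the first item can be reproduced with a few lines of plain Python. This is an illustrative sketch only, assuming a leftmost, longest-first, non-overlapping scan; the function name findall_nonoverlapping is hypothetical and the code does not use NoAho or Aho-Corasick.

```python
def findall_nonoverlapping(keywords, text):
    """Return (start, end, keyword) for each non-overlapping match, left to right."""
    matches = []
    pos = 0
    while pos < len(text):
        best = None
        for kw in keywords:
            # Skip empty keywords so the scan always makes progress.
            if kw and text.startswith(kw, pos):
                if best is None or len(kw) > len(best):
                    best = kw
        if best is None:
            pos += 1                 # no keyword starts here; move on
        else:
            matches.append((pos, pos + len(best), best))
            pos += len(best)         # jump past the match: no overlaps
    return matches


keys = ["abcde", "defgh"]
overlapped = findall_nonoverlapping(keys, "abcdefgh")
adjacent = findall_nonoverlapping(keys, "abcdedefgh")
```

In "abcdefgh" the scan consumes "abcde" and resumes at "fgh", so "defgh" is never seen. In "abcdedefgh" the two keywords sit side by side and both are reported.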

Bug reports and patches welcome of course!

Last updated on March 21st, 2012
