Lingua::TokenParse is a Perl module that parses a word into scored fragment combinations.
SYNOPSIS
use Lingua::TokenParse;
my $p = Lingua::TokenParse->new(
word => 'antidisthoughtlessfulneodeoxyribonucleicfoo',
lexicon => {
a => 'not',
anti => 'opposite',
di => 'two',
dis => 'separation',
eo => 'hmmmmm', # etc.
},
constraints => [ qr/eo(?:.|$)/ ], # no parts ending in eo allowed
);
use Data::Dumper; # exports Dumper()
print Dumper($p->knowns);
A Lingua::TokenParse object provides methods to parse a given word into familiar combinations, based on a lexicon of known word parts. This lexicon is a simple fragment => definition list.
Words like "automobile" and "deoxyribonucleic" are composed of different roots, prefixes, suffixes, etc. With a lexicon of known fragments, a word can be partitioned into a list of its (possibly overlapping) known and unknown fragment combinations.
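The partitioning idea can be illustrated with a minimal sketch in plain Perl. It does not use the module and is not its actual algorithm: partitions() and label() are hypothetical helpers, and the brute-force enumeration (2**(n-1) splits for an n-letter word) is an assumption that is only practical for short words.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::TokenParse:
# return every way to cut $word into contiguous fragments.
sub partitions {
    my ($word) = @_;
    return ([]) if $word eq '';          # one way to split nothing
    my @out;
    for my $len (1 .. length $word) {
        my $head = substr $word, 0, $len;
        my $rest = substr $word, $len;
        push @out, [ $head, @$_ ] for partitions($rest);
    }
    return @out;
}

# Hypothetical helper: join fragments with dots and
# bracket the ones that are missing from the lexicon.
sub label {
    my ($parts, $lex) = @_;
    return join '.', map { exists $lex->{$_} ? $_ : "[$_]" } @$parts;
}

my %lexicon = (
    a    => 'not',
    anti => 'opposite',
    di   => 'two',
    dis  => 'separation',
);

my @combos = map { label($_, \%lexicon) } partitions('antidis');

# Show only the combinations made entirely of known fragments.
print "$_\n" for grep { !/\[/ } @combos;
```

With this small lexicon, every split of "antidis" other than anti.dis contains at least one bracketed (unknown) fragment.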
These combinations can be given a score, which represents a measure of word familiarity. This measure is a set of ratios of known to unknown fragments and letters.
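As a rough illustration of such a score, the sketch below computes two ratios for one combination. The familiarity() helper is hypothetical and the module's real formula may differ. It also assumes the ratios are taken against the totals (known out of all fragments, known out of all letters) to avoid dividing by zero when nothing is unknown.

```perl
use strict;
use warnings;

# Hypothetical scoring helper; the module's real formula may differ.
# Expects a non-empty array ref of non-empty fragments.
sub familiarity {
    my ($parts, $lex) = @_;
    my ($known_frags, $known_chars, $total_chars) = (0, 0, 0);
    for my $frag (@$parts) {
        $total_chars += length $frag;
        next unless exists $lex->{$frag};
        $known_frags++;
        $known_chars += length $frag;
    }
    return (
        fragments => $known_frags / @$parts,       # share of known fragments
        letters   => $known_chars / $total_chars,  # share of known letters
    );
}

my %lexicon = (anti => 'opposite', dis => 'separation');
my %score   = familiarity([qw(anti dis foo)], \%lexicon);

printf "fragments %.2f, letters %.2f\n", @score{qw(fragments letters)};
```

Here two of the three fragments and seven of the ten letters are known, so a combination with fewer or shorter unknown fragments would rank as more familiar.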
Requirements:
· Perl