Text::Compare is a Perl module for language-sensitive text comparison.
SYNOPSIS
use Text::Compare;
# the instant way:
my $tc = Text::Compare->new( memoize => 1, strip_html => 0 );
my $sim = $tc->similarity($text_a, $text_b);
# $sim will be between 0 and 1
# second way (cache lists):
my $tc2 = Text::Compare->new( strip_html => 1 );
# make a language sensitive word hash:
my %wordhash = $tc2->get_words($some_text);
$tc2->first_list(\%wordhash);
foreach my $list (@wordlists) {
    # $list is a hashref
    $tc2->second_list($list);
    print $tc2->similarity();
}
# third way (cache texts):
my $tc3 = Text::Compare->new();
$tc3->first($some_text);
$tc3->second($some_other_text);
print $tc3->similarity;
Text::Compare is an attempt to write a high-speed text comparison tool based on vector comparison that uses language-dependent stopwords.
Text::Compare uses Lingua::Identify to detect the language of the given texts, then uses Lingua::StopWords to get the stopwords for that language, and finally uses Lingua::Stem to find word stems.
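To make the idea of vector comparison concrete, here is a minimal sketch in core Perl only. It is not the module's actual implementation. It assumes cosine similarity as the comparison measure, uses a small hard-coded stopword list as a hypothetical stand-in for what Lingua::StopWords would supply, and skips language detection and stemming entirely. The names word_counts and cosine_similarity are illustrative and not part of the Text::Compare API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the stopword list Lingua::StopWords would supply.
my %STOP = map { $_ => 1 } qw(the a an is of and to in on);

# Turn a text into a hashref of word => count, skipping stopwords.
# No stemming is done in this sketch.
sub word_counts {
    my ($text) = @_;
    my %count;
    for my $word (lc($text) =~ /\w+/g) {
        next if $STOP{$word};
        $count{$word}++;
    }
    return \%count;
}

# Cosine of the angle between two word-count vectors:
# 0 means no shared words, 1 means identical word distributions.
sub cosine_similarity {
    my ($x, $y) = @_;
    my ($dot, $norm_x, $norm_y) = (0, 0, 0);
    $norm_x += $_ ** 2 for values %$x;
    $norm_y += $_ ** 2 for values %$y;
    for my $word (keys %$x) {
        $dot += $x->{$word} * $y->{$word} if exists $y->{$word};
    }
    return 0 unless $norm_x && $norm_y;   # an empty text matches nothing
    return $dot / sqrt($norm_x * $norm_y);
}

my $sim = cosine_similarity(
    word_counts("The cat sat on the mat"),
    word_counts("A cat is on the mat"),
);
printf "%.3f\n", $sim;   # prints 0.816
```

With non-negative word counts the result always falls between 0 and 1, which matches the range the SYNOPSIS gives for similarity(). Removing stopwords first keeps very common words from dominating the score.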
Requirements:
· Perl