Text::NSP::Measures::2D::MI::ps is a Perl module that implements the Poisson-Stirling measure of association for bigrams.
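use Text::NSP::Measures::2D::MI::ps;

# Marginal totals and joint count for the bigram
my $npp = 60; my $n1p = 20; my $np1 = 20; my $n11 = 10;

my $ps_value = calculateStatistic( n11=>$n11, n1p=>$n1p, np1=>$np1, npp=>$npp );

if( (my $errorCode = getErrorCode()) )
{
  print STDERR $errorCode." - ".getErrorMessage()."\n";
}
else
{
  print getStatisticName()." value for bigram is ".$ps_value."\n";
}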
The log-likelihood ratio measures the deviation between the observed data and what would be expected if <word1> and <word2> were independent. The higher the score, the less evidence there is in favor of concluding that the words are independent.
Assume that the frequency count data associated with a bigram <word1><word2> is stored in a 2x2 contingency table:
          word2   ~word2
word1     n11     n12  | n1p
~word1    n21     n22  | n2p
          --------------
          np1     np2    npp
where n11 is the number of times <word1><word2> occur together, n12 is the number of times <word1> occurs with some word other than <word2>, and n1p is the total number of times <word1> occurs as the first word in a bigram.
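The remaining cells and marginals follow from the table layout by subtraction. The following is a minimal sketch of that bookkeeping, not the module's own code; the helper name complete_table is hypothetical, and the counts are the ones used in the synopsis.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::NSP): fill in the remaining cells
# and marginals of the 2x2 table from the four counts n11, n1p, np1, npp.
sub complete_table {
    my (%c) = @_;
    my %t = %c;
    $t{n12} = $c{n1p} - $c{n11};   # word1 followed by a word other than word2
    $t{n21} = $c{np1} - $c{n11};   # word2 preceded by a word other than word1
    $t{n2p} = $c{npp} - $c{n1p};   # bigrams whose first word is not word1
    $t{np2} = $c{npp} - $c{np1};   # bigrams whose second word is not word2
    $t{n22} = $t{n2p} - $t{n21};   # neither word1 first nor word2 second
    return %t;
}

my %table = complete_table( n11 => 10, n1p => 20, np1 => 20, npp => 60 );
printf "%-3s = %d\n", $_, $table{$_} for sort keys %table;
```

With these counts the sketch gives n12 = 10, n21 = 10 and n22 = 30, so the four internal cells sum to npp = 60.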
The expected values for the internal cells are calculated by taking the product of their associated marginals and dividing by the sample size, for example:
m11 = (np1 * n1p) / npp
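As a rough illustration of that rule, the sketch below computes the expected value of every internal cell from the marginals. The helper name expected_values and the names m12, m21 and m22 are assumptions made here by analogy with m11; this is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: expected cell counts under independence, i.e.
# (row marginal * column marginal) / sample size for each internal cell.
sub expected_values {
    my (%c) = @_;                       # needs n1p, np1, npp
    die "npp must be positive\n" unless $c{npp} > 0;
    my $n2p = $c{npp} - $c{n1p};        # second-row marginal
    my $np2 = $c{npp} - $c{np1};        # second-column marginal
    return (
        m11 => $c{n1p} * $c{np1} / $c{npp},
        m12 => $c{n1p} * $np2    / $c{npp},
        m21 => $n2p    * $c{np1} / $c{npp},
        m22 => $n2p    * $np2    / $c{npp},
    );
}

my %m = expected_values( n1p => 20, np1 => 20, npp => 60 );
printf "%s = %.4f\n", $_, $m{$_} for sort keys %m;
```

For the synopsis counts, m11 = 20 * 20 / 60, roughly 6.6667, and the four expected values sum to npp.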
The Poisson-Stirling measure is a negative logarithmic approximation of the Poisson-likelihood measure. It uses Stirling's formula to approximate the factorial in the Poisson-likelihood measure.
Poisson-Stirling = n11 * ( log(n11) - log(m11) - 1)
which is the same as
Poisson-Stirling = n11 * ( log(n11/m11) - 1)
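The formula can be evaluated directly, as in the stand-alone sketch below. It is only an illustration of the arithmetic and does not claim to reproduce the module's internals: the function name poisson_stirling is hypothetical, the natural logarithm is assumed (Perl's built-in log), and returning 0 when n11 is 0 is an assumption made here to avoid log(0).

```perl
use strict;
use warnings;

# Hypothetical stand-alone function; it evaluates the formula as written
# above and is not the module's own code.
sub poisson_stirling {
    my ($n11, $n1p, $np1, $npp) = @_;
    die "npp must be positive\n" unless $npp > 0;
    my $m11 = $n1p * $np1 / $npp;            # expected count under independence
    return 0 if $n11 == 0;                   # assumption: treat 0 * log(0) as 0
    return $n11 * ( log($n11 / $m11) - 1 );  # Perl's log() is the natural log
}

# Counts from the synopsis: n11 = 10, n1p = 20, np1 = 20, npp = 60
printf "%.4f\n", poisson_stirling(10, 20, 20, 60);
```

With the synopsis counts, n11/m11 = 1.5, so the sketch evaluates 10 * (log(1.5) - 1), which is about -5.945.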
calculateStatistic() - This method calculates the ps value
INPUT PARAMS : $count_values .. Reference to a hash containing the count values computed by the count.pl program.
RETURN VALUES : $poissonStirling .. Poisson-Stirling value for this bigram.
getStatisticName() - Returns the name of this statistic
INPUT PARAMS : none
RETURN VALUES : $name .. Name of the measure.