    Statistics::CalinskiHarabasz 0.01

    Developer: Anagha Kulkarni
    License / Price: Perl Artistic License / FREE
    Last Updated: May 22nd, 2007, 10:05 GMT
    Category: ROOT / Programming / Libraries


    Statistics::CalinskiHarabasz description

    Statistics::CalinskiHarabasz is a Perl extension to the cluster stopping rule proposed by Calinski and Harabasz (C&H).

    SYNOPSIS

    use Statistics::CalinskiHarabasz;

    # Arguments: input file (dense format), clustering method, K value.
    &ch("InputFile", "agglo", 10);

    The input file is expected in the "dense" format.
    Sample input file:

    6 5
    1 1 0 0 1
    1 0 0 0 0
    1 1 0 0 1
    1 1 0 0 1
    1 0 0 0 1
    1 1 0 0 1

    C&H use the Variance Ratio Criterion, which is analogous to the F-statistic, to estimate the number of clusters a given dataset naturally falls into. They minimize the Within Cluster/Group Sum of Squares (WGSS) and maximize the Between Cluster/Group Sum of Squares (BGSS).
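To make the criterion concrete, here is a minimal sketch of the Variance Ratio Criterion in its usual form, VRC = (BGSS / (k - 1)) / (WGSS / (n - k)), for n observations already assigned to k clusters. The helper name `variance_ratio` and its arguments are hypothetical and not part of Statistics::CalinskiHarabasz; the module's internal computation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Statistics::CalinskiHarabasz.
# $rows   : array ref of observations (each an array ref of numbers)
# $labels : array ref holding one cluster id per observation
sub variance_ratio {
    my ($rows, $labels) = @_;
    my $n   = scalar @$rows;
    my $dim = scalar @{ $rows->[0] };

    # Accumulate per-cluster and overall coordinate sums.
    my (%sum, %count);
    my @total = (0) x $dim;
    for my $i (0 .. $n - 1) {
        my $c = $labels->[$i];
        $count{$c}++;
        for my $j (0 .. $dim - 1) {
            $sum{$c}[$j] += $rows->[$i][$j];
            $total[$j]   += $rows->[$i][$j];
        }
    }
    my $k = scalar keys %count;
    return undef if $k < 2 || $k >= $n;    # ratio is undefined here

    my @grand = map { $_ / $n } @total;
    my %centroid;
    for my $c (keys %sum) {
        $centroid{$c} = [ map { $_ / $count{$c} } @{ $sum{$c} } ];
    }

    # WGSS: squared distance of each row to its own cluster centroid.
    my $wgss = 0;
    for my $i (0 .. $n - 1) {
        my $cen = $centroid{ $labels->[$i] };
        $wgss += ($rows->[$i][$_] - $cen->[$_]) ** 2 for 0 .. $dim - 1;
    }

    # BGSS: size-weighted squared distance of each centroid to the grand mean.
    my $bgss = 0;
    for my $c (keys %centroid) {
        $bgss += $count{$c} * ($centroid{$c}[$_] - $grand[$_]) ** 2
            for 0 .. $dim - 1;
    }

    return undef if $wgss == 0;            # avoid division by zero
    return ($bgss / ($k - 1)) / ($wgss / ($n - $k));
}
```

For the sample matrix, with rows 1, 3, 4 and 6 placed in one cluster and rows 2 and 5 in another (an assumed grouping chosen for illustration), this returns 40/3, about 13.33. A stopping rule of this kind compares the ratio across candidate values of k and prefers the k with the largest value.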

    EXPORT

    The "ch" function is exported by default.

    INPUT

    InputFile

    The input dataset is expected in "dense" matrix format: a plain text file whose first line gives the dimensions of the dataset, followed by the dataset itself as a matrix. The contexts / observations should be along the rows and the features along the columns.

    e.g.:
    6 5
    1 1 0 0 1
    1 0 0 0 0
    1 1 0 0 1
    1 1 0 0 1
    1 0 0 0 1
    1 1 0 0 1

    The first line (6 5) gives the number of rows (observations) and the number of columns (features) in the matrix that follows. Each subsequent line records, for one observation, the frequency of occurrence of the feature in each column. Thus feature1 (1st column) occurs once in observation1, and in fact once in every other observation too, while feature3 does not occur in observation1.
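The layout described above can be checked before handing a file to the module. The following is a minimal sketch of such a check; `parse_dense` is a hypothetical helper, not part of Statistics::CalinskiHarabasz, and it assumes whitespace-separated values with blank lines ignored.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Statistics::CalinskiHarabasz.
# Parses dense-format text and returns ($n_rows, $n_cols, \@matrix),
# dying with a message if the header and the matrix disagree.
sub parse_dense {
    my ($text) = @_;
    my @lines = grep { /\S/ } split /\n/, $text;    # skip blank lines
    die "empty input\n" unless @lines;

    # First line: number of rows, then number of columns.
    my ($n_rows, $n_cols) = split ' ', shift @lines;
    die "first line must give rows and columns\n"
        unless defined $n_cols && "$n_rows $n_cols" =~ /^\d+ \d+$/;
    die "expected $n_rows rows, found " . scalar(@lines) . "\n"
        unless @lines == $n_rows;

    # Remaining lines: one observation per line, one feature per column.
    my @matrix;
    for my $line (@lines) {
        my @cells = split ' ', $line;
        die "expected $n_cols columns in row: $line\n"
            unless @cells == $n_cols;
        push @matrix, \@cells;
    }
    return ($n_rows, $n_cols, \@matrix);
}
```

The helper takes the file contents as a string, so the caller reads the file first; `$matrix->[0][2]` then holds the frequency of feature3 in observation1.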

    ClusteringMethod

    The clustering methods that can be used are:
    1. rb - Repeated Bisections [Default]
    2. rbr - Repeated Bisections followed by k-way refinement
    3. direct - Direct k-way clustering
    4. agglo - Agglomerative clustering
    5. graph - Graph partitioning-based clustering
    6. bagglo - Partitional biased Agglomerative clustering

    K value

    This is an approximate upper bound for the number of clusters that may be present in the dataset. Thus, for a dataset that you expect to be separated into 3 clusters, this value should be set to some integer greater than 3.
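The three inputs described in this section can be sanity-checked before calling "ch". The sketch below is one way to do that; `check_ch_args` is a hypothetical helper, not part of Statistics::CalinskiHarabasz, and the lower limit of 2 for K is an assumption made here because the ratio is not defined for a single cluster.

```perl
use strict;
use warnings;

# Method names as listed in this document.
my %KNOWN_METHOD = map { $_ => 1 } qw(rb rbr direct agglo graph bagglo);

# Hypothetical helper, not part of Statistics::CalinskiHarabasz.
# Returns a list of problems; an empty list means the arguments look sane.
sub check_ch_args {
    my ($file, $method, $k) = @_;
    my @problems;

    push @problems, "input file not given"
        unless defined $file && length $file;

    my $name = defined $method ? $method : '';
    push @problems, "unknown clustering method '$name'"
        unless $KNOWN_METHOD{$name};

    # Assumption: K must be a whole number of at least 2.
    push @problems, "K must be an integer of at least 2"
        unless defined $k && $k =~ /^\d+$/ && $k >= 2;

    return @problems;
}
```

A caller might run `my @problems = check_ch_args("InputFile", "agglo", 10);` and only invoke "ch" when the list comes back empty.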

    OUTPUT

    A single integer: the estimated number of clusters present in the input dataset.


    Requirements:

    · Perl
    · This module uses a suite of C programs called CLUTO for clustering. CLUTO must therefore be installed for this module to work.

      

