
    Gensim 0.8.6

    Developer: Radim Rehurek
    License / Price: LGPL / FREE
    Last Updated: September 19th, 2012, 14:27 GMT
    Category: ROOT / Programming / Libraries


    Gensim description

    Python Framework for Topic Modeling

    Gensim is a Python library for unsupervised learning from raw, unstructured digital texts. It provides a framework for learning hidden (latent) corpus structure. Once this structure is found, documents can be expressed succinctly in terms of it, queried for topical similarity, and so on.

    If the previous paragraph left you confused, you can read more about unsupervised document analysis on Wikipedia (http://en.wikipedia.org/wiki/Latent_semantic_indexing).

    Gensim's target audience is the NLP research community and the interested general public; gensim is not meant to be a production tool for commercial environments.

    Gensim was created in response to a perceived lack of available, scalable software frameworks for topic modeling, and to the overwhelming internal complexity of those that exist. You can read more about the motivation in our LREC 2010 workshop paper (http://www.fi.muni.cz/~sojka/lrec2010/dml_lrec.pdf).

    The principal design objectives behind gensim are:

    1. Straightforward interfaces and low API learning curve for developers, facilitating modifications and rapid prototyping.
    2. Memory independence with respect to the size of the input corpus; all intermediate steps and algorithms operate in a streaming fashion, processing one document at a time.
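
The second objective, streaming one document at a time, can be illustrated with a minimal pure-Python sketch. This is not gensim's own code: the names `StreamingCorpus` and `build_vocabulary`, and the one-document-per-line file layout, are assumptions made for illustration only. The point is that the corpus object yields one sparse bag-of-words vector per iteration, so memory use does not grow with the number of documents.

```python
from collections import Counter


def build_vocabulary(path):
    """Assign an integer id to every distinct token, reading line by line."""
    vocabulary = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            for token in line.lower().split():
                vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


class StreamingCorpus:
    """Hypothetical corpus that never holds more than one document in RAM."""

    def __init__(self, path, vocabulary):
        self.path = path              # assumed layout: one document per line
        self.vocabulary = vocabulary  # maps token -> integer id

    def __iter__(self):
        # The file is reopened on every pass, so the corpus can be
        # iterated repeatedly without caching any documents.
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                counts = Counter(
                    self.vocabulary[token]
                    for token in line.lower().split()
                    if token in self.vocabulary
                )
                # Sparse vector: sorted list of (token id, count) pairs.
                yield sorted(counts.items())
```

Any algorithm that consumes such an object with a plain `for` loop sees documents one by one, which is what makes the size of the input corpus independent of available RAM. Note that the vocabulary itself is still held in memory in this sketch.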



    Here are some key features of "Gensim":

    · Memory independence -- there is no need for the whole text corpus (or any intermediate term-document matrices) to reside fully in RAM at any one time.
    · Provides implementations for several popular topic inference algorithms, including Latent Semantic Analysis (LSA, LSI) and Latent Dirichlet Allocation (LDA), and makes adding new ones simple.
    · Contains I/O wrappers and converters around several popular data formats.
    · Allows similarity queries across documents in their latent, topical representation.
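
To make the idea of a similarity query concrete, here is a small pure-Python sketch that ranks documents by cosine similarity to a query. It is an illustration under stated assumptions, not gensim's implementation: the function names are hypothetical, and the vectors here are plain sparse (id, weight) lists, whereas gensim runs such queries on vectors that have first been transformed into a latent topic space (for example by LSI or LDA).

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse vectors of (id, weight) pairs."""
    weights_a = dict(vec_a)
    weights_b = dict(vec_b)
    dot = sum(w * weights_b.get(i, 0.0) for i, w in weights_a.items())
    norm_a = math.sqrt(sum(w * w for w in weights_a.values()))
    norm_b = math.sqrt(sum(w * w for w in weights_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return (document index, similarity) pairs, most similar first.

    `documents` may be any iterable, including one that streams
    vectors from disk; only the scores are kept in memory.
    """
    scores = [
        (index, cosine_similarity(query, doc))
        for index, doc in enumerate(documents)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Cosine similarity is used here because it ignores document length and compares only the direction of the vectors, a common choice for comparing documents of different sizes.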

    Requirements:

    · Python

      


