Softpedia
 



    text-sentence 0.14

    Developer: Robert Lujo
    License / Price: BSD License / FREE
    Last Updated: June 21st, 2010, 13:38 GMT
    Category: Text Editing&Processing / Indexing


    text-sentence description

    A text tokenizer and sentence splitter tool

    text-sentence is a text tokenizer and sentence splitter library.

    The main function takes as input the text and lists of known names and abbreviations. The result is a list of tokens. Each token has a type and other attributes, e.g.:

     * is word,
     * is number,
     * is roman number,
     * is sentence end,
     * is abbreviation,
     * is name,
     * is end of chapter,
     * etc.
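    The token types listed above can be illustrated with a minimal sketch. It does not use text-sentence's own API, which is not documented here: the Token class, the tokenize() function and its names/abbreviations parameters are hypothetical. The Roman-numeral check is a plain regular expression, so it will also accept ordinary uppercase words such as "I" or "MIX".

```python
import re
from dataclasses import dataclass

# Hypothetical token model for illustration only; not text-sentence's real classes.
# A token is a run of digits, a run of (Unicode) letters, or any single other non-space char.
TOKEN_RE = re.compile(r"\d+|[^\W\d_]+|\S")
# Uppercase Roman numerals from I to MMMCMXCIX.
ROMAN_RE = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


@dataclass
class Token:
    text: str
    kind: str                     # "number", "roman", "word" or "punct"
    is_name: bool = False         # found in the list of known names
    is_abbreviation: bool = False  # found in the list of known abbreviations


def tokenize(text, names=frozenset(), abbreviations=frozenset()):
    """Split a Unicode string into tokens and attach simple type attributes."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        s = match.group()
        if s.isdigit():
            kind = "number"
        elif ROMAN_RE.fullmatch(s):   # e.g. "XIV"
            kind = "roman"
        elif s.isalpha():
            kind = "word"
        else:
            kind = "punct"
        tokens.append(Token(s, kind,
                            is_name=s in names,
                            is_abbreviation=s.lower() in abbreviations))
    return tokens
```

    For example, tokenize("Dr. Lujo wrote XIV tests in 2010.", names={"Lujo"}, abbreviations={"dr"}) marks "Dr" as an abbreviation, "Lujo" as a name, "XIV" as a Roman number and "2010" as a number.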

    Determining the end of a sentence needs special logic and care, which is the main reason the package is named "text-sentence".
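    To show why a full stop alone is not enough, here is a hypothetical heuristic; it is not the package's actual logic. "?" and "!" always end a sentence, while "." does not when it follows a known abbreviation, follows a single uppercase initial, or is followed by a lowercase word. The split_sentences() name and these rules are assumptions for illustration. An abbreviation that really does close a sentence (such as "etc." at the very end of one) will be missed.

```python
import re

# Same hypothetical token pattern: digits, letter runs, or single other characters.
TOKEN_RE = re.compile(r"\d+|[^\W\d_]+|\S")


def split_sentences(text, abbreviations=frozenset()):
    """Hypothetical heuristic sentence splitter returning lists of token strings."""
    tokens = TOKEN_RE.findall(text)
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if tok in ("?", "!"):
            is_end = True
        elif tok == ".":
            prev = tokens[i - 1] if i > 0 else ""
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            is_end = not (
                prev.lower() in abbreviations           # "Dr." is a known abbreviation
                or (len(prev) == 1 and prev.isupper())  # an initial such as "R."
                or nxt[:1].islower()                    # next token starts in lowercase
            )
        else:
            is_end = False
        if is_end:
            sentences.append(current)
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(current)
    return sentences
```

    With abbreviations={"dr"}, the text "Dr. Lujo wrote it. R. Lujo is the author! Is it BSD? Yes." splits into four sentences; neither "Dr." nor "R." ends one.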

    FEATURES

    The system is based on Unicode strings.

    See GETTING STARTED.

    INSTALLATION

    If you have pip installed (http://pypi.python.org/pypi/pip), run:

    pip install text-sentence

    If not, install it the old-fashioned way:

     * download the zip from http://pypi.python.org/pypi/text-sentence/
     * unzip it
     * open a shell
     * go to the distribution directory
     * run python setup.py install

    The development version is available at http://bitbucket.org/trebor74hr/text-sentence.

    You can clone it with Mercurial:

    hg clone https://bitbucket.org/trebor74hr/text-sentence

    GETTING STARTED

    TODO:

    Usage example (start a Python shell):

    >>> from text_sentence import ...

    Further

    Since there is currently no good documentation, the best source of further information is the tests inside the module and in test_sentence.txt (see RUNNING TESTS). You can always read the source.

    DOCUMENTATION

    There is currently no documentation; it is in progress.

    SUPPORT

    Since this project depends on my free time, support is limited.

    REPORT BUG OR REQUEST FEATURE

    If you encounter a bug, the best way is to report it on the Bitbucket page http://bitbucket.org/trebor74hr/text-sentence.

    The best way to contact me is by mail (the address is in LICENCE).

    The TODO list is in readme.txt (development version).

    CONTRIBUTION

    Since the project's API is not yet stable, contributions should wait for a while.

    RUNNING TESTS

    All tests are doctests (not unittests). There are two types of tests in the package:

     1. doctests in the module, i.e. in __init__.py
     2. doctests in test_sentence.txt

    Running the module directly runs both 1. and 2.

    To run tests:

     * go to the text_sentence directory
     * run the tests by running the module, e.g.:

     > python __init__.py
     __main__: running doctests
     test_sentence.txt: running doctests


     * alternatively, run:

     > python -m text_sentence
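    A module that tests itself in this way can be sketched with the standard doctest module: doctest.testmod() runs the examples in the module's docstrings and doctest.testfile() runs those in a text file. The word_count() helper and the run_doctests() wrapper below are hypothetical stand-ins, and the printed messages only mirror the output shown above. The text file path is resolved relative to the current directory (module_relative=False), which matches the instruction to change into the text_sentence directory first.

```python
import doctest
import os
import sys


def word_count(text):
    """Hypothetical helper carrying a doctest, standing in for the package's functions.

    >>> word_count("Ovo je test.")
    3
    """
    return len(text.split())


def run_doctests(txt_file="test_sentence.txt"):
    """Run this module's doctests, then the ones in txt_file if it exists."""
    print("__main__: running doctests")
    results = [doctest.testmod(sys.modules[__name__])]
    if os.path.exists(txt_file):
        print(f"{txt_file}: running doctests")
        # Path is taken relative to the current working directory.
        results.append(doctest.testfile(txt_file, module_relative=False))
    return sum(r.failed for r in results)  # total number of failed examples


if __name__ == "__main__":
    run_doctests()
```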



    Requirements:

    · Python

    What's New in This Release:

    · is_contraction token attribute, e.g. for isn't or oš'
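    A flag like is_contraction could, for instance, be derived by keeping an apostrophe attached to the letters around it. This is a hypothetical sketch and not the package's implementation. The regular expression and the tokens_with_contractions() function are assumptions. Only the ASCII apostrophe (') is handled, and a closing single quote right after a word would be misread as a contraction.

```python
import re

# Hypothetical pattern: letters, optionally joined by internal apostrophes ("isn't",
# "rock'n'roll"), optionally ending in one ("oš'"). [^\W\d_] matches Unicode letters.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*'?")


def tokens_with_contractions(text):
    """Return (word, is_contraction) pairs for the word tokens in text."""
    return [(m.group(), "'" in m.group()) for m in WORD_RE.finditer(text)]
```

    Applied to "It isn't oš' fine", this flags "isn't" and "oš'" as contractions and leaves "It" and "fine" unflagged.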

      

