Softpedia
 



    cpdetector 1.0.10

    Downloads: 976
    User Rating: Fair (2.5/5)
    Rated by: 16 user(s)
    Developer: Achim Westermann
    License / Price: MPL / FREE
    Last Updated: December 5th, 2011, 08:29 GMT
    Category: ROOT / Information Management


    cpdetector description

    A small yet clever framework for codepage detection

    cpdetector integrates several detection strategies behind one framework. It may be used as a library by third-party software that accesses textual data over a network.

    It also includes a best-practice implementation in the form of a command-line tool that sorts and transforms large collections of documents based on their codepage.

    Available strategies include: jchardet (exclusion, frequency analysis, and guessing), detection of the HTML charset property, and detection of the XML encoding declaration.
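The idea of combining several strategies can be sketched as a chain that asks each detector in turn and stops at the first answer. This is a minimal Python illustration, not cpdetector's own code or API: the function names, the regular expressions, and the ordering are all assumptions made for the example, and the statistical jchardet step is replaced here by a simple ASCII check.

```python
import re
from typing import Callable, Optional, Sequence

# A strategy inspects the leading bytes of a document and returns a charset
# name, or None if it cannot decide. All names below are hypothetical.
Strategy = Callable[[bytes], Optional[str]]

# Assumed patterns for the two declarative strategies named in the text.
XML_DECL = re.compile(
    rb'^\s*<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._\-]+)["\']'
)
HTML_META = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._\-]+)', re.IGNORECASE
)


def xml_declaration(head: bytes) -> Optional[str]:
    """Read the encoding from an XML declaration, if one is present."""
    match = XML_DECL.search(head)
    return match.group(1).decode("ascii") if match else None


def html_meta_charset(head: bytes) -> Optional[str]:
    """Read the charset property from an HTML meta tag, if one is present."""
    match = HTML_META.search(head)
    return match.group(1).decode("ascii") if match else None


def ascii_only(head: bytes) -> Optional[str]:
    """Stand-in for a statistical detector: accept pure 7-bit input."""
    return "US-ASCII" if all(byte < 0x80 for byte in head) else None


def detect(head: bytes, strategies: Sequence[Strategy]) -> Optional[str]:
    """Return the first non-None answer from the chain of strategies."""
    for strategy in strategies:
        result = strategy(head)
        if result is not None:
            return result
    return None


# Declarative strategies run first because they are cheap and explicit.
STRATEGIES = (xml_declaration, html_meta_charset, ascii_only)

if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'
    print(detect(sample, STRATEGIES))
```

Returning None when no strategy matches leaves it to the caller to decide what to do with undetected documents.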

    What is a code page?

    At first, a textual document is nothing more than a sequence of bits. A computer has to decide how to display this data as characters (which the computer identifies by numbers).

    A code page, also known as a charset encoding, maps the raw data of a textual document to characters. The original ASCII code page, for example, uses only 7 bits of an octet (byte) to select the character, so it can map only 128 different characters. In the past, memory was expensive, and computers most often had only 8-bit registers and buses.

    When a mainframe was conceived, its designers had to decide which characters it should support. Physicists and mathematicians, for example, needed special characters for equations. As a result, a computer often shipped with its own special codepage.
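To make the mapping described above concrete, the short Python sketch below decodes one and the same byte under three code pages and shows that 7-bit ASCII cannot represent it at all. The byte value and the three code pages are arbitrary choices made for this illustration. The helper name `decode_with` is hypothetical.

```python
from typing import Dict, Iterable


def decode_with(raw: bytes, codepages: Iterable[str]) -> Dict[str, str]:
    """Decode the same raw bytes under each code page and collect the results."""
    return {codepage: raw.decode(codepage) for codepage in codepages}


def fits_ascii(raw: bytes) -> bool:
    """Return True if every byte fits into the 7 bits that ASCII uses."""
    try:
        raw.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    raw = bytes([0xE4])  # one octet with the high bit set

    # 7 bits give 2**7 = 128 code points, so 0xE4 is outside ASCII.
    print("ASCII code points:", 2 ** 7)
    print("fits ASCII:", fits_ascii(raw))

    # The same octet is a different character under each code page.
    for codepage, text in decode_with(raw, ("latin-1", "cp437", "cp1251")).items():
        print(codepage, "->", text)
```

Since the bytes alone do not say which mapping was intended, software that receives text without a declared code page has to detect it.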

    What's New in This Release:

    · This major bugfix version fixes two issues in command-line batch mode.
    · The switch to skip moving undetected documents works again.
    · Undetected documents are no longer transcoded (attempting this caused exceptional program flow).

      

