translitcodec is a Python library that contains codecs for transliterating ISO 10646 texts into best-effort representations using smaller coded character sets (ASCII, ISO 8859, etc.). The translation tables used by the codecs are from the ``transtab`` collection by Markus Kuhn.
Three types of transliterating codecs are provided:
"long", using as many characters as needed to make a natural replacement. For example, u00e4 LATIN SMALL LETTER A WITH DIAERESIS ``ä`` will be replaced with ``ae``.
"short", using the minimum number of characters to make a replacement. For example, u00e4 LATIN SMALL LETTER A WITH DIAERESIS ``ä`` will be replaced with ``a``.
"one", only performing single character replacements. Characters that can not be transliterated with a single character are passed through unchanged. For example, u2639 WHITE FROWNING FACE ``☹`` will be passed through unchanged.
Using the codecs is simple:
>>> import translitcodec
>>> u'fácil € ☺'.encode('translit/long')
u'facil EUR :-)'
>>> u'fácil € ☺'.encode('translit/short')
u'facil E :-)'
The codecs return Unicode by default. To receive a bytestring back, either chain the output of encode() to another codec, or append the name of the desired byte encoding to the codec name:
>>> u'fácil € ☺'.encode('translit/one').encode('ascii', 'replace')
'facil E ?'
>>> u'fácil € ☺'.encode('translit/one/ascii', 'replace')
'facil E ?'
The package also supplies a 'transliterate' codec, an alias for 'translit/long'.
Product's homepage
Requirements:
· Python
What's New in This Release: [ read full changelog ]
· Fixes to the transtab table rebuilding tool.
· Added translitcodec.__version__