uni2ascii 4.18

uni2ascii and ascii2uni convert between UTF-8 Unicode and any of a variety of 7-bit ASCII equivalents including: hexadecimal and decimal HTML numeric character references, u-escapes, standard hexadecimal, and raw hexadecimal.

Such ASCII equivalents are useful when including Unicode text in program source, when entering text into Web programs that can handle the Unicode character set but are not 8-bit safe, and when debugging.

The Unicode escapes available are:

- HTML hexadecimal numeric character references (e.g. &#x00E9;)
- HTML decimal numeric character references (e.g. &#233;)
- \u-escapes, as used in Python (e.g. \u00E9)
- \u-escapes within the BMP and \U-escapes beyond the BMP (e.g. \u00E9 but \U00010024)
- U+ escapes (e.g. U+00E9)
- U-escapes (e.g. U00E9)
- u-escapes (e.g. u00E9)
- U-escapes within angle brackets (e.g. <U00E9>)
- \x-escapes (e.g. \x00E9)
- \x-escapes with braces (e.g. \x{00E9})
- Standard hexadecimal (e.g. 0x00E9)
- Raw hexadecimal (e.g. 00E9)
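To make the list above concrete, here is a small Python sketch that renders one code point in several of these styles. The function `escape_forms` and its style names are hypothetical and are not uni2ascii's command-line options. It assumes, as the examples do, that hexadecimal forms use at least four digits and that \U-escapes beyond the BMP use eight.

```python
def escape_forms(cp: int, upper: bool = True) -> dict[str, str]:
    """Hypothetical helper: render code point cp in several ASCII escape styles."""

    def hx(width: int) -> str:
        # Zero-padded to at least `width` hex digits, in the requested case.
        s = format(cp, f"0{width}X")
        return s if upper else s.lower()

    in_bmp = cp <= 0xFFFF
    return {
        "html_hex": f"&#x{hx(4)};",
        "html_dec": f"&#{cp};",
        # \u inside the BMP, \U with eight digits beyond it
        "u_or_U_escape": f"\\u{hx(4)}" if in_bmp else f"\\U{hx(8)}",
        "u_plus": f"U+{hx(4)}",
        "angle_brackets": f"<U{hx(4)}>",
        "x_escape": f"\\x{hx(4)}",
        "x_braces": f"\\x{{{hx(4)}}}",
        "standard_hex": f"0x{hx(4)}",
        "raw_hex": hx(4),
    }


if __name__ == "__main__":
    for name, form in escape_forms(0xE9).items():
        print(f"{name:>15}  {form}")
```

For U+00E9 (é) this prints `&#x00E9;`, `&#233;`, `\u00E9`, and so on; `escape_forms(0x10024)` switches to `\U00010024` for the combined \u/\U style.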

uni2ascii accepts a command-line flag that selects upper-case A-F or lower-case a-f as hexadecimal digits, since some programs accept only one or the other. ascii2uni accepts either.

By default, uni2ascii converts only characters outside the ASCII range. Even when ASCII characters are also converted, newlines are preserved unless their conversion is explicitly requested. Space characters are likewise preserved unless their conversion is explicitly requested. If space characters are not converted, the three non-ASCII space characters (Ethiopic word space, Ogham space, and ideographic space) are replaced with ASCII space (0x20) so that the output stays within the 7-bit ASCII range.
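The policy described above can be sketched in Python as follows. This is an illustration under stated assumptions, not uni2ascii's source: the function `convert` and its three flags are hypothetical, only the HTML hexadecimal reference format is emitted, and U+1361, U+1680 and U+3000 are taken to be the Ethiopic word space, Ogham space mark and ideographic space.

```python
# Hypothetical sketch of the default conversion policy; not uni2ascii's code.

# Ethiopic wordspace, Ogham space mark, ideographic space
NON_ASCII_SPACES = {0x1361, 0x1680, 0x3000}


def escape(cp: int) -> str:
    """Render a code point as an HTML hexadecimal character reference."""
    return f"&#x{cp:04X};"


def convert(text: str, convert_ascii: bool = False,
            convert_space: bool = False, convert_newline: bool = False) -> str:
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\n":
            # Newlines survive unless their conversion is explicitly requested.
            out.append(escape(cp) if convert_newline else ch)
        elif ch == " " or cp in NON_ASCII_SPACES:
            # Spaces survive unless requested; non-ASCII spaces become 0x20
            # so that the output stays 7-bit.
            out.append(escape(cp) if convert_space else " ")
        elif cp < 0x80 and not convert_ascii:
            out.append(ch)  # ASCII passes through by default
        else:
            out.append(escape(cp))
    return "".join(out)
```

For example, `convert("café\u3000ok\n")` returns `"caf&#x00E9; ok\n"`: the é is escaped, the ideographic space becomes an ASCII space, and the newline is kept.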

This package contains four programs. The main program, uni2ascii, is written in C and must be compiled. The second, uni2html.py, is the predecessor to uni2ascii. Because it is written in Python, it does not need to be compiled and should run on just about any current computer. Otherwise, uni2ascii is superior in that:

- It generates a wider range of output formats.
- It is approximately 20 times faster.
- It handles input in the full 32-bit Unicode range. In contrast, uni2html handles only the Basic Multilingual Plane (Plane 0) because at present Python represents Unicode encoded text internally using 16-bit integers. If you've got text in, say, Linear B or Ugaritic, you need uni2ascii.
- It does a better job of reporting errors. If it encounters an error in its input, such as malformed UTF-8, it reports the location of the error both as a character count and as a byte count from the beginning of the file, each starting at 0. (Character counts and byte counts generally differ because a UTF-8 encoded character occupies one to four bytes.) The Python version reports only the character count. uni2ascii also provides information about the nature of the error.
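The difference between the two offsets can be illustrated with Python's built-in UTF-8 decoder. The helper `first_utf8_error` is hypothetical and is not how uni2ascii, a C program, detects errors. It relies on `UnicodeDecodeError.start` being the byte offset of the first bad sequence and on `reason` describing the kind of error; the character offset is the length of the valid prefix before it.

```python
def first_utf8_error(data: bytes):
    """Return None for valid UTF-8, else (char_offset, byte_offset, reason).

    Both offsets count from 0.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Everything before err.start decoded cleanly, so its length in
        # characters is the character offset of the error.
        char_offset = len(data[:err.start].decode("utf-8"))
        return char_offset, err.start, err.reason
    return None
```

For `"né".encode("utf-8") + b"\xff"` the result is `(2, 3, "invalid start byte")`: the stray byte is the third character but the fourth byte, because é takes two bytes.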

The third program, ascii2uni, is the inverse of uni2ascii. It accepts text containing a variety of ASCII representations of Unicode characters and generates UTF-8 Unicode.
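The inverse direction amounts to recognizing escapes and replacing each with its character. The following Python sketch handles only a hypothetical subset of the formats listed earlier (HTML hexadecimal and decimal references, \u- and \U-escapes, and U+ notation); the names `_PATTERN` and `ascii_to_utf8` are illustrative, not ascii2uni's interface. Hexadecimal digits are accepted in either case, since `int(..., 16)` does not care.

```python
import re

# Hypothetical subset of escape styles; one capture group per style.
_PATTERN = re.compile(
    r"&#[xX]([0-9A-Fa-f]+);"       # 1: HTML hex reference, e.g. &#x00E9;
    r"|&#([0-9]+);"                # 2: HTML decimal reference, e.g. &#233;
    r"|\\u([0-9A-Fa-f]{4})"        # 3: \u-escape, e.g. \u00E9
    r"|\\U([0-9A-Fa-f]{8})"        # 4: \U-escape, e.g. \U00010024
    r"|U\+([0-9A-Fa-f]{4,6})"      # 5: U+ notation, e.g. U+00E9
)


def _replace(m: re.Match) -> str:
    if m.group(2) is not None:
        return chr(int(m.group(2)))  # decimal reference
    digits = next(g for g in m.groups() if g is not None)
    return chr(int(digits, 16))      # every other style is hexadecimal


def ascii_to_utf8(text: str) -> bytes:
    """Replace recognized escapes with characters and encode as UTF-8."""
    # chr() raises ValueError for values above 0x10FFFF.
    return _PATTERN.sub(_replace, text).encode("utf-8")
```

Text that matches none of the patterns is passed through unchanged, so mixed input such as `caf&#x00e9; caf&#233;` decodes to `café café`.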

The fourth program, ascii2uni.py, reads 7-bit ASCII containing \u-escaped Unicode, as used in Python and Tcl, and converts it to UTF-8 Unicode. It is the original program of which ascii2uni is a generalization.
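Python's standard `unicode_escape` codec performs a similar conversion, which makes for a compact sketch. This is not ascii2uni.py's code, and the helper name is hypothetical. Note one behavioral difference: the codec also interprets other backslash sequences such as `\n` and `\t`, which a dedicated \u-escape converter would leave alone. Non-ASCII input makes the first `encode` raise `UnicodeEncodeError`, matching the 7-bit input requirement.

```python
def u_escaped_to_utf8(ascii_text: str) -> bytes:
    """Hypothetical helper: decode \\u/\\U escapes in ASCII text to UTF-8.

    Also interprets other Python backslash escapes (e.g. \\n), unlike a
    converter that handles only \\u-escapes.
    """
    decoded = ascii_text.encode("ascii").decode("unicode_escape")
    return decoded.encode("utf-8")
```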

Last updated: May 16th, 2011, 7:35 GMT
Price: free
Developed by: Bill Poser
Homepage: billposer.org
License: GPL v3

What's New in This Release:
  • Fixed a bug in uni2ascii in which, in certain cases, the substitution count was too high (Debian bug #626268).
  • Patched to handle NetBSD, which lacks getline.
  • Clarified the semantics of the pure option: it converts characters in the ASCII range other than space and newline. Fixed a bug in which this was not implemented correctly for UTF-8 types.