msort is a program for sorting files in sophisticated ways. It was originally developed for alphabetizing dictionaries of "exotic" languages, for which it has been extensively used, but it is useful for many other purposes.
msort differs from typical sort utilities in providing greater flexibility in parsing the input into records and identifying key fields, and in giving greater control over the sort order.
The underlying command-line program msort should compile and run without difficulty on any POSIX-conformant system on which the TRE regular expression library is available. It is known to compile and run without modification under GNU/Linux, FreeBSD, and SunOS.
Under Mac OS X, the internationalization and localization libraries used by msort are not available. msort will still compile if you edit the Makefile and remove the flag -DINTLIZE.
The graphical user interface should run anywhere that Tcl/Tk is available, but a few features may not work on non-Unix systems. In particular, the Abort Sort command depends on the existence of a Unix-style kill program that can be used to send a signal to another process.
The graphical interface, msg, is known to run under GNU/Linux, FreeBSD, and SunOS. It will run properly under Mac OS X if you have installed X11 and use Tk-X11. msg also adapts itself to Tk-Aqua well enough to be usable, but some details remain to be dealt with.
Here are some key features of "msort":
· msort can be used as a command-line program or via a graphical user interface that is helpful not only to those who find a complicated command line difficult to deal with but also to those unfamiliar with the finer points of sorting.
· Records need not be single lines of text but may be delimited in a number of ways.
· Key fields may be selected by position in the record (counting from the beginning or the end), by character ranges (e.g. the key consists of the fourth through eighth characters), or by matching a regular expression to a tag.
· For each key an arbitrary sort order may be specified.
· For each key an effectively unlimited number of multigraphs (sequences of characters to be treated as a single unit for purposes of sorting, "collating elements" in Unicode parlance) of effectively unlimited length may be defined.
· In addition to the usual lexicographic and numerical orderings, msort supports sorting by date, time, and string length.
· For each key a distinct set of characters may be excluded from consideration when sorting in any combination of initial, final, and medial position in the key field.
· For each key a distinct set of regular expression substitutions may be defined. These provide the means to make names like McCarthy sort before MacCawley, as if McCarthy were spelled MacCarthy, as well as to handle the rare cases in which a single character is treated for purposes of sorting as a sequence, such as German "Eszett" (ß), which is traditionally sorted as if it were ss.
· Lexicographic keys may be reversed, allowing the construction of reverse dictionaries.
· Any or all keys may be optional. For optional keys, the user may specify how records missing the key field should compare to records in which the key field is present.
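Several of the features above (custom sort order, multigraphs, exclusions, and regular expression substitutions) can be illustrated with a short Python sketch. This is not msort's implementation and does not use msort's option syntax; the function name make_key and its parameters are hypothetical, chosen only for this illustration. It assumes that characters missing from the specified order sort after all listed elements.

```python
import re
import string


def make_key(order, substitutions=(), exclude=""):
    """Build a sort-key function from a custom collation order (illustrative only).

    order: collating elements, lowest first; may include multigraphs like "ch".
    substitutions: (pattern, replacement) pairs applied before comparison.
    exclude: characters ignored when comparing.
    """
    rank = {element: i for i, element in enumerate(order)}
    # Try longer elements first so "ch" wins over "c" followed by "h".
    by_length = sorted(order, key=len, reverse=True)
    alternatives = [re.escape(e) for e in by_length] + ["."]
    tokenizer = re.compile("|".join(alternatives), re.DOTALL)

    def key(text):
        for pattern, replacement in substitutions:
            text = re.sub(pattern, replacement, text)
        text = "".join(ch for ch in text if ch not in exclude)
        # Assumption: unlisted characters rank after every listed element.
        return [rank.get(token, len(order)) for token in tokenizer.findall(text)]

    return key


# Traditional Spanish order: "ch" follows "c" and "ll" follows "l".
spanish_order = (
    list("abc") + ["ch"] + list("defghijkl") + ["ll"] + list("mnñopqrstuvwxyz")
)
spanish_key = make_key(spanish_order)
words = sorted(["chico", "cuna", "dama", "llama", "luz", "casa"], key=spanish_key)

# Treat an initial "Mc" as "Mac" and ignore hyphens.
name_key = make_key(
    list(string.ascii_letters), substitutions=[(r"^Mc", "Mac")], exclude="-"
)
names = sorted(["MacCawley", "McCarthy"], key=name_key)

print(words)
print(names)
```

With the multigraphs defined, "cuna" sorts before "chico" and "luz" before "llama", which a plain character-by-character comparison would not produce. The substitution makes McCarthy compare as MacCarthy, so it lands before MacCawley.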
What's New in This Release:
· Adapted to be compatible with libtre 0.8.
· Removed unnecessary conditioning of Hybrid mapping code on availability of locale support.
· Added -Z option for copying the first record to the output without sorting it. This is useful for sorting files with a header.
· Considerably reduced the memory used for exclusions.
· Fixed a bug in the reporting of exclusions.
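The header-preserving behavior described for the -Z option can be sketched in Python as follows. This is only an illustration of the idea, not msort's code; the function name sort_keeping_header and the sample data are hypothetical, and each list item stands in for one record.

```python
def sort_keeping_header(records, key=None):
    """Return the records with the first one left in place and the rest sorted."""
    if not records:
        return []
    header, body = records[0], records[1:]
    return [header] + sorted(body, key=key)


# Made-up sample data: a header record followed by three data records.
rows = ["word,gloss", "zebra,animal", "apple,fruit", "mango,fruit"]
result = sort_keeping_header(rows)
print(result)
```

Without the special handling, the header "word,gloss" would be sorted in among the data records, here between "mango" and "zebra".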