BSD License 
Toby Rosen
pyp is a Linux command-line text manipulation tool similar to awk or sed, but it uses standard Python string and list methods, along with custom functions that evolved to generate fast results in an intense production environment. Pyed Piper was developed at Sony Pictures Imageworks to facilitate the construction of complex image manipulation "one-liner" commands during visual effects work on Alice in Wonderland, Green Lantern, and the upcoming The Amazing Spider-Man.

Because pyp employs its own internal piping syntax ("|"), similar to unix pipes, complex operations can be proceduralized by feeding the output of one python command to the input of the next. This greatly simplifies the generation and troubleshooting of multistep operations without the use of temporary variables or nested parentheses. In practice, the ability to easily construct complicated command sequences can largely replace "for each" loops on the command line, significantly speeding up workflow through standard unix command recycling.
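The chaining idea can be sketched in plain Python. This is only an illustration of feeding each step's output into the next, not pyp's actual implementation; the function name run_pipeline and the history list are hypothetical.

```python
def run_pipeline(line, steps):
    """Apply each step to the previous step's result.

    Returns every intermediate result, starting with the original input,
    so each stage can be inspected when troubleshooting.
    (Hypothetical helper for illustration; not part of pyp.)
    """
    history = [line]
    for step in steps:
        history.append(step(history[-1]))
    return history


steps = [
    str.lower,                                # like "p.lower()"
    lambda p: p.replace("hello", "goodbye"),  # like "p.replace('hello','goodbye')"
    str.title,                                # like "p.title()"
]

for stage in run_pipeline("Hello World", steps):
    print(stage)
```

No temporary variables or nested calls such as title(replace(lower(...))) are needed; each step only sees the result of the one before it.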

pyp output has been optimized for typical production scenarios. For example, if text is broken up into an array using the "split()" method, the output is automatically numbered by field, which makes selecting a particular field trivial. Numerous other conveniences have been included, such as an accessible history of all inter-pipe sub-results, the ability to perform mathematical operations, and a complement of variables based on common metacharacter split/join operations.
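The field numbering can be pictured with a short Python sketch. The bracketed layout used here is an assumption for illustration and may not match pyp's exact output; number_fields is a hypothetical name.

```python
def number_fields(line):
    """Label each whitespace-separated field with its list index.

    Illustrative only: the "[i]field" layout is an assumption,
    not necessarily what pyp prints.
    """
    return " ".join(f"[{i}]{field}" for i, field in enumerate(line.split()))


print(number_fields("-rw-r--r-- 1 toby staff 120 Mar 20 2012 notes.txt"))
```

Seeing the index next to each field shows at a glance which number to put in the brackets when selecting a column.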

For power users, commands can be easily saved to disk and recalled as macros, providing an alternative to quick and dirty scripting. For the truly advanced user, additional methods can be added to the pyp class via a config file, allowing tight integration with a larger facility's data structures or custom toolsets.

A Quick Tour

The simplest pyp example shows how python string methods can be used easily on the command line. For example, to split up the different columns of a Linux long listing, we just use the split method with pyp's line-by-line variable "p":

ls -l | pyp "p.split()"

We can then use standard python indexing to select a column. For example, to select the last column, we can just use this:

ls -l | pyp "p.split()[-1]"

Any other python string methods can be used; for example p.lower() will make everything lowercase.
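For comparison, here is roughly what "p.split()[-1]" amounts to in an ordinary Python script. This is a sketch, not pyp's code; the sample listing lines are made up, and skipping blank lines is an assumption added to keep the example safe.

```python
def last_field(lines):
    """Return the last whitespace-separated field of each non-blank line."""
    return [line.split()[-1] for line in lines if line.split()]


# Made-up lines in the style of "ls -l" output.
sample = [
    "-rw-r--r-- 1 toby staff  120 Mar 20 2012 notes.txt",
    "drwxr-xr-x 2 toby staff 4096 Mar 19 2012 renders",
]

print(last_field(sample))
```

With pyp, the loop over lines and the printing are implied, so only the expression applied to "p" has to be typed.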

For a more complicated example, we take a Linux long listing, capture every other line among lines 5 through 10 (counting from zero, as python slices do), keep the username and file name fields, replace "hello" with "goodbye", capitalize the first letter of every word, and then add the text "is splendid" to the end:

ls -l | pyp "pp[5:11:2] | whitespace[2], w[-1] | p.replace('hello','goodbye') | p.title(),'is splendid'"

This uses pyp's built-in line-by-line and entire-input variables (p and pp), as well as the variable whitespace and its shortcut w, which both represent a list created by splitting each line on whitespace (whitespace = w = p.split()).

The other functions and selection techniques are all standard python. Notice the pipes ("|") are inside the pyp command.
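To make each stage of this command concrete, the following plain-Python sketch mirrors it step by step. It is not how pyp is implemented; it assumes that comma-separated results are joined with a single space before the next pipe, and the function name splendid and the sample listing are hypothetical.

```python
def splendid(lines):
    """Plain-Python rendering of the four piped stages above."""
    results = []
    # Stage 1, "pp[5:11:2]": slice the whole input, giving indices 5, 7 and 9.
    for p in lines[5:11:2]:
        w = p.split()                        # whitespace = w = p.split()
        # Stage 2, "whitespace[2], w[-1]": keep the owner and file name.
        # Assumption: the two values are joined with a space.
        p = " ".join([w[2], w[-1]])
        # Stage 3: replace 'hello' with 'goodbye'.
        p = p.replace("hello", "goodbye")
        # Stage 4, "p.title(),'is splendid'": capitalize words, append text.
        p = " ".join([p.title(), "is splendid"])
        results.append(p)
    return results


# Made-up listing: twelve lines in the style of "ls -l" output.
listing = [f"-rw-r--r-- 1 toby staff 10 Mar 20 2012 hello{i}" for i in range(12)]

for line in splendid(listing):
    print(line)
```

Note that title() capitalizes only the first letter of each word in the earlier stages' output; the appended text "is splendid" is left as typed.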

We can then save this as a macro to disk by adding the flag --macro_save splendid_example to the command.

The next time we need to perform this operation, we can simply use this:

ls -l | pyp splendid_example

Last updated on March 20th, 2012


