prll 0.6.2

Easily execute stuff in parallel
While every full-featured shell provides job control, it is only meant for manual, interactive handling of several jobs, and not much more. prll (pronounced "parallel") was created to simplify the common task of running a large number of jobs a few at a time. See the feature summary below for a quick overview.

If you have a bunch of files to process, a loop is what you need. However, if you have a multicore/multiprocessor machine, it is much more efficient to run as many processes in parallel as there are CPUs available. While a minor extension to the loop might be adequate, it is not the most efficient solution. This article describes how to do parallel execution using a loop, or using the shell's notion of a job, and the shortcomings of both methods. It also describes prll's predecessor, which was called mapp, and on which prll is based. In the end, they do the same thing, but use different means of interprocess communication.

prll is implemented as a shell function, with helper programs written in C. While there are other ways to tackle the problem, like using the xargs utility, and while many are "saner" in some sense, having a shell function has a distinct advantage: you don't need to write any scripts or programs. Implement your task as a shell function, and prll will run it using the context of your current shell. This makes one-off commands possible without having to put them into script files, which would be too bothersome. As an example, to flip all photos in the current directory, just do

 function myfn() { mogrify -flip "$1" ; }
 prll myfn *.jpg


With version 0.3 or later, you can even do just

 prll -s 'mogrify -flip "$1"' *.jpg

For comparison, here is the same thing with a non-parallel loop:

 for i in *.jpg ; do
     mogrify -flip "$i"
 done
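
The "minor extension to the loop" mentioned earlier usually means starting jobs in the background and waiting for them in batches. The following is only a sketch of that idea, not prll's implementation; run_batched and show_name are hypothetical names, and show_name stands in for a real command such as mogrify.

```shell
# Hypothetical helper: run "$cmd" on each argument, at most $1 jobs at a time.
run_batched() {
    local max_jobs=$1 cmd=$2
    shift 2
    local running=0 arg
    for arg in "$@"; do
        "$cmd" "$arg" &                 # start the job in the background
        running=$((running + 1))
        if [ "$running" -ge "$max_jobs" ]; then
            wait                        # block until the whole batch is done
            running=0
        fi
    done
    wait                                # collect the last, possibly partial, batch
}

# Stand-in job; replace the body with real work.
show_name() { echo "processing $1"; }

run_batched 2 show_name a.jpg b.jpg c.jpg
```

The weakness is visible in the first wait: every batch lasts as long as its slowest job, so some CPUs sit idle while one long job finishes. A tool that starts a new job as soon as any job ends avoids this.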


In version 0.4, prll also gained xargs-like ability to read standard input, with both newline and null separators, which enables processing of data that is harder to quote. The difference from xargs is that prll is fed a shell function, making interactive use easier. xargs takes a simple command, and complex commands must be wrapped in a script or in 'bash -c' or such. Also, parallel execution in xargs must be specified separately, while prll reads the number of CPUs automatically. Not to mention that xargs is prone to data loss when doing parallel execution (an example here), while prll 0.5 or later features full output buffering and locking which prevents that. Please note that this is not a rant against xargs. xargs is not a tool for parallel execution, it is a tool for constructing argument lists for other programs, and cooperates with prll.
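
To illustrate the xargs side of this comparison, here is a minimal sketch assuming GNU or BSD xargs, where -0 and -P are available as extensions. The command is wrapped in 'sh -c' and the degree of parallelism is given by hand. list_names is a hypothetical stand-in for whatever produces the null-separated input, and echo stands in for the real command.

```shell
# Hypothetical input source: three null-separated names, one with a space.
list_names() { printf '%s\0' 'a b.jpg' 'c.jpg' 'd.jpg'; }

# -0    read null-separated items
# -n 1  pass one item per command invocation
# -P 2  run up to two invocations at once (chosen by hand)
# The trailing "_" fills $0, so the item arrives as $1 inside the script.
list_names | xargs -0 -n 1 -P 2 sh -c 'echo "would process: $1"' _
```

The order of the output lines is not fixed, because the two workers run concurrently.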

The shell function you write can be anything. In the README file provided in the download, there is an example of a function that takes more than one argument. Also, if you use ssh, preferably with key-based authentication and ssh-agent, you can use prll to handle execution over several machines, forming an ad-hoc cluster.
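
One way to combine both ideas, shown here only as a sketch and not as the README's example, is to pack a host name and a file name into a single argument and split it inside the function. remote_job, the "host:file" convention and the host name node1 are all assumptions made for illustration. The function prints the ssh command instead of running it.

```shell
# Hypothetical job: each argument has the form "host:file".
remote_job() {
    local host=${1%%:*}     # text before the first colon
    local file=${1#*:}      # text after the first colon
    # Printed rather than executed; remove "echo" to run it for real.
    echo ssh "$host" mogrify -flip "$file"
}

remote_job 'node1:photos/a.jpg'
```

With prll, such a function would be called as 'prll remote_job' followed by one "host:file" argument per job.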

REQUIREMENTS

 * bash or zsh
 * C compiler, such as gcc
 * GNU make
 * OS support for System V Message Queues and Semaphores
 * device files /dev/urandom or /dev/random
 * the cat utility
 * optional tests require utilities tr, grep, sort, split, diff and uname

These requirements should be satisfied by your system by default, excepting perhaps the compiler and its toolchain, which are not installed by default on systems such as Ubuntu Linux. Refer to your system's documentation on how to install missing programs.

Optionally (on Linux), the /proc/cpuinfo file can be used to automatically determine the number of processors, but it is not mandatory.
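
As a rough illustration of how a processor count can be read from that file, the sketch below counts the "processor" lines in /proc/cpuinfo. It is not necessarily how prll does it; count_cpus is a hypothetical name, and the getconf fallback is an assumption for systems without /proc/cpuinfo.

```shell
# Hypothetical helper: print the number of processors, or 1 if unknown.
count_cpus() {
    local n
    if [ -r /proc/cpuinfo ]; then
        # One "processor : N" line per logical CPU on most Linux systems.
        n=$(grep -c '^processor' /proc/cpuinfo)
    else
        # Assumed fallback; the variable name is widely supported but not universal.
        n=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
    fi
    # Fall back to 1 if the result is empty, zero, or not a number.
    case "$n" in
        ''|*[!0-9]*|0) n=1 ;;
    esac
    echo "$n"
}

count_cpus
```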

prll passes basic tests on the following operating systems: GNU/Linux, FreeBSD, OpenBSD, Mac OS X, and Solaris versions 8-10.

Main features:

  • Easy to use. Focuses on a single task and doesn't try to emulate a kitchen sink.
  • Code is passed in shell functions to ease interactive use.
  • Works in both bash and zsh and in several operating systems.
  • Execution can be terminated gracefully, letting started jobs finish their work.
  • Can be terminated from within the code it executes, easing aborting on errors or implementing an ad-hoc parallel search.
  • Does internal buffering and locking to prevent mangling/interleaving of output from separate jobs.

Last updated on: October 31st, 2011, 8:54 GMT
Price: free
Developed by: Jure Varlec
License type: GPL v3

What's New in version 0.6.1
  • prll_seq, a simple substitute for GNU seq, was added.
  • Five locks are now available to users should they need to synchronize their functions.
  • Another helper function was added to ease passing and splitting of multiple arguments.