numexpr is a Python library that evaluates multiple-operator array expressions, often several times faster than NumPy can. It accepts the expression as a string, analyzes it, rewrites it more efficiently, and compiles it on the fly into code for its internal virtual machine. It's the next best thing to writing the expression in C and compiling it with a specialized just-in-time (JIT) compiler, with the advantage that it does not require a compiler at runtime.
Why It Works
There are two extremes to array expression evaluation. Each binary operation can run separately over the array elements and return a temporary array. This is what NumPy does: 2*a + 3*b uses three temporary arrays as large as a or b. This strategy wastes memory (a problem if the arrays are large). It is also not a good use of CPU cache memory because the results of 2*a and 3*b will not be in cache for the final addition if the arrays are large.
The other extreme is to loop over each element:
for i in range(len(a)):
    c[i] = 2*a[i] + 3*b[i]
This conserves memory and is good for the cache, but on each iteration Python must check the type of each operand and select the correct routine for each operation. All but the first such checks are wasted, as the input arrays are not changing.
numexpr uses an in-between approach. Arrays are handled in chunks (the first pass uses 256 elements). As Python code, it looks something like this:
for i in range(0, len(a), 256):
    r0 = a[i:i+256]
    r1 = b[i:i+256]
    multiply(r0, 2, r2)
    multiply(r1, 3, r3)
    add(r2, r3, r2)
    c[i:i+256] = r2
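The 3-argument forms of multiply() and add() store the result in the third argument instead of allocating a new array. Working in chunks achieves a good balance between cache use and branch prediction: each chunk is small enough to stay in cache, and the cost of selecting the right routine is paid once per chunk rather than once per element. The virtual machine is written entirely in C, which makes it faster than the Python above.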
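The chunked loop above can be tried directly in NumPy, whose ufuncs accept an out= argument that plays the role of the 3-argument form. This is only an illustrative sketch of the idea, not numexpr's actual implementation: the function name chunked_eval and the scratch buffers r2 and r3 are chosen here for the example, and the chunk size of 256 is taken from the text.

```python
import numpy as np

def chunked_eval(a, b, chunk=256):
    """Compute 2*a + 3*b one chunk at a time, reusing two scratch buffers."""
    c = np.empty_like(a)
    r2 = np.empty(chunk, dtype=a.dtype)  # scratch buffer for 2*a over one chunk
    r3 = np.empty(chunk, dtype=a.dtype)  # scratch buffer for 3*b over one chunk
    for i in range(0, len(a), chunk):
        r0 = a[i:i + chunk]              # slices are views, so nothing is copied
        r1 = b[i:i + chunk]
        n = len(r0)                      # the last chunk may be shorter
        np.multiply(r0, 2, out=r2[:n])
        np.multiply(r1, 3, out=r3[:n])
        np.add(r2[:n], r3[:n], out=r2[:n])
        c[i:i + chunk] = r2[:n]
    return c

a = np.arange(1000, dtype=np.float64)
b = np.arange(1000, dtype=np.float64)
c = chunked_eval(a, b)
```

Only the two chunk-sized buffers are allocated besides the output, regardless of how large a and b are. In pure Python this loop is still slow because the loop itself is interpreted; numexpr runs the equivalent loop inside its C virtual machine.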
For more information about numexpr, read the Numexpr Overview written by the original author, David M. Cooke.
Examples of Use
Using it is simple:
>>> import numpy as np
>>> import numexpr as ne
>>> a = np.arange(1e6) # Choose large arrays for high performance
>>> b = np.arange(1e6)
>>> ne.evaluate("a + 1") # a simple expression
array([ 1.00000000e+00, 2.00000000e+00, 3.00000000e+00, ...,
9.99998000e+05, 9.99999000e+05, 1.00000000e+06])
>>> ne.evaluate('a*b-4.1*a > 2.5*b') # a more complex one
array([False, False, False, ..., True, True, True], dtype=bool)
and fast... :-)
>>> timeit a**2 + b**2 + 2*a*b
10 loops, best of 3: 33.3 ms per loop
>>> timeit ne.evaluate("a**2 + b**2 + 2*a*b")
100 loops, best of 3: 7.96 ms per loop # 4.2x faster than NumPy
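The timeit lines above rely on the interactive timeit command. Outside an interactive session, a comparable measurement can be sketched with the standard library's timeit module. The variable names below are chosen for this example, the arrays are passed explicitly through evaluate()'s local_dict argument so the lookup does not depend on the calling frame, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np
import numexpr as ne

a = np.arange(1e6)
b = np.arange(1e6)
expr = "a**2 + b**2 + 2*a*b"
arrays = {"a": a, "b": b}

# Check first that both routes compute the same values.
numpy_result = a**2 + b**2 + 2*a*b
numexpr_result = ne.evaluate(expr, local_dict=arrays)

# Best of 3 runs of 10 evaluations each, reported per evaluation.
t_numpy = min(timeit.repeat(lambda: a**2 + b**2 + 2*a*b,
                            number=10, repeat=3)) / 10
t_numexpr = min(timeit.repeat(lambda: ne.evaluate(expr, local_dict=arrays),
                              number=10, repeat=3)) / 10

print("NumPy:   %.2f ms per loop" % (t_numpy * 1e3))
print("numexpr: %.2f ms per loop" % (t_numexpr * 1e3))
```

Taking the minimum over the repeats follows the "best of 3" convention used in the output above.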