The Karmasphere DP language is a high-performance, non-blocking parallel language for data processing. It is designed to give the user a high degree of control over the use of system resources (for example, how many CPU cores or how much disk I/O time to use) without requiring the software developer to consider these issues explicitly in code.
It was originally intended for collecting attributes of URLs and domain names for use in an anti-spam system, although it has since developed into a full parallel programming language with many general-purpose operators.
The implementation is a standalone library which can be used in any Java 1.5 environment. It can take full advantage of multiprocessor (SMP or NUMA) systems, and may also be scaled out across machines: since the interpreter and environment are stateless, an entire cluster of machines may run the interpreter in parallel without any requirement for synchronization.
Traditional sequential programs are lists of instructions executed in order. If an instruction needs CPU, disk or network resources, it must wait until the resource is available before continuing. Network latency, for example, is highly unpredictable and can create severe performance problems for sequential programs. While it is possible to write complex sequential programs which optimize resource usage, doing so is well beyond the ability of an inexperienced programmer. The DP language is designed to solve this problem by making parallel programming easy.
DP programs are workflows; that is, they may be represented graphically using a boxes-and-arrows notation. In the DP language, every operation executes concurrently, as soon as the resources it needs are available. This means that operations which would hold up execution while waiting for resources in a traditional sequential language do not slow down a DP program at all.
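The difference between waiting in turn and waiting concurrently can be illustrated outside the DP language. The following is a minimal sketch in plain Java using only `java.util.concurrent`; it is not the DP interpreter's API, and the names `LatencySketch`, `slowLookup`, `runSequentially` and `runConcurrently` are hypothetical. The sleep stands in for network or disk latency, and the thread pool size is an assumed stand-in for a resource limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: plain java.util.concurrent, not the DP interpreter API.
class LatencySketch {

    // Simulates an operation that spends its time waiting on a slow resource.
    static Callable<String> slowLookup(final String key, final long millis) {
        return new Callable<String>() {
            public String call() throws Exception {
                Thread.sleep(millis); // stand-in for network or disk latency
                return key + ":done";
            }
        };
    }

    // Sequential: each operation waits for the previous one to finish.
    static List<String> runSequentially(List<Callable<String>> ops) throws Exception {
        List<String> results = new ArrayList<String>();
        for (Callable<String> op : ops) {
            results.add(op.call());
        }
        return results;
    }

    // Concurrent: operations wait at the same time; maxThreads caps the threads used.
    static List<String> runConcurrently(List<Callable<String>> ops, int maxThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<String> results = new ArrayList<String>();
            // invokeAll returns futures in the same order as the submitted tasks.
            for (Future<String> f : pool.invokeAll(ops)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> ops = new ArrayList<Callable<String>>();
        for (int i = 0; i < 4; i++) {
            ops.add(slowLookup("url" + i, 200));
        }

        long t0 = System.currentTimeMillis();
        runSequentially(ops);
        long t1 = System.currentTimeMillis();
        runConcurrently(ops, 4);
        long t2 = System.currentTimeMillis();

        System.out.println("sequential ms: " + (t1 - t0));
        System.out.println("concurrent ms: " + (t2 - t1));
    }
}
```

With four simulated lookups, the sequential run pays for each wait in turn, while the concurrent run overlaps them. In this sketch the programmer still manages the thread pool by hand, which is the kind of bookkeeping the DP language aims to take out of user code.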
We chose to make our source language almost identical to that of GraphViz, which renders this same textual representation as JPEG or other images. Debugging output from the interpreter is also in GraphViz format, and may be easily rendered and read without a deep understanding of the machine.
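To give a feel for the boxes-and-arrows textual form, here is a small sketch that writes a workflow out as an ordinary GraphViz directed graph. It emits plain GraphViz text only, not DP source (the DP syntax is described as almost, not exactly, identical), and the class `WorkflowDot`, its methods, and the operator names in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: emits plain GraphViz "digraph" text, not DP source.
class WorkflowDot {
    private final List<String[]> edges = new ArrayList<String[]>();

    // Records an arrow from one box to another; returns this for chaining.
    WorkflowDot arrow(String from, String to) {
        edges.add(new String[] { from, to });
        return this;
    }

    // Renders the recorded arrows as a GraphViz directed graph.
    String toDot(String name) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph ").append(name).append(" {\n");
        for (String[] e : edges) {
            sb.append("  \"").append(e[0]).append("\" -> \"")
              .append(e[1]).append("\";\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical operator names, chosen only for illustration.
        String dot = new WorkflowDot()
            .arrow("input", "resolve")
            .arrow("input", "fetch")
            .arrow("resolve", "score")
            .arrow("fetch", "score")
            .toDot("example");
        System.out.print(dot);
    }
}
```

Saving the output to a file such as `example.dot` and running `dot -Tpng example.dot -o example.png` renders it as an image. In this example graph, the two middle boxes both depend only on `input`, so nothing forces one to run before the other.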
The core language, documented here, includes some basic operators for processing and network operations. Additional operators are easy to develop using the framework provided.
The entire interpreter is provided as an API. It may be executed on the command line, through a job server (available, but not documented here), embedded in a query server (also available, but not documented here), within an RPC daemon (under development), or anywhere else that it may be useful. It consumes no resources when idle, and only the specified resources when active.
In informal workflows, arrows may be implicitly typed or may simply indicate a relation; the DP language, by contrast, permits explicit typing of each arrow. Any Java type may be used. The DP interpreter does not have to be made aware of every type in the system, although it can make certain inferences about types if they are registered. If types are specified, programs may be typechecked at compile time.
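One way to picture checking typed arrows before a program runs is sketched below. This is an assumption about how such a check could work, not a description of the DP interpreter's type checker; the names `TypedArrowCheck`, `Arrow` and `check` are hypothetical. Each arrow records the Java type its source produces and the type its target accepts, and the check uses the standard `Class.isAssignableFrom` test.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not the DP interpreter's type checker.
class TypedArrowCheck {

    // An arrow carrying values of a declared Java type between two operators.
    static final class Arrow {
        final String from;
        final String to;
        final Class<?> produced; // type the source operator emits
        final Class<?> expected; // type the target operator accepts

        Arrow(String from, String to, Class<?> produced, Class<?> expected) {
            this.from = from;
            this.to = to;
            this.produced = produced;
            this.expected = expected;
        }
    }

    // Returns one message per arrow whose produced type cannot be passed to its target.
    static List<String> check(List<Arrow> arrows) {
        List<String> errors = new ArrayList<String>();
        for (Arrow a : arrows) {
            if (!a.expected.isAssignableFrom(a.produced)) {
                errors.add(a.from + " -> " + a.to + ": "
                    + a.produced.getName() + " is not a " + a.expected.getName());
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<Arrow> arrows = new ArrayList<Arrow>();
        // String is a CharSequence, so this arrow is accepted.
        arrows.add(new Arrow("parse", "resolve", String.class, CharSequence.class));
        // String is not an Integer, so this arrow is reported.
        arrows.add(new Arrow("resolve", "score", String.class, Integer.class));

        for (String error : check(arrows)) {
            System.out.println(error);
        }
    }
}
```

Because the check only inspects the declared types on each arrow, it can run over the whole graph before any operator executes, which is the sense in which a typed workflow can be rejected early.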