The Cell Messaging Layer (or CML for short) is a communication library for the Cell Broadband Engine, the microprocessor best known as the heart of the PlayStation 3. The CML implements a small but usable subset of the functions defined by the Message Passing Interface (MPI), providing a familiar interface to programmers who are accustomed to programming parallel computers or workstation clusters.
The Cell Messaging Layer runs not only on a single Cell processor but also on compute nodes containing multiple Cell processors sharing a common memory space and on clusters containing multiple Cell compute nodes. Regardless of configuration, the CML makes the entire system look like a homogeneous cluster of Cell vector units (known as synergistic processing elements or SPEs). Any SPE can communicate directly with any other SPE, regardless of physical location.
The Cell Messaging Layer is optimized for performance. At the time of this writing, it is the fastest message-passing library available for the Cell. It is designed to utilize the Cell's slow but flexible Power processor element (PPE) only for internode communication, never within a node. Collective operations are designed hierarchically so as to minimize the use not only of the PPE but also of the Broadband Interface (BIF), which connects multiple Cells within a node.
The examples directory in the CML distribution shows how to use the Cell Messaging Layer. The files in the minimal subdirectory demonstrate the minimal amount of code needed on the PPE and the SPE for a "do-nothing" program. The files in the showcase subdirectory show how to use all of the MPI functions implemented by the Cell Messaging Layer. At the time of this writing, those functions include the following:
There is documentation on the Web for each of these functions (e.g., at http://www-unix.mcs.anl.gov/mpi/www/). See also the spe/include/mpi.h file, installed as part of the Cell Messaging Layer, for the complete set of function prototypes.
Additional features and characteristics
MPI ranks are assigned such that they utilize all of the SPEs on one Cell before using any of the SPEs on the next Cell. That is, ranks 0 to 7 are on the first Cell, ranks 8 to 15 are on the second Cell, and so forth (assuming current hardware, with 8 SPEs per Cell).
The MPI_Comm_get_attr() function accepts an MPI_CML_LOCAL_NEIGHBORS key, which returns the number of SPEs managed by a single PPE (typically 8 for a single Cell or 16 for a pair of Cells connected via a BIF connection).
The CMLMAXLOCALSPES environment variable limits the number of SPEs used by each PPE. It must be set to a power of two.
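For example, the variable can be set from the shell before launching a program. The binary name below is hypothetical; the power-of-two check is an optional sanity test, since CML requires the value to be a power of two:

```shell
# Limit each PPE to 4 SPEs (hypothetical program name ./cml_app)
export CMLMAXLOCALSPES=4

# Optional sanity check: a nonzero power of two has n & (n - 1) == 0
n=$CMLMAXLOCALSPES
if [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]; then
    echo "ok"
else
    echo "CMLMAXLOCALSPES must be a power of two" >&2
fi

# ./cml_app   # then launch the application as usual
```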
The Cell Messaging Layer supports a convenient remote procedure call (RPC) mechanism that enables an SPE to invoke functions on the PPE and receive the results. See the files in the examples/showcase directory for usage examples.
What's New in This Release:
· Numerous bugs were fixed, some severe.
· The "showcase" example was modified to sanity-check the result of the reductions/multicasts.