Cbench is a scalable cluster benchmarking and testing tool.




Cbench aims to be a relatively straightforward collection of tests, benchmarks, applications, and utilities, plus a framework to hold them together, that facilitates scalable testing, benchmarking, and analysis of a Linux parallel compute cluster. It grew gradually out of the frustration of redoing the same work over and over as new Linux clusters were integrated and brought online. I've continually found labor-intensive tasks in cluster integration that tooling could assist with, so that the labor could instead be applied to the true goals of system integration, i.e. getting the system tested and debugged. As the toolkit has grown, it has opened the door to more sophisticated system integration, testing, and characterization capabilities.

Here are some key features of "Cbench":

stress test and analyze cluster interconnect performance and characteristics using multiple bandwidth, latency, and collective tests
test and analyze cluster scalability using common benchmarks such as Linpack, HPCC, NAS Parallel Benchmarks, Intel MPI Benchmarks, and IOR
stress test cluster file systems with a mix of job sizes and flavors
stress test a cluster after maintenance by pounding the system with hundreds to thousands of jobs of various sizes and flavors
stress test the cluster scheduler and resource manager
test nodes for hardware stability and conformance to the performance profile of homogeneous hardware
serve as the basis for a deterministic methodology that strictly details the testing process for returning broken hardware to general production use

What's New in This Release:

Added the --existencecheck mode to nodehwtest_output_parse.pl, a lighter-weight way of checking a node's parsed test data against characterized data. In normal mode, the parsed data is compared against the characteristic data for deviation. In the new mode, the parsed data is only checked against the characteristic data for existence, i.e. to make sure each test appears to have run on the node and generated data.
updated the little-used --nodediag feature of output_parse_generic.pl to find the nodelist a job ran on if it is in the embedded Cbench info tags
start grabbing the nodelist from Torque inside batch jobs
fix and update output_parse modules that use the CBENCH NOTICE capability
the NPB ft.D.* MPI tests take far too long if there isn't enough memory on the nodes, so added a basic runtime check to only run ft.D.* binaries on nodes with at least 6GB
bypass lu.D on node-level testing since it seems to behave badly
some attempts to get NPB to compile with fewer errors
add --random option to allow random ordering of test modules in node_hw_test
fixed a bug that for some reason caused the target_values file to be read incorrectly, which threw off all the statistical comparisons in nodehwtest_output_parse.pl
added the prototype 'shakedown' testset using a new gen_jobs scripting structure (install with sbin/install_shakedown)
update Linpack and HPCC makefiles so that the HP_dlamch.c file is compiled properly with the NOOPT_FLAGS from make.def (CCNOOPT in the makes) to avoid optimization-related failures (this was the intent of the hpcc and linpack makefiles, but it was not quite right)
in compute_N(), round up when calculating the number of nodes
Marcus added some sorting so that job generation happens in a repeatable, logical order
make 1 processor the default starting size for generating and starting jobs
several tools/gen_jobs_generic.pl optimizations and cleanups from Marcus
several more custom parsing error filters
update tools/nodehwtest_output_parse.pl to be more consistent with tools/output_parse_generic.pl
assume openmpi 1.2.x is being used now when dealing with openmpi
fix a bug with the HPL/HPCC N parameter calculation when using > 1ppn, thanks to Jim Schutt
make the default hpcc input deck match that of linpack
Marcus Epperson added the mpiGraph benchmark, by Adam Moody of LLNL, http://sourceforge.net/projects/mpigraph. The crunch_mpiGraph utility can be used to generate an HTML report from mpiGraph run output.
added OSU message rate benchmark
changed the IO testset to be a stress test of the parallel filesystem
added IOSANITY, to be used as a sanity checker of parallel filesystems that is not necessarily stressful
Build some openmp versions of streams if we are using Intel compilers
honor RPATH and BLASLIB if they are set in the environment, which allows one to recompile with a different BLAS linkage w/o doing any editing
some under-the-covers openmpi 1.2.x specific support
added the --waitall option for --throttledbatch mode which will keep start_jobs from exiting until all jobs have exited the batch system, thanks to Jim Schutt for suggesting this
add binary identifier support (--binident parameter), which is analogous to test identifiers but for organizing multiple sets of binaries in a Cbench testing tree
update the --ident parameter to accept comma separated lists
fixed a bug with --repeat when combined with the --throttledbatch or --serialbatch modes of start_jobs*
added a --polldelay option for --throttledbatch/--serialbatch mode
added more variations to compile streams
more improvements to the Cbench testing tree rpm capabilities
added initial (but working) support for the 'yod' job launcher and the 'pbspro' batch system
Marcus Epperson added the very nice HPCC debug build which allows one to select on the command line which hpcc tests to run
change the running as root/non-root logic to be smarter in node-level testing and not depend on privilege escalation
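
To illustrate the difference between the normal (deviation) mode and the new --existencecheck mode of nodehwtest_output_parse.pl noted above, here is a minimal Python sketch. It is not the Cbench implementation, which is written in Perl. The data layout, function names, and the two-standard-deviation threshold are assumptions made purely for illustration.

```python
# Hypothetical data layout (not Cbench's actual format):
#   parsed:  {test_name: measured_value} for one node
#   profile: {test_name: (mean, stddev)} from characterized hardware

def check_deviation(parsed, profile, max_sigma=2.0):
    """Normal mode: flag tests whose value strays too far from the profile."""
    flagged = []
    for test, (mean, stddev) in sorted(profile.items()):
        if test not in parsed:
            continue  # missing data is the existence check's concern
        if stddev > 0 and abs(parsed[test] - mean) > max_sigma * stddev:
            flagged.append(test)
    return flagged

def check_existence(parsed, profile):
    """Existence mode: flag tests that produced no data on the node."""
    return sorted(test for test in profile if test not in parsed)

# Example with made-up numbers
profile = {"stream_triad": (5000.0, 100.0), "dgemm": (40.0, 1.0)}
parsed = {"stream_triad": 4500.0}

print(check_deviation(parsed, profile))  # ['stream_triad']
print(check_existence(parsed, profile))  # ['dgemm']
```

The existence check never inspects the values themselves, which is why it works as a quick sanity pass before a full statistical comparison.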
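
The compute_N() and HPL/HPCC N parameter items above concern sizing the Linpack problem when more than one process runs per node. The Python sketch below shows one common rule of thumb: size the N x N matrix of 8-byte doubles to fill a fraction of total memory, then round down to a multiple of the block size. The formula, parameter names, and default values are assumptions for illustration and are not taken from the Cbench source. Only the rounding up of the node count reflects the change described above.

```python
import math

def compute_n(num_procs, ppn, mem_per_node_bytes,
              mem_fraction=0.8, block_size=128):
    """Estimate an HPL problem size N (illustrative rule of thumb only)."""
    # Round up the node count: 5 processes at 2 per node occupy 3 nodes.
    nodes = math.ceil(num_procs / ppn)
    total_bytes = nodes * mem_per_node_bytes
    # An N x N matrix of doubles needs 8 * N^2 bytes.
    n = math.isqrt(int(mem_fraction * total_bytes / 8))
    # Round down to a multiple of the block size.
    return n - n % block_size

# Example: 5 processes, 2 per node, 8 GiB per node (made-up values)
print(compute_n(5, 2, 8 * 2**30))
```

Rounding the node count down instead would underestimate the memory available whenever the process count is not a multiple of the processes per node.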
