Since switching to Ubuntu 8.04 as my main operating system, I have been trying to get some benchmark numbers for my month-old build.
Unlike on Windows, benchmarking suites for Linux are few and far between, and they are especially hard to find for 64-bit systems. Phoronix does provide an excellent test suite designed to run under Linux, but I haven’t had any luck getting the latest version (0.6.0) to build and run properly under 64-bit Linux yet. And since most of the applications in the test suite are available only for Linux, cross-OS performance comparisons would be very difficult.
If your primary goal is to test your CPU and memory subsystem, then I would recommend using Intel’s open source Threading Building Blocks (TBB). The source includes a few algorithms that are executed after compilation to test whether the build was successful. As a side benefit, these tests are timed, so they can double as benchmarks.
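The TBB tests print their own timings, but a single run can be noisy. Below is a minimal sketch of a wrapper that runs a command several times and reports the best and median wall-clock time. The function name `time_command` and the placeholder command are my own assumptions, not part of TBB, and the figures it reports include process start-up, so they are not directly comparable to the times the tests print themselves.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, repeats=5):
    """Run argv `repeats` times; return (best, median) wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken run is not timed silently.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return min(samples), statistics.median(samples)


if __name__ == "__main__":
    # Placeholder workload (an assumption); substitute the benchmark binary
    # you built, together with its arguments.
    command = [sys.executable, "-c", "sum(range(10**6))"]
    best, median = time_command(command)
    print(f"best {best:.3f} s, median {median:.3f} s over 5 runs")
```

Taking the best or the median of a handful of runs is a common way to damp the effect of background activity on a desktop machine.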
As an example, the following is the benchmark output obtained while building the latest stable version of TBB (tbb20_020oss_src). The library was built on my machine (Q9450 @ 3.2 GHz, 8 GB DDR2-800, Linux 2.6.24-16-generic SMP x86_64):
./count_strings 1
threads = 1 total = 1000000 time = 0.336895
./count_strings 2
threads = 2 total = 1000000 time = 0.214048
./count_strings 4
threads = 4 total = 1000000 time = 0.181645
./seismic - 300
101.5 frame per sec with serial version
102.3 frame per sec with 1 way parallelism
193.9 frame per sec with 2 way parallelism
219.0 frame per sec with 3 way parallelism
244.2 frame per sec with 4 way parallelism
./convex_hull_bench
Starting TBB unbufferred push_back version of QUICK HULL algorithm
Number of nodes:5000000 Number of threads:1 Initialization time:0.293048 Calculation time:0.807145
Number of nodes:5000000 Number of threads:2 Initialization time:0.822569 Calculation time:1.02838
Number of nodes:5000000 Number of threads:3 Initialization time:0.607247 Calculation time:1.13264
Number of nodes:5000000 Number of threads:4 Initialization time:0.5828 Calculation time:1.08477
Number of nodes:5000000 Number of threads:5 Initialization time:0.569491 Calculation time:1.10567
Number of nodes:5000000 Number of threads:6 Initialization time:0.585655 Calculation time:1.09051
Number of nodes:5000000 Number of threads:7 Initialization time:0.583944 Calculation time:1.08213
Number of nodes:5000000 Number of threads:8 Initialization time:0.561563 Calculation time:1.09363
Starting TBB bufferred version of QUICK HULL algorithm
Number of nodes:5000000 Number of threads:1 Initialization time:0.180772 Calculation time:0.713631
Number of nodes:5000000 Number of threads:2 Initialization time:0.09458 Calculation time:0.369742
Number of nodes:5000000 Number of threads:3 Initialization time:0.0698851 Calculation time:0.266026
Number of nodes:5000000 Number of threads:4 Initialization time:0.0567744 Calculation time:0.207367
Number of nodes:5000000 Number of threads:5 Initialization time:0.0555128 Calculation time:0.230236
Number of nodes:5000000 Number of threads:6 Initialization time:0.0598358 Calculation time:0.23095
Number of nodes:5000000 Number of threads:7 Initialization time:0.0624336 Calculation time:0.257518
Number of nodes:5000000 Number of threads:8 Initialization time:0.0586483 Calculation time:0.278667
./primes 100000000 0:4
#primes from [2..100000000] = 5761455 (0.16 sec with serial code)
#primes from [2..100000000] = 5761455 (0.18 sec with 1-way parallelism)
#primes from [2..100000000] = 5761455 (0.09 sec with 2-way parallelism)
#primes from [2..100000000] = 5761455 (0.06 sec with 3-way parallelism)
#primes from [2..100000000] = 5761455 (0.05 sec with 4-way parallelism)
./parallel_preorder 1:4
0.235308 seconds using 1 threads (average of 199.74 nodes in root_set)
0.202356 seconds using 2 threads (average of 199.74 nodes in root_set)
0.153144 seconds using 3 threads (average of 199.74 nodes in root_set)
0.181067 seconds using 4 threads (average of 199.74 nodes in root_set)
./sum_tree
Tree creation using TBB scalable allocator
half created serially: time = 177.1 msec
half done in parallel: time = 77.9 msec
Calculations:
SerialSumTree: time = 77.9 msec, sum=7.01275e+08
SimpleParallelSumTree: time = 44.5 msec, sum=7.01275e+08
OptimizedParallelSumTree: time = 43.4 msec, sum=7.01275e+08
./sum_tree -stdmalloc
Tree creation using standard operator new
half created serially: time = 369.2 msec
half done in parallel: time = 548.7 msec
Calculations:
SerialSumTree: time = 94.7 msec, sum=7.01275e+08
SimpleParallelSumTree: time = 65.3 msec, sum=7.01275e+08
OptimizedParallelSumTree: time = 65.4 msec, sum=7.01275e+08
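Raw times are easier to compare once they are turned into speedup (single-thread time divided by N-thread time) and parallel efficiency (speedup divided by N). The sketch below does this for the count_strings lines above. The regular expression assumes the exact "threads = N total = M time = T" layout shown in that output, and the helper names `parse_times` and `scaling` are purely illustrative.

```python
import re

# count_strings output copied from the run above.
COUNT_STRINGS_OUTPUT = """\
threads = 1 total = 1000000 time = 0.336895
threads = 2 total = 1000000 time = 0.214048
threads = 4 total = 1000000 time = 0.181645
"""

# Assumes the line layout printed by count_strings in this particular run.
LINE = re.compile(r"threads = (\d+) total = \d+ time = ([0-9.]+)")


def parse_times(text):
    """Map thread count -> elapsed time in seconds."""
    return {int(m.group(1)): float(m.group(2)) for m in LINE.finditer(text)}


def scaling(times):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run."""
    base = times[1]
    rows = []
    for n in sorted(times):
        speedup = base / times[n]
        rows.append((n, speedup, speedup / n))
    return rows


if __name__ == "__main__":
    for n, speedup, efficiency in scaling(parse_times(COUNT_STRINGS_OUTPUT)):
        print(f"{n} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

The same two ratios can be computed by hand for the other tests, for example from the calculation times of the buffered convex hull run, which makes it easier to see which tests keep scaling up to four threads on this quad-core CPU and which flatten out earlier.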