Earlier today, I decided to see how fast my quad-core PC really is in terms of raw floating-point performance, measured in GFLOPS. The standard benchmark for this kind of scientific-computing performance is LINPACK. Using Intel's implementation, I obtained the following results (Q9450 overclocked to 3.2GHz):
CPU frequency: 3.200 GHz
Number of CPUs: 4
Number of threads: 4
Parameters are set to:
Number of tests : 1
Number of equations to solve (problem size) : 10000
Leading dimension of array : 10000
Number of trials to run : 10
Data alignment value (in Kbytes) : 1024
Maximum memory requested that can be used = 801248576, at the size = 10000

============= Timing linear equation system solver =================

Size   LDA    Align.  Time(s)  GFlops   Residual      Residual(norm)
10000  10000  1024    16.427   40.5946  1.012665e-10  3.570760e-02
10000  10000  1024    16.398   40.6676  1.012665e-10  3.570760e-02
10000  10000  1024    16.395   40.6740  1.012665e-10  3.570760e-02
10000  10000  1024    16.473   40.4833  1.012665e-10  3.570760e-02
10000  10000  1024    16.391   40.6852  1.012665e-10  3.570760e-02
10000  10000  1024    16.394   40.6785  1.012665e-10  3.570760e-02
10000  10000  1024    16.427   40.5970  1.012665e-10  3.570760e-02
10000  10000  1024    16.397   40.6712  1.012665e-10  3.570760e-02
10000  10000  1024    16.394   40.6766  1.012665e-10  3.570760e-02
10000  10000  1024    16.396   40.6733  1.012665e-10  3.570760e-02

Performance Summary (GFlops)
Size   LDA    Align.  Average  Maximal
10000  10000  1024    40.6401  40.6852

End of tests
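To see where the GFlops column comes from, here is a minimal sketch that recomputes it from the problem size and the measured times. It assumes the conventional LINPACK operation count of 2/3·n³ + 2·n² floating-point operations for solving a dense n×n system; the function name `linpack_gflops` is just an illustrative helper, not part of Intel's tool. Because the times above are rounded to milliseconds, the recomputed rates agree with the reported ones only to about the third decimal place.

```python
# Sketch: recompute the GFlops column of the LINPACK output above.
# Assumption: the conventional LINPACK operation count of 2/3*n^3 + 2*n^2
# floating-point operations for solving a dense n x n linear system.

def linpack_gflops(n: int, seconds: float) -> float:
    """Return GFLOPS for an n x n LINPACK solve that took `seconds`."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1e9

# Times (in seconds) of the ten trials reported above, problem size n = 10000.
times = [16.427, 16.398, 16.395, 16.473, 16.391,
         16.394, 16.427, 16.397, 16.394, 16.396]

rates = [linpack_gflops(10000, t) for t in times]
for t, r in zip(times, rates):
    print(f"{t:7.3f} s -> {r:.3f} GFLOPS")

average = sum(rates) / len(rates)
print(f"average {average:.3f} GFLOPS, maximum {max(rates):.3f} GFLOPS")
```

Since n is fixed at 10000, the whole table boils down to one constant (about 6.67×10¹¹ operations) divided by each trial's time, which is why the rows vary only as much as the timings do.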
It is amazing to see that a desktop PC can now reach more than 40 GFLOPS! To put this number in perspective, take a look at the TOP500 supercomputer ranking from 2005: back then, this was the performance of a supercomputer. (Because the benchmark program used here is different and the test conditions are not necessarily the same, a direct numerical comparison might not be meaningful. Nevertheless, the general trend still holds.)
It is also worth noting that the Q9450 is an extremely overclockable CPU. To reach a 3.2GHz core frequency, I only needed to raise the Front Side Bus (FSB) frequency from the default 333MHz to 400MHz (with vcore set manually to 1.2V). I was able to reach a maximum of 3.4GHz without any stability issues. In fact, the only thing keeping me from a higher clock rate is my DDR2 800 RAM (4x2GB, G.SKILL F2-6400CL5D-4GBPQ). With faster RAM, the Q9450 can easily achieve 50 GFLOPS in the LINPACK test.
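The FSB numbers above follow from simple arithmetic, sketched below under a few stated assumptions: the core clock is the FSB base clock times the Q9450's 8x multiplier (which is what turns 333MHz into the stock 2.66GHz and 400MHz into 3.2GHz), and each core can retire 4 double-precision FLOPs per cycle (one 2-wide SSE add plus one 2-wide SSE multiply on this Core 2 generation). The constant and function names are illustrative only.

```python
# Sketch: core clock and theoretical peak double-precision throughput
# for the overclocked Q9450. Assumptions (not measured in this post):
#  - core clock = FSB base clock x 8 (the Q9450's multiplier)
#  - 4 double-precision FLOPs per cycle per core (one 2-wide SSE add
#    plus one 2-wide SSE multiply per cycle)

MULTIPLIER = 8
CORES = 4
DP_FLOPS_PER_CYCLE = 4

def core_clock_ghz(fsb_mhz: float) -> float:
    """Core clock in GHz for a given FSB base clock in MHz."""
    return fsb_mhz * MULTIPLIER / 1000.0

def peak_gflops(clock_ghz: float) -> float:
    """Theoretical peak double-precision GFLOPS across all cores."""
    return CORES * clock_ghz * DP_FLOPS_PER_CYCLE

for fsb in (333, 400, 425):
    clock = core_clock_ghz(fsb)
    print(f"FSB {fsb} MHz -> {clock:.2f} GHz, peak {peak_gflops(clock):.1f} GFLOPS")

# Fraction of the 3.2 GHz peak reached by the best LINPACK trial above.
efficiency = 40.6852 / peak_gflops(core_clock_ghz(400))
print(f"LINPACK efficiency at 3.2 GHz: {efficiency:.1%}")
```

Under these assumptions the 3.2GHz setting has a theoretical peak of 51.2 GFLOPS, so the measured 40.7 GFLOPS corresponds to roughly 79% of peak. The 3.4GHz maximum corresponds to a 425MHz FSB, which (assuming a 1:1 FSB-to-memory ratio) would push DDR2 800 past its rated speed.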