NAG Numerical Routines for GPUs

Performance Results

Single-threaded CPU equivalents of all the GPU routines are provided so that results can be verified. The CPU equivalents are not optimised and should not be used in performance comparisons. For benchmarks, we instead compare our GPU code against Intel's highly optimised MKL/VSL library. We parallelised the random number generators in VSL with OpenMP so that they use multiple CPU cores. The performance figures below were obtained on the following system:

CPU: Intel Core i7 860 running at 2.8 GHz
OS: Windows 7 64-bit

GPU Table

Figures in bold are for double precision.
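
The text above says the VSL generators were spread across CPU cores with OpenMP, but it does not say how the random stream was divided between threads. One common approach is block splitting with skip-ahead: each worker jumps directly to its own offset in the stream and generates a contiguous block, so the combined output matches the serial sequence. The sketch below illustrates the idea in Python with a toy linear congruential generator. The generator, its constants, and the names lcg_block, skip_ahead and parallel_generate are all assumptions made for illustration; they are not the NAG or VSL implementation.

```python
# Toy 64-bit linear congruential generator: x_{n+1} = (A*x_n + C) mod M.
# The constants are Knuth's MMIX values, used here only for illustration.
M = 2**64
A = 6364136223846793005
C = 1442695040888963407


def lcg_block(state, count):
    """Return the `count` successive states that follow `state`."""
    out = []
    for _ in range(count):
        state = (A * state + C) % M
        out.append(state)
    return out


def skip_ahead(state, n):
    """Advance `state` by n steps in O(log n) by composing affine maps."""
    a_acc, c_acc = 1, 0      # accumulated map, starts as the identity
    a_cur, c_cur = A, C      # map for 2^k steps, starts as a single step
    while n > 0:
        if n & 1:
            # Compose the accumulated map with the current power-of-two map.
            a_acc, c_acc = (a_cur * a_acc) % M, (a_cur * c_acc + c_cur) % M
        # Square the current map so that it covers twice as many steps.
        a_cur, c_cur = (a_cur * a_cur) % M, (a_cur * c_cur + c_cur) % M
        n >>= 1
    return (a_acc * state + c_acc) % M


def parallel_generate(seed, total, workers):
    """Generate `total` values as independent per-worker blocks.

    The loop runs serially here, but every iteration depends only on
    `seed` and its own offset, so each could run on a separate thread.
    """
    block = -(-total // workers)  # ceiling division
    out = []
    for w in range(workers):
        start = w * block
        n = min(block, total - start)
        if n <= 0:
            break
        out.extend(lcg_block(skip_ahead(seed, start), n))
    return out
```

Because every block starts from a state computed directly from the seed, the result does not depend on the number of workers, which is the property that makes a multi-core run comparable with a single-threaded reference.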

Testing and Verification

Verification is performed through a suite of rigorous test programs and by comparing the CPU and GPU values. For the uniform random number generators, the CPU and GPU values are always identical. For the non-uniform distributions (e.g., Normal), small numerical differences may arise because the two platforms implement special functions differently, and because many CPU chips use extended precision in intermediate calculations.
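
The two comparison modes described above can be sketched as follows: an exact check for uniform generators and a tolerance-based check for non-uniform distributions. This is a minimal illustration, not the NAG test suite. The function name verify and the default relative tolerance are assumptions chosen for the example; the source does not state what tolerance is used.

```python
import math


def verify(cpu, gpu, exact=True, rel_tol=1e-12):
    """Return the indices at which the CPU and GPU values disagree.

    exact=True  : values must be identical (uniform generators).
    exact=False : values must agree to within rel_tol (non-uniform
                  distributions). The default rel_tol is a hypothetical
                  value picked for this sketch.
    """
    if len(cpu) != len(gpu):
        raise ValueError("CPU and GPU outputs differ in length")
    mismatches = []
    for i, (c, g) in enumerate(zip(cpu, gpu)):
        if exact:
            ok = c == g
        else:
            ok = math.isclose(c, g, rel_tol=rel_tol)
        if not ok:
            mismatches.append(i)
    return mismatches
```

For example, verify(cpu_uniform, gpu_uniform) should return an empty list, while a Normal sample would be checked with verify(cpu_normal, gpu_normal, exact=False), where the four array names are placeholders for the outputs being compared.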


We would like to thank the Technology Strategy Board (TSB) and the Smith Institute for their support in sponsoring this project, and EPSRC for supporting Professor Giles’ academic research.