NAG Numerical Routines for GPUs
Single-threaded CPU equivalents of all the GPU routines are provided so that results can be verified. These CPU equivalents are not optimised and should not be used in performance comparisons. For benchmarks, we instead compare our GPU code against Intel's highly optimised MKL/VSL library, with the VSL random number generators parallelised using OpenMP so that they run on multiple CPU cores. The performance figures below were obtained on the following system:
CPU: Intel Core i7 860 running at 2.8 GHz
GPU: NVIDIA C2050
OS: Windows 7 64-bit
Figures in bold are for double precision.
Testing and Verification
Verification is performed through a suite of rigorous test programs and by comparing the CPU and GPU values. For the uniform random number generators, the CPU and GPU values are always identical. For the non-uniform distributions (e.g., Normal), small numerical differences may arise for two reasons: the two platforms implement special functions differently, and many CPU chips use extended precision in intermediate calculations.
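The comparison described above can be illustrated with a minimal sketch. It is not part of the NAG library: the function name compare_streams, its parameters, and the tolerance values are all assumptions chosen for illustration. It requires exact equality for uniform generators and allows a small tolerance for non-uniform distributions.

```python
import math

def compare_streams(cpu_values, gpu_values, exact, rel_tol=1e-12, abs_tol=1e-15):
    """Return the indices at which CPU and GPU outputs disagree.

    exact=True  : values must be identical (uniform generators).
    exact=False : small floating-point differences are accepted
                  (non-uniform distributions such as Normal).
    The default tolerances are illustrative assumptions, not NAG settings.
    """
    if len(cpu_values) != len(gpu_values):
        raise ValueError("streams must have the same length")
    mismatches = []
    for i, (c, g) in enumerate(zip(cpu_values, gpu_values)):
        if exact:
            ok = (c == g)
        else:
            ok = math.isclose(c, g, rel_tol=rel_tol, abs_tol=abs_tol)
        if not ok:
            mismatches.append(i)
    return mismatches

# Hypothetical data: the second "GPU" value differs by one unit in the last place.
cpu = [0.25, 1.0]
gpu = [0.25, 1.0 + 2.0 ** -52]
strict_mismatches = compare_streams(cpu, gpu, exact=True)
tolerant_mismatches = compare_streams(cpu, gpu, exact=False)
```

With this data the strict comparison flags the second element, while the tolerant comparison accepts it. A suitable tolerance in practice would depend on the distribution and on the precision in use.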
We would like to thank the Technology Strategy Board (TSB) and the Smith Institute for their support in sponsoring this project, and EPSRC for supporting Professor Giles’ academic research.