NAG Numerical Routines for GPUs

Performance Results

For results verification and performance comparison, we have single-processor CPU implementations of the equivalent GPU functions available. The computation speeds shown in the table were obtained on a machine with a quad-core 64-bit Intel Xeon 2.33GHz processor and 8GB of memory, running Microsoft Windows XP 64-bit with Visual Studio 2005 SP1, and equipped with an NVIDIA Tesla C1060 card. Based on a test generating 33,554,432 (2^25, roughly 33.5 million) uniform random numbers, we obtain the following performance measurements:
Our results show that the double precision performance of the MRG32k3a implementation is comparable to its single precision performance. However, the execution speed of the double precision implementation depends heavily on the number of double precision units (DPs) available on the GPU hardware.

*The speed-up comparison is made against the same test running on a single core of the same Intel Xeon machine.

Testing and Verification

We refer to the serial CPU implementation as the "gold" reference. For the uniform random number generator, the CPU and the GPU always generate identical values. For non-uniform distributions (e.g., Normal), we expect slight differences between the CPU and GPU single precision implementations, potentially because the CPU usually carries intermediate results at higher precision.

Acknowledgements

We would like to thank the Technology Strategy Board (TSB) and the Smith Institute for their support in sponsoring this project, and EPSRC for supporting Professor Giles’ academic research.
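As an illustration of the gold-reference check described under Testing and Verification, the following is a minimal sketch and not NAG's implementation. It assumes L'Ecuyer's published MRG32k3a constants and a default seed of 12345 for all six state words. The helper names (mrg32k3a, identical, max_abs_diff, to_float32) are hypothetical, and the "device" output is simulated on the CPU because no GPU code is shown in the source.

```python
import struct

# L'Ecuyer's published MRG32k3a parameters (assumed here, not taken from NAG).
M1 = 4294967087            # 2**32 - 209
M2 = 4294944443            # 2**32 - 22853
A12, A13N = 1403580, 810728
A21, A23N = 527612, 1370589
NORM = 1.0 / (M1 + 1)      # scales the combined output into (0, 1)


def mrg32k3a(n, seed=(12345,) * 6):
    """Serial 'gold' reference: n uniform variates in exact integer arithmetic."""
    s10, s11, s12, s20, s21, s22 = seed
    out = []
    for _ in range(n):
        # First component: x1[n] = (A12*x1[n-2] - A13N*x1[n-3]) mod M1
        p1 = (A12 * s11 - A13N * s10) % M1
        s10, s11, s12 = s11, s12, p1
        # Second component: x2[n] = (A21*x2[n-1] - A23N*x2[n-3]) mod M2
        p2 = (A21 * s22 - A23N * s20) % M2
        s20, s21, s22 = s21, s22, p2
        # Combine the two components; the result is always in 1..M1.
        diff = p1 - p2 if p1 > p2 else p1 - p2 + M1
        out.append(diff * NORM)
    return out


def identical(gold, candidate):
    """Exact comparison, as expected for the uniform generator."""
    return len(gold) == len(candidate) and all(
        g == c for g, c in zip(gold, candidate)
    )


def max_abs_diff(gold, candidate):
    """Largest absolute deviation, for outputs that may differ slightly."""
    return max(abs(g - c) for g, c in zip(gold, candidate))


def to_float32(x):
    """Round a Python float to single precision and back (simulation only)."""
    return struct.unpack("f", struct.pack("f", x))[0]


gold = mrg32k3a(1000)
device = mrg32k3a(1000)                    # stand-in for values copied back from a GPU
print(identical(gold, device))             # exact match expected for uniforms
single = [to_float32(u) for u in gold]     # stand-in for a single precision result
print(max_abs_diff(gold, single) < 1e-7)   # tolerance check instead of equality
```

Because the recurrence uses only integer arithmetic, any two conforming implementations started from the same seed should agree exactly, which is why an equality test is appropriate for the uniform case. Results that pass through floating-point transformations are better compared against a tolerance.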
© Numerical Algorithms Group
Visit NAG on the web at:
www.nag.co.uk (Europe and ROW)
www.nag.com (North America)
www.nag-j.co.jp (Japan)
http://www.nag.co.uk/numeric/GPUs/benchmarks.asp