Bitwise Reproducibility with the NAG Libraries
I've written in this blog before about the problems of Wandering Precision - where the results computed by a program are not consistent, even when running exactly the same program on the same machine with the same data several times in a row.
At SC12 in Salt Lake City a couple of weeks ago I took part in a "Birds of a Feather" session organised by Intel, where problems like these, associated with bitwise reproducibility of results, were discussed. Joining me on the panel were representatives of Intel's Math Kernel Library, The MathWorks (producers of MATLAB), German engineering company GNS, and the University of California, Berkeley.
We all gave brief presentations discussing the impact of non-reproducible results on our work or on our users, and ways around the problem. It turned out that most of our presentations were remarkably similar, involving tales of disgruntled users who were unhappy about accepting such varying results. This is true even when those users fully understand that numerical software is subject to rounding errors, and that the varying results they see are not incorrect - they may be equally good approximations to an ideal mathematical result. They just don't like inconsistency - in fact, they would often prefer a consistent answer that is accurate to fewer digits over slightly more accurate answers that vary from run to run.
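As it happens, we do have some control over these inconsistent results, which are largely due to optimizations introduced by clever compilers to take advantage of the fast SIMD instructions, like SSE and AVX, that are available on modern hardware. Vectorized code can add the same numbers in a different order depending on, for example, how the data happens to be aligned in memory, and because floating-point addition is not associative, a different order can give a slightly different rounded result. By using appropriate compiler flags on the Intel Fortran and C compilers, for example, we can avoid these optimizations, at the cost of making the code run as much as 12 to 15% slower (according to Intel).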
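To make the effect concrete, here is a minimal sketch in Python of how the order of additions alone can change a result. It is only an illustration: the function names sequential_sum and lane_sum are made up for this post, the data is chosen to exaggerate the effect, and the "lanes" are a crude stand-in for SIMD registers rather than a model of what any particular compiler actually generates.

```python
def sequential_sum(values):
    """Add the values strictly left to right, as plain scalar code would."""
    total = 0.0
    for v in values:
        total += v
    return total


def lane_sum(values, lanes):
    """Crude imitation of a vectorized reduction (hypothetical helper).

    Element i is accumulated into partial sum i % lanes, and the partial
    sums are combined at the end, so the additions happen in a different
    order from sequential_sum.
    """
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sequential_sum(partial)


# Contrived data: 1.0 is too small to register when added to 1e16.
data = [1e16, 1.0, -1e16, 1.0]

scalar_result = sequential_sum(data)     # ((1e16 + 1) - 1e16) + 1
vector_result = lane_sum(data, lanes=2)  # (1e16 - 1e16) + (1 + 1)

print(scalar_result)  # 1.0
print(vector_result)  # 2.0
```

Both results come from the same four numbers, and neither is a bug in the usual sense. In this contrived case the two-lane ordering happens to give the mathematically exact answer, but with other data the scalar ordering could be the closer one. Real data rarely differs this dramatically; the point is only that a change in summation order is enough to change the bits of the answer.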
For the NAG Libraries, we've decided what we're going to do in future. Most NAG Library products are distributed in two variants - one based on fast vendor library kernels (like MKL), and one that is not, consisting instead of all-NAG versions of routines like BLAS and LAPACK. Typically we expect the all-NAG variant to run slower than the MKL-based variant anyway, so for the next Marks of our libraries we plan to compile the all-NAG variant without the SIMD optimizations, but continue to compile the MKL-based variant with them. That way, we hope to get the best of both worlds - our users can choose whether they want better consistency or better performance.