CSL6I26DDL Users' Note

NAG C Library, Mark 26, Multithreaded

CSL6I26DDL - License Managed

Linux 64 (Intel 64 / AMD64), Intel C/C++, 64-bit integers

Users' Note

1. Introduction
2. Supplementary Information
3. General Information
4. Routine-specific Information
5. Documentation
6. Support from NAG
7. Contact Addresses

1. Introduction

This document is essential reading for every user of the NAG C Library implementation specified in the title. It provides implementation-specific detail that augments the information provided in the NAG Mark 26 Library Manual (which we will refer to as the Library Manual). Wherever that manual refers to the "Users' Note for your implementation", you should consult this note.

In addition, NAG recommends that before calling any Library routine you should read the following reference material from the Library Manual (see Section 5):

(a) How to Use the NAG Library and its Documentation
(b) Chapter Introduction
(c) Routine Document

2. Supplementary Information

Please check the following URL:

http://www.nag.co.uk/doc/inun/cs26/l6iddl/supplementary.html

for details of any new information related to the applicability or usage of this implementation.

3. General Information

This implementation of the NAG C Library provides static and shareable libraries that use the Intel® Math Kernel Library for Linux (MKL), a third-party vendor performance library, to provide Basic Linear Algebra Subprograms (BLAS) and Linear Algebra PACKage (LAPACK) routines (except for any routines listed in Section 4). It also provides static and shareable libraries that use the NAG versions of these routines (referred to as the self-contained libraries). This implementation has been tested with version 11.3.3 of MKL, which is supplied as part of this product. Please see the Intel website for further information about MKL (https://software.intel.com/intel-mkl). For best performance, we recommend that you use one of the variants of the NAG C Library based on the supplied MKL, i.e. libnagc_mkl.a or libnagc_mkl.so, in preference to one of the self-contained NAG libraries, libnagc_nag.a or libnagc_nag.so.

Note that the NAG C Library is carefully designed so that any memory used can be reclaimed – either by the Library itself or by the user invoking calls of NAG_FREE(). However, the Library does itself depend on the use of compiler run-time and other libraries which may sometimes leak memory, and memory tracing tools used on programs linked to the NAG Library may report this. The amount of memory leaked will vary from application to application, but should not be excessive and should never increase without limit as more calls are made to the NAG Library.

If you intend to use the NAG Library within a multithreaded application, please refer to Section 2.10.1 of the document How to Use the NAG Library and its Documentation for more information. Further information about using the supplied Intel MKL libraries with threaded applications is available at http://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-using-intel-mkl-with-threaded-applications.

The libraries supplied with this implementation have been compiled with OpenMP. However, the OpenMP runtime libraries of different compilers may not be compatible. We therefore recommend that you combine this implementation with your own OpenMP code (including any OpenMP statements required in the user-supplied functions of the routines listed in Section 4) only when using the compiler and corresponding OpenMP runtime listed in the Installer's Note, Section 2.2. Note that the system's default thread stack size may not be sufficient for running all NAG C Library routines within multithreaded applications; you may increase the stack size using the OpenMP environment variable OMP_STACKSIZE.

Intel have introduced a conditional bitwise reproducibility (BWR) option in MKL. Provided a user's code adheres to certain conditions (see https://software.intel.com/en-us/node/528579), BWR can be forced by setting the MKL_CBWR environment variable. See the MKL documentation for further details. It should be noted, however, that many NAG routines do not adhere to these conditions. This means that for a given NAG library built on top of MKL, it may not be possible to ensure BWR for all NAG routines across different CPU architectures by setting MKL_CBWR. See Section 2.9.1 of How to Use the NAG Library and its Documentation for more general information on bitwise reproducibility.

3.1. Accessing the Library

In this section we assume that the Library and the NAG include files have been installed in the directory [INSTALL_DIR]. By default [INSTALL_DIR] (see Installer's Note (in.html)) is $HOME/NAG/csl6i26ddl; however it could have been changed by the person who did the installation, in which case you should consult that person.

To use the NAG C Library and the supplied Intel MKL libraries, you may link in the following manner:

  icc -qopenmp driver.c -I[INSTALL_DIR]/include \
      [INSTALL_DIR]/lib/libnagc_mkl.a \
      -Wl,--start-group \
      [INSTALL_DIR]/mkl_intel64_11.3.3/lib/libmkl_intel_ilp64.a \
      [INSTALL_DIR]/mkl_intel64_11.3.3/lib/libmkl_intel_thread.a \
      [INSTALL_DIR]/mkl_intel64_11.3.3/lib/libmkl_core.a \
      -Wl,--end-group \
      [INSTALL_DIR]/rtl/intel64/libiomp5.a -lpthread -lm -ldl \
      [INSTALL_DIR]/rtl/intel64/libifcoremt.a -lstdc++

where driver.c is your application program; or

  icc -qopenmp driver.c -I[INSTALL_DIR]/include \
      [INSTALL_DIR]/lib/libnagc_mkl.so \
      -L[INSTALL_DIR]/mkl_intel64_11.3.3/lib \
      -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core \
      -L[INSTALL_DIR]/rtl/intel64 \
      -liomp5 -lpthread -lm -ldl -lifcoremt

if the shareable library is required.

However, if you prefer to link to a version of the NAG C Library which does not require the use of MKL you may wish to use the self-contained libraries as follows:

  icc -qopenmp driver.c -I[INSTALL_DIR]/include \
      [INSTALL_DIR]/lib/libnagc_nag.a \
      [INSTALL_DIR]/rtl/intel64/libifcoremt.a -lpthread -lstdc++

or

  icc -qopenmp driver.c -I[INSTALL_DIR]/include \
      [INSTALL_DIR]/lib/libnagc_nag.so \
      -L[INSTALL_DIR]/rtl/intel64 -lifcoremt -lpthread

if the shareable library is required.

Please note the shareable libraries are fully resolved so that you need not link against other run-time libraries explicitly; this requires the environment variable LD_LIBRARY_PATH to be set correctly at link time (see below).

If your application has been linked with the shareable NAG and MKL libraries then the environment variable LD_LIBRARY_PATH must be set or extended, as follows, to allow run-time linkage.

In the C shell, type:

  setenv LD_LIBRARY_PATH [INSTALL_DIR]/lib:[INSTALL_DIR]/mkl_intel64_11.3.3/lib

to set LD_LIBRARY_PATH, or

  setenv LD_LIBRARY_PATH \
    [INSTALL_DIR]/lib:[INSTALL_DIR]/mkl_intel64_11.3.3/lib:${LD_LIBRARY_PATH}

to extend LD_LIBRARY_PATH if you already have it set.

In the Bourne shell, type:

  LD_LIBRARY_PATH=[INSTALL_DIR]/lib:[INSTALL_DIR]/mkl_intel64_11.3.3/lib
  export LD_LIBRARY_PATH

to set LD_LIBRARY_PATH, or

  LD_LIBRARY_PATH=[INSTALL_DIR]/lib:[INSTALL_DIR]/mkl_intel64_11.3.3/lib:${LD_LIBRARY_PATH}
  export LD_LIBRARY_PATH

to extend LD_LIBRARY_PATH if you already have it set.

Note that you may also need to set LD_LIBRARY_PATH to point at other items such as compiler run-time libraries, for example if you are using a newer version of the compiler.

If you are using a different compiler, or indeed a different version of the Intel compiler, you may need to link against the Intel compiler run-time libraries provided in [INSTALL_DIR]/rtl/intel64.

3.1.1. Setting the number of threads to use

This implementation of the NAG Library, and MKL, make use of OpenMP to implement threading in some of the library routines. The number of threads that will be used at run time can be controlled by setting the environment variable OMP_NUM_THREADS to the appropriate value.

In the C shell, type:

  setenv OMP_NUM_THREADS N

In the Bourne shell, type:

  OMP_NUM_THREADS=N
  export OMP_NUM_THREADS

where N is the number of threads required. The environment variable OMP_NUM_THREADS may be reset between executions of the program, as desired. If you wish to change the number of threads used by different parts of your program during execution, routines are provided in Chapter X06 of the NAG Library to assist with this.

Multiple levels of OpenMP parallelism may be present in some NAG Library and MKL routines, and you may also call these multithreaded routines from within an OpenMP parallel region in your own application. By default, OpenMP nested parallelism is disabled, so only the outermost parallel region will actually be active, using N threads in the example above. The inner level(s) will not be active, i.e. they will run on one thread. You can check whether OpenMP nested parallelism is enabled, and enable or disable it, either by querying and setting the OMP_NESTED OpenMP environment variable or by using the appropriate routines in Chapter X06. If OpenMP nested parallelism is enabled, the above example will create N threads at each parallel region for each thread at a higher level, thus N*N threads in total if there are two levels of OpenMP parallelism, etc. To provide more detailed control of nested parallelism, the environment variable OMP_NUM_THREADS can be set to a comma-separated list specifying the number of threads desired at each level.

In the C shell, type:

  setenv OMP_NUM_THREADS N,P

In the Bourne shell, type:

  OMP_NUM_THREADS=N,P
  export OMP_NUM_THREADS

This will create N threads for the first level of parallelism, and then P threads for each outer level thread when an inner level of parallelism is encountered.

Note: If the environment variable OMP_NUM_THREADS is not set, the default value varies between compilers and vendor libraries; it is usually either 1 or the maximum number of cores available on your system. The latter could be an issue if you are sharing the system with other users, or are running a higher level of parallelism within your own application. We therefore recommend that you always set OMP_NUM_THREADS explicitly to your desired value.

In general, we recommend using no more threads than there are physical cores on your shared memory system. However, most Intel processors support a facility known as Hyper-Threading Technology, which allows each physical core to support up to two threads at the same time and thus appear to the operating system as two logical cores. It may be beneficial to make use of this functionality, but this depends on the particular algorithms and problem size(s) used. You are advised to benchmark performance-critical applications with and without the additional logical cores to determine the best choice for you. This can normally be achieved simply by an appropriate choice of the number of threads, via OMP_NUM_THREADS. Completely disabling hyper-threading normally requires changing a setting in the BIOS of your system at boot time.

3.2. Example Programs

The example results distributed were generated at Mark 26, using the software described in Section 2.2 of the Installer's Note. These example results may not be exactly reproducible if the example programs are run in a slightly different environment (for example, a different C compiler, a different compiler library, or a different set of BLAS or LAPACK routines). The results which are most sensitive to such differences are: eigenvectors (which may differ by a scalar multiple, often -1, but sometimes complex); numbers of iterations and function evaluations; and residuals and other "small" quantities of the same order as the machine precision.

The distributed example results are those obtained with the static library libnagc_mkl.a (i.e. using the MKL BLAS and LAPACK routines). Running the examples with NAG BLAS or LAPACK may give slightly different results.

Note that the example material has been adapted, if necessary, from that published in the Library Manual, so that programs are suitable for execution with this implementation with no further changes. The distributed example programs should be used in preference to the versions in the Library Manual wherever possible. The example programs are most easily accessed by using one of the following scripts, which are located in the directory [INSTALL_DIR]/scripts:

nagc_example_mkl, to link with the NAG static library libnagc_mkl.a and the supplied MKL libraries
nagc_example_shar_mkl, to link with the NAG shareable library libnagc_mkl.so and the supplied MKL libraries
nagc_example, to link with the NAG self-contained static library libnagc_nag.a
nagc_example_shar, to link with the NAG self-contained shareable library libnagc_nag.so

Each command will provide you with a copy of an example program (and its data and options file, if any), compile the program and link it with the appropriate libraries (showing you the compile command so that you can recompile your own version of the program). Finally, the executable program will be run (with appropriate arguments specifying data, options and results files as needed), with the results being sent to a file and to the command window.

The example program concerned, and the number of OpenMP threads to use, are specified by the arguments to the command, e.g.

nagc_example_mkl e04ucc 4

will copy the example program and its data and options files (e04ucce.c, e04ucce.d and e04ucce.opt) into the current directory, compile and link the program and run it using 4 OpenMP threads to produce the example program results in the file e04ucce.r.

3.3. Data Types

In this implementation, the NAG types Integer and Pointer are defined as follows:

NAG Type   C Type   Size (bytes)
Integer    long     8
Pointer    void *   8

The values for sizeof(Integer) and sizeof(Pointer) are also given by the a00aac example program. Information on other NAG data types is available in the How to Use the NAG Library and its Documentation section of the Library Manual (see Section 5).

3.4. Maintenance Level

The maintenance level of the Library can be determined by compiling and executing the example that calls a00aac, or you could call one of the nagc_example* scripts with the argument a00aac. See Section 3.2. This example prints out details of the implementation, including title and product code, compiler and precision used, mark and maintenance level.

4. Routine-specific Information

Any further information which applies to one or more routines in this implementation is listed below, chapter by chapter.

Routines that call User Functions within OpenMP Parallel Regions

In this implementation, the following routines make calls to user functions from within OpenMP parallel regions located inside the NAG routines:
```
 e05ucc  e05usc  f01elc  f01emc  f01flc  f01fmc  f01jbc  f01jcc
 f01kbc  f01kcc  
```
Thus orphaned OpenMP directives can be used in user functions, unless you are using a different compiler from the one used to build your NAG Library implementation, as listed in the Installer's Note, Section 2.2. You must also ensure that you use the user workspace arrays IUSER and RUSER in a thread-safe manner, which is best achieved by only using them to supply read-only data to the user functions.

c06

In this implementation, calls to the Intel Discrete Fourier Transforms Interface (DFTI) routines, from the supplied MKL library, are made whenever possible in the following NAG routines:
```
 c06pac  c06pcc  c06pfc  c06pjc  c06pkc  c06ppc  c06pqc  c06prc
 c06psc  c06puc  c06pvc  c06pwc  c06pxc  c06pyc  c06pzc  c06rac  
 c06rbc  c06rcc  c06rdc
```

f06, f07, f08 and f16

In Chapters f06, f07, f08 and f16, alternate routine names are available for BLAS and LAPACK derived routines. For details of the alternate routine names please refer to the relevant Chapter Introduction. Note that applications should reference routines by their BLAS/LAPACK names, rather than their NAG-style names, for optimum performance.

Many LAPACK routines have a "workspace query" mechanism which allows a caller to interrogate the routine to determine how much workspace to supply. Note that LAPACK routines from the MKL library may require a different amount of workspace from the equivalent NAG versions of these routines. Care should be taken when using the workspace query mechanism.

In this implementation, calls to BLAS and LAPACK routines in the non-self-contained NAG libraries are implemented by calls to MKL, except for the following routines:
```
blas_damax_val  blas_damin_val  blas_daxpby     blas_ddot       blas_dmax_val
blas_dmin_val   blas_dsum       blas_dwaxpby    blas_zamax_val  blas_zamin_val
blas_zaxpby     blas_zsum       blas_zwaxpby
```
The following NAG named routines in the non-self-contained NAG libraries are wrappers to call LAPACK routines from the vendor library:
```
nag_dgetrf/f07adc  nag_dgetrs/f07aec  nag_zgetrf/f07arc  nag_zgetrs/f07asc
nag_dgbtrs/f07bec  nag_zgbtrs/f07bsc  nag_dpotrf/f07fdc  nag_dpotrs/f07fec
nag_zpotrf/f07frc  nag_zpotrs/f07fsc  nag_dpptrs/f07gec  nag_zpptrs/f07gsc
nag_dpbtrs/f07hec  nag_zpbtrs/f07hsc  nag_dgeqrf/f08aec  nag_dormqr/f08agc
nag_zgeqrf/f08asc  nag_zunmqr/f08auc  nag_dsytrd/f08fec  nag_zhetrd/f08fsc
nag_dsptrd/f08gec  nag_dopgtr/f08gfc  nag_zhptrd/f08gsc  nag_zupgtr/f08gtc
nag_dsteqr/f08jec  nag_zsteqr/f08jsc  nag_dgebrd/f08kec  nag_zgebrd/f08ksc
nag_dbdsqr/f08mec  nag_zbdsqr/f08msc
```

s10 - s21

The behaviour of functions in these Chapters may depend on implementation-specific values.

General details are given in the Library Manual, but the specific values used in this implementation are as follows:

s10aac  E_1 = 1.8715e+1
s10abc  E_1 = 7.080e+2
s10acc  E_1 = 7.080e+2

s13aac  x_hi = 7.083e+2
s13acc  x_hi = 1.0e+16
s13adc  x_hi = 1.0e+17

s14aac  fail.code = NE_REAL_ARG_GT if x > 1.70e+2
        fail.code = NE_REAL_ARG_LT if x < -1.70e+2
        fail.code = NE_REAL_ARG_TOO_SMALL if abs(x) < 2.23e-308
s14abc  fail.code = NE_REAL_ARG_GT if x > x_big = 2.55e+305

s15adc  x_hi = 2.65e+1
s15aec  x_hi = 2.65e+1
s15agc  fail.code = NW_HI if x >= 2.53e+307
        fail.code = NW_REAL if 4.74e+7 <= x < 2.53e+307
        fail.code = NW_NEG if x < -2.66e+1

s17acc  fail.code = NE_REAL_ARG_GT if x > 1.0e+16
s17adc  fail.code = NE_REAL_ARG_GT if x > 1.0e+16
        fail.code = NE_REAL_ARG_TOO_SMALL if 0 < x <= 2.23e-308
s17aec  fail.code = NE_REAL_ARG_GT if abs(x) > 1.0e+16
s17afc  fail.code = NE_REAL_ARG_GT if abs(x) > 1.0e+16
s17agc  fail.code = NE_REAL_ARG_GT if x > 1.038e+2
        fail.code = NE_REAL_ARG_LT if x < -5.7e+10
s17ahc  fail.code = NE_REAL_ARG_GT if x > 1.041e+2
        fail.code = NE_REAL_ARG_LT if x < -5.7e+10
s17ajc  fail.code = NE_REAL_ARG_GT if x > 1.041e+2
        fail.code = NE_REAL_ARG_LT if x < -1.9e+9
s17akc  fail.code = NE_REAL_ARG_GT if x > 1.041e+2
        fail.code = NE_REAL_ARG_LT if x < -1.9e+9
s17dcc  fail.code = NE_OVERFLOW_LIKELY if abs(z) < 3.92223e-305
        fail.code = NW_SOME_PRECISION_LOSS if abs(z) or fnu+n-1 > 3.27679e+4
        fail.code = NE_TOTAL_PRECISION_LOSS if abs(z) or fnu+n-1 > 1.07374e+9
s17dec  fail.code = NE_OVERFLOW_LIKELY if AIMAG(z) > 7.00921e+2
        fail.code = NW_SOME_PRECISION_LOSS if abs(z) or fnu+n-1 > 3.27679e+4
        fail.code = NE_TOTAL_PRECISION_LOSS if abs(z) or fnu+n-1 > 1.07374e+9
s17dgc  fail.code = NW_SOME_PRECISION_LOSS if abs(z) > 1.02399e+3
        fail.code = NE_TOTAL_PRECISION_LOSS if abs(z) > 1.04857e+6
s17dhc  fail.code = NW_SOME_PRECISION_LOSS if abs(z) > 1.02399e+3
        fail.code = NE_TOTAL_PRECISION_LOSS if abs(z) > 1.04857e+6
s17dlc  fail.code = NE_OVERFLOW_LIKELY if abs(z) < 3.92223e-305
        fail.code = NW_SOME_PRECISION_LOSS if abs(z) or fnu+n-1 > 3.27679e+4
        fail.code = NE_TOTAL_PRECISION_LOSS if abs(z) or fnu+n-1 > 1.07374e+9

s18adc  fail.code = NE_REAL_ARG_TOO_SMALL if 0 < x <= 2.23e-308
s18aec  fail.code = NE_REAL_ARG_GT if abs(x) > 7.116e+2
s18afc  fail.code = NE_REAL_ARG_GT if abs(x) > 7.116e+2
s18dcc  fail.code = NE_OVERFLOW_LIKELY if abs(z) < 3.92223e-305
        fail.code = NW_SOME_PRECISION_LOSS if abs(z) or fnu+n-1 > 3.27679e+4
        fail.code = NE_TOTAL_PRECISION_LOSS if abs(z) or fnu+n-1 > 1.07374e+9
s18dec  fail.code = NE_OVERFLOW_LIKELY if REAL(z) > 7.00921e+2
        fail.code = NW_SOME_PRECISION_LOSS if abs(z) or fnu+n-1 > 3.27679e+4
        fail.code = NE_TOTAL_PRECISION_LOSS if abs(z) or fnu+n-1 > 1.07374e+9

s19aac  fail.code = NE_REAL_ARG_GT if abs(x) >= 5.04818e+1
s19abc  fail.code = NE_REAL_ARG_GT if abs(x) >= 5.04818e+1
s19acc  fail.code = NE_REAL_ARG_GT if x > 9.9726e+2
s19adc  fail.code = NE_REAL_ARG_GT if x > 9.9726e+2

s21bcc  fail.code = NE_REAL_ARG_LT if an argument < 1.583e-205
        fail.code = NE_REAL_ARG_GE if an argument >= 3.765e+202
s21bdc  fail.code = NE_REAL_ARG_LT if an argument < 2.813e-103
        fail.code = NE_REAL_ARG_GT if an argument >= 1.407e+102

x01

The values of the mathematical constants are provided in the header file nagx01.h:
```
X01AAC (pi) = 3.1415926535897932
X01ABC (gamma) = 0.5772156649015328
```

x02

The values of the machine constants are provided in the header file nagx02.h:

The basic parameters of the model

X02BHC   = 2
X02BJC   = 53
X02BKC   = -1021
X02BLC   = 1024

Derived parameters of the floating-point arithmetic

X02AJC   = 1.11022302462516e-16
X02AKC   = 2.22507385850721e-308
X02ALC   = 1.79769313486231e+308
X02AMC   = 2.22507385850721e-308
X02ANC   = 2.22507385850721e-308

Parameters of other aspects of the computing environment

X02AHC   = 1.42724769270596e+45
X02BBC   = 9223372036854775807
X02BEC   = 15

X06

Chapter X06 routines also change the behaviour of MKL threading in this implementation of the Library.

5. Documentation

The Library Manual is available as a separate installation, via download from the NAG website. The most up-to-date version of the documentation is accessible via the NAG website at http://www.nag.co.uk/content/nag-c-library-manual.

The Library Manual is supplied in the following formats:

- HTML5, a fully linked version of the manual using HTML and MathML (recommended for browsing) and providing links to the PDF version of each document (recommended for printing); and
- PDF, a full PDF manual, which can be browsed either
  - using the PDF bookmarks, or
  - via HTML index files.

The following main index files have been provided for these formats:

  nagdoc_cl26/html/frontmatter/manconts.html
  nagdoc_cl26/pdf/frontmatter/manconts.pdf
  nagdoc_cl26/pdf/frontmatter/manconts.html

Use your web browser to navigate from here. For convenience, a master index file containing links to the above files has been provided at

  nagdoc_cl26/index.html

Advice on viewing and navigating the formats available can be found in http://www.nag.co.uk/numeric/cl/nagdoc_cl26/html/genint/essint.html.

In addition the following are provided:

in.html - Installer's Note
un.html - Users' Note (this document)

Please see the Intel website for further information about MKL (https://software.intel.com/intel-mkl).

6. Support from NAG

Please see

http://www.nag.co.uk/content/nag-technical-support-service

for information about the NAG Technical Support Service, including details of the NAG Technical Support Service contact points. We would also be delighted to receive your feedback on NAG's products and services.

7. Contact Addresses

Please see

http://www.nag.co.uk/content/worldwide-contact-information

for worldwide contact details for the Numerical Algorithms Group.
