Extracting data from Microsoft Access and Excel for use with NAG functions

by S.J. Langdell

Abstract
NAG supplies data in text form for the example programs it distributes with its numerical libraries. Whereas text form makes the programs simple and portable, it may not be the form in which users store data. In particular, a database enables users to better organise their data and to extract subsets of data according to a set of rules by using a Structured Query Language (SQL) statement. This article provides and describes C++ source code that extracts data from Microsoft Access and Excel databases. The software has been tested using Microsoft Visual Studio 6 and Office 2000.

ODBC databases
Both Microsoft Access and Excel are examples of databases that are compliant with the Open Database Connectivity (ODBC) standard. On Windows, a connection to either of these databases can be made by using:

  • DSN connections. A Data Source Name (DSN) is a name that hides from the user connection details such as database name and directory, database driver, username, and password. Having set up a DSN, you do not have to remember these details when making an ODBC connection to the database. There are three types of DSN connection:

    • System DSN. Anyone with access to the machine can see a System DSN and attempt (the database may still be password protected) to access its database. System DSNs are stored in the Windows registry;

    • User DSN. Only the user that set up the connection can see and make a connection with a User DSN. User DSNs are stored in the Windows registry;

    • File DSN. A text file containing fields that describe a DSN connection. All users with access to the machine can attempt to make an ODBC connection with a File DSN.

    In Windows XP new DSN connections can be setup by using the Data Sources (ODBC) shortcut in Administrative Tools.

  • DSN-less connections. As an alternative to DSN connections, the following information can be supplied explicity in order to make an ODBC connection to a database:

    • The name of an installed driver relevant to the database

    • The file name, including the full path and drive letter;

    • Username, if the database is restricted to a user;

    • Password, if the database is password protected.

SQL statement The software supports the SQL keywords SELECT, FROM and WHERE in a SQL statement of the form:

  SELECT < Column list > FROM < Table list > WHERE < Condition >

wherein the WHERE clause is optional and

  • < Table list > describes the table(s) containing the data of interest in the database;
  • < Column list > columns name(s) in the table or * to signify all columns;
  • < Condition > if the WHERE clause is present, selects a subset of rows. For example, a condition on a column name AGE may be AGE > 65.

The above gives only a brief description of the SELECT, FROM and WHERE keywords; for more detail, see, for example, this tutorial.

Description of the source code
The software defines three classes in the header files: ODBCRead.h, ODBCDSN.h and ODBCDSNless.h; classes ODBCDSN and ODBCDSNless inherit the base class ODBCRead. As the names suggest, one class is designed for DSN connections and another for DSN-less connections (the base class ODBCRead is not designed to be called directly). The files: ODBCRead.cpp, ODBCDSN.cpp and ODBCDSNless.cpp implement the methods particular to their respective header files.

The ODBCDSN class has the constructor:

  ODBCDSN::ODBCDSN(CString strTable, CString strColumn = "*",
                   CString strCondition = "");

and, therefore, one mandatory argument (strTable) and two optional arguments, where:

  • strTable is the < Table list > part of a SQL statement;
  • strColumn is the < Column list > part of a SQL statement;
  • strCondition is the < Condition > part of a SQL statement.

Note that by default all columns are selected and there is no WHERE clause.

The ODBCDSNless class has the constructor:

  ODBCDSNless::ODBCDSNless(CString strFile, CString strTable,
                         CString strColumn = "*", CString strCondition = "");

and, therefore, two mandatory arguments and two optional arguments. The only difference between the constructors ODBCDSN and ODBCDSNless is that the strFile mandatory argument must contain the name and full path of either an Access or Excel file.

Using the source code
There are two steps in using the source code. Firstly, select the correct constructor; ODBCDSN for DSN connections and ODBCDSNless for DSN-less connections, making use as appropriate of optional arguments. Secondly, access the following members of an ODBCDSN or ODBCDSNless object for use in other applications:

  • MYDB.n the number of records returned by the SQL statement;

  • MYDB.m the number of fields returned by the SQL statement;

  • MYDB.data a double array of MYDB.n times MYDB.m data values stored by column;

  • MYDB.fail a boolean type set to true if a constructor had a problem.

Access examples
Given an Access database, to

  • get all data in a table by using a DSN connection:
        ODBCDSN MYDB(strTable);
    
  • get all data in a table by using a DSN-less connection:
        ODBCDSNless MYDB(strFile,strTable);
    
  • select all fields of records with field name “Species” equal to 3 by using a DSN-less connection:
        ODBCDSNless MYDB(strFile,strTable,"*","Species=3");
    
  • select the field values of a field name “Petal Width” from records that have field name “Species” equal to 3 by using a DSN connection:
        ODBCDSN MYDB(strTable,"Petal Width","Species=3");
    

Excel examples
The use of the constructors ODBCDSN and ODBCDSNless is the same for Excel databases as for Access databases. However, there are several issues that Excel users should be aware of to better ensure that SQL queries work against an Excel spreadsheet, namely:

  • make sure that the first row of an array of data is a header row containing field names;
  • start the data values in the second row and do not skip any rows or columns;
  • tables in Excel spreadsheets can be identified in two ways. If all data is contained in an array on sheet number k, use “Sheetk$”; for example, “Sheet2$”. If there are several data arrays on the sheet containing the data of interest, select Insert | Name &rarr Define on the Excel menu bar to associate a name with a region of data;
  • field names are most easily replaced in a SQL statement by the text “col” followed by their numerical position, for example, “col4” for a field in the fourth column of a spreadsheet;
  • field names are case insensitive.

If the name held by strTable is associated with an array of data, the following makes a DSN connection:

    ODBCDSN MYDB(strTable);

Assuming the array that strTable represents is the only data on sheet two of a workbook, the following achieves the same result:

    ODBCDSN MYDB("Sheet2$");

See the Access examples for example of DSN-less connections and more elaborate SQL statements.

Troubleshooting
If the software complains that the driver cannot be found, check the names of drivers as written in the function ODBCDSNless::connectString in the source code are present on your system. It is possible that your driver names differ slightly, in which case change the source code accordingly. In order to see the drivers on your system open Data Sources in Administrative Tools and click on the Drivers tab. If your system does not have any (*.xls) or (*mdb) drivers, you should download and install the Microsoft Data Access Components (MDAC).

Call the info member function that will display the string used to connect to a database and the SQL statements; for example, given an object named MYDB:

  MYDB.info();

will display the connection and SQL strings if the code returns an error; to return the same information even when no error is returned use:

  MYDB.info(true);

Example source code
The following source code extracts all data values from a table “Iris Data” in an Access database by using a DSN connection and prints the result to stdout:

  #include < iostream >
  #include "ODBCRead.h" // Includes both ODBCDSN.h and ODBCDSNless.h

  using namespace std;

  int main(void) {

    ODBCDSN MYDB("Iris Data");

    IRIS.info(); // If it doesn't work as expected, display debug info

    cout << "Read " << MYDB.n << " records with " << MYDB.m << " data value"
         << (MYDB.m > 1 ? "s" : "") << "\n\n";

    // Do something interesting with the data array ... instead of this:
    for (long i = 0, k = 0; i < MYDB.n; i++, k += MYDB.m) {
        for (long j = 0; j < MYDB.m; j++)
            cout << MYDB.data[k + j] << " ";
        cout << endl;
    }

    return (int)0;
  }