by Anton Heijs, MINT b.v. and Ruud Smeulders, Robeco Group
Most businesses today have realised that improving the way in which they interpret their data is the key to making better business decisions. To stay in control, companies collect large amounts of data and store them in databases and data warehouses. When there is no a priori knowledge about the correlations between data records, a variety of techniques can be used to analyse and interpret the data. Well-known data mining methods are mostly based on statistical techniques, neural networks or genetic algorithms. The results from these methods are often complex and multi-dimensional, yet many data mining packages lack the functionality to display them. Visualization techniques, such as those offered by IRIS Explorer, have a useful role to play here, and offer further benefits such as direct interaction and immediate feedback to the user. The combination of these techniques has been referred to elsewhere as visual data mining.
A good example of an area where knowledge and insight have to be absolutely up to date is marketing research. With short times to market and rapidly changing customer demand, it is vital that data is presented intuitively so that decisions can be taken quickly. In this article, we describe the application of visualization techniques, together with statistical analysis methods, to a large marketing dataset collected by a Dutch financial asset management company. The company wanted to use its data to improve communications with a very large population of clients. Specifically, it wanted to ensure that its marketing of certain financial products was targeted at clients with a high degree of entrepreneurship, who would be more likely to purchase them. We found that visualization gave the company better insight into its data, which enabled it to make better-informed decisions.
The dataset consisted of marketing data for 25,000 customers. Each customer was characterised by values for more than 100 variables (such as age, income and investment history), obtained partly from the customer database and partly from a customer questionnaire. The company wanted to use this sample to identify entrepreneurs across its full client base of more than half a million customers. To this end, a measure of entrepreneurship was first defined in terms of some of the investment-history variables in the dataset. The objective was then to determine how strongly these variables correlated with the rest of the data (age, for example). Accordingly, the correlation matrix of all the variables was computed for the 25,000-customer sample and displayed in IRIS Explorer.
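The article does not say how the correlation matrix was computed before it was read into IRIS Explorer. As a minimal sketch, assuming the sample is held as a numeric array with one row per customer and one column per variable, and assuming ordinary Pearson correlation, NumPy can build the matrix directly. The data below is synthetic and only illustrates the shape of the computation; the function name `correlation_matrix` is hypothetical.

```python
import numpy as np


def correlation_matrix(data: np.ndarray) -> np.ndarray:
    """Pearson correlation matrix for a (customers x variables) array.

    rowvar=False tells NumPy that each column is a variable and each
    row is an observation (here, one customer).
    """
    return np.corrcoef(data, rowvar=False)


# Synthetic stand-in for the survey sample: 25,000 customers, 5 variables.
# (The real dataset had more than 100 variables; these values are random.)
rng = np.random.default_rng(0)
n_customers, n_vars = 25_000, 5
data = rng.normal(size=(n_customers, n_vars))

# Make variable 1 depend strongly on variable 0, so that one clear
# off-diagonal "wall" appears in the matrix.
data[:, 1] = 0.9 * data[:, 0] + 0.1 * rng.normal(size=n_customers)

corr = correlation_matrix(data)
print(corr.shape)  # (5, 5)
```

The resulting square, symmetric matrix is the kind of array a 3D histogram display takes as input: one bar per variable pair, with bar height given by the correlation coefficient.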
The correlation matrix was read into an IRIS Explorer map, where the Graph3D module was used to display it as a 3D histogram (see figure overleaf). This gave an immediate impression of the significant correlations in the dataset, which show up as 'walls' in the histogram. (The wall along the diagonal represents the correlation of each variable with itself, which is always 1, and can be discounted here.) For example, it was discovered that the degree of entrepreneurship exhibited by a client was strongly correlated with their age. Next, the user interactively selected strongly correlated variable pairs by picking bars in the histogram; one or two further variables could also be selected at this point. The selected variables were then subjected to a cluster analysis: the data values were 'binned' into ranges of interest (for example, age between 20 and 25, between 26 and 30, and so on) and sorted (see figure). Both the binning and the sorting were performed interactively in the IRIS Explorer map. Finally, each data point was displayed on a 3D scatter plot (see figure), with the selected variables mapped onto the x, y and z coordinates, marker size and color.
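In IRIS Explorer the pair selection was done by picking bars by hand. The sketch below is a non-interactive stand-in, not the authors' method: it assumes a fixed threshold on the absolute correlation (0.5 here, chosen arbitrarily) to list candidate pairs, skipping the diagonal as described above. It then shows one way to bin values into the age ranges quoted in the text and sort the records by bin. The helper names `strong_pairs` and `bin_and_sort` are hypothetical.

```python
import numpy as np


def strong_pairs(corr: np.ndarray, threshold: float = 0.5) -> list[tuple[int, int, float]]:
    """Return (i, j, r) for variable pairs with |r| >= threshold, strongest first.

    Only the upper triangle above the diagonal (k=1) is scanned: the
    matrix is symmetric, and the diagonal (each variable with itself,
    always 1) carries no information.
    """
    n = corr.shape[0]
    i_idx, j_idx = np.triu_indices(n, k=1)
    pairs = [
        (int(i), int(j), float(corr[i, j]))
        for i, j in zip(i_idx, j_idx)
        if abs(corr[i, j]) >= threshold
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


def bin_and_sort(values: np.ndarray, edges: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Label each value with its range and return (labels, sort order).

    np.digitize returns k such that edges[k-1] <= value < edges[k];
    0 means below the first edge, len(edges) means at or above the last.
    A stable sort keeps records within the same bin in their original order.
    """
    labels = np.digitize(values, edges)
    order = np.argsort(labels, kind="stable")
    return labels, order


# Hypothetical example: edges 20, 26, 31, 36 give the integer age ranges
# 20-25 (label 1), 26-30 (label 2) and 31-35 (label 3).
ages = np.array([34, 22, 27, 25, 31, 29])
labels, order = bin_and_sort(ages, np.array([20, 26, 31, 36]))
print(labels)       # [3 1 2 1 3 2]
print(ages[order])  # [22 25 27 29 34 31]
```

A threshold filter only approximates what an analyst does when picking bars by eye; it is useful for shortlisting pairs in a matrix with more than 100 variables, after which the interactive view remains the place to judge which pairs are meaningful.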
The market researchers found that the display and interactivity offered by the IRIS Explorer application helped them analyse and interpret their data, in particular by finding the variables that were significant for identifying entrepreneurship. Using the 3D 'correlation landscape', they could identify important patterns in their data. By selecting highly correlated variables, they could then focus on the data distributions and choose suitable data ranges for a cluster analysis. In some cases, the cluster visualization revealed non-linear relationships in the data at a detailed level, which would have been very difficult to identify using other methods.
We found the visualization techniques offered by IRIS Explorer to be a valuable additional tool for marketing researchers, enabling them to target future communications with their customer base more accurately.
Our use of IRIS Explorer for business visualization is continuing. For more information about this work, please see the MINT web page at http://www.mint.nl/.