Changepoint Analysis using MongoDB and NAG4PY
Recently, I attended the Alan Tayler Day at St. Catherine’s College, Oxford, organised by the Smith Institute. One of the speakers was Dr Rebecca Killick, of Lancaster University, whose talk highlighted her collaboration with NAG that has led to the inclusion of the PELT algorithm into the NAG Library. Dr Killick's collaboration with NAG started in her student days, when she was the runner-up in the "Take Aim" competition, another event run by the Smith Institute and, this year, sponsored by NAG along with Babcock, BT, CATAPULT Satellite Applications, ESPRC, Experian, GCHQ and National Nuclear Laboratory.
The PELT algorithm, of Killick et al, is designed to detect changepoints in an ordered sequence of data, for example, a time series. A changepoint is the location in the series (or time) at which one or more properties of the sequence, such as the mean, changes. A typical example of this is the time at which the average price of a stock changes to a new average value; this is an example that we will consider in more detail later in this post. PELT is an acronym for Pruned Exact Linear Time, where pruned stands for the pruning technique applied to the data to reduce the computation cost. Exact, in this situation, stands for the nature of the search algorithm for the changepoints; it is guaranteed to find the exact minimisation of a cost function used to determine the changepoints. Finally, the algorithm is linear in time, this means that, as long as the number of changepoints grows linearly, the cost of the algorithm is linear O(n), where n is the number of data points. More information about the PELT algorithm can be found in our documentation and Killick et al. (2012).
In an IPython Notebook, which can also be accessed via NAG’s GitHub, I demonstrate how one could calculate the changepoints of a stock price that has been stored in a MongoDB database. Using the example of Volkswagen, I have shown that a changepoint occurred around the time that the news broke about the recent emissions scandal. One should be careful here and note that the reason behind a changepoint is not necessarily obvious and that changepoints may not occur where one intuitively expects them to.
For those of you who are interested, the winners of the latest Take Aim prize were Georg Maierhofer and Rachael Bonnebaigt, University of Cambridge.
You can also view an IPython Notebook that does not require MongoDB here.