Mining Financial Time Series
by Arno Siebes
A lot of financial data comes in the form of time series, e.g.,
the tick data from stock markets. Interesting patterns mined from
such data could be used for, e.g., cleaning the data or spotting
possible market opportunities.
Mining time-series data is, however, not trivial. Simply treating
each individual time-series as a (large) record in a table presupposes
that all series have the same length and sampling frequency. Moreover,
straightforward application of standard mining algorithms to such
tables ignores the time structure in the series.
To overcome these problems, one can work with a fixed set of characteristics
that are derived from each individual time-series. These characteristics
should be such that they preserve similarity of time-series. That
is, time-series that are similar should have similar characteristics
and vice versa. If such a set of characteristics can be found,
the mining can be done on these characteristics rather than on
the original time-series.
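As a minimal illustration of this idea, the sketch below maps series of
different lengths to a feature tuple of fixed size. The function name
`characteristics` and the three features (net change, total variation,
fraction of upward steps) are hypothetical stand-ins chosen for brevity;
they are not the wavelet-based characteristics used in the project.

```python
def characteristics(series):
    """Map a series of any length (>= 2) to a fixed-length feature tuple.

    The features are illustrative assumptions, not the project's own.
    """
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0                      # guard against flat series
    scaled = [(x - lo) / span for x in series]   # rescale values to [0, 1]
    steps = [b - a for a, b in zip(scaled, scaled[1:])]
    net_change = scaled[-1] - scaled[0]
    total_variation = sum(abs(s) for s in steps)
    fraction_up = sum(1 for s in steps if s > 0) / len(steps)
    return (net_change, total_variation, fraction_up)


# Two rising ramps with different lengths and price levels.
short_ramp = [0, 1, 2, 3, 4]
long_ramp = [10, 11, 12, 13, 14, 15, 16, 17, 18]
features_short = characteristics(short_ramp)
features_long = characteristics(long_ramp)
```

Both ramps have the same shape, so they end up with the same feature
tuple even though their lengths differ. A standard mining algorithm can
then work on a table with one such tuple per series.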
A complicating factor in defining such characteristics is that
similarity of time-series is not a well-defined criterion. In
the Dutch HPCN project IMPACT, in which CWI participates, we take
two series to be similar if they look similar to the human eye, and we use wavelet
analysis to define and compute the characteristics. One of the
attractive features of this approach is that different characterisations
capture different aspects of similarity. For example, Hoelder
exponents capture roughness at a pre-defined scale, whereas a
Haar representation focuses on local slope.
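To make the link between a Haar representation and local slope concrete,
here is a minimal sketch of an unnormalised Haar decomposition. The
function names, the restriction to power-of-two lengths, and the sign
convention (first value minus second) are assumptions made for this
example, not details taken from the project.

```python
def haar_step(values):
    """One Haar level: pairwise averages and pairwise half-differences."""
    pairs = list(zip(values[0::2], values[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]   # negative when the pair rises
    return averages, details


def haar_transform(values):
    """Return (overall mean, detail coefficients from finest to coarsest)."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    levels = []
    current = list(values)
    while len(current) > 1:
        current, details = haar_step(current)
        levels.append(details)
    return current[0], levels


mean_value, detail_levels = haar_transform([2, 4, 6, 8, 10, 12, 14, 16])
# mean_value    -> 9.0
# detail_levels -> [[-1.0, -1.0, -1.0, -1.0], [-2.0, -2.0], [-4.0]]
```

For this steadily rising series every detail coefficient at a given
scale is the same, and its size doubles with each coarser scale: each
coefficient is half the difference between two neighbouring block
averages, which is what is meant by local slope here.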
Currently, experiments are underway with the Dutch ABN AMRO bank
to filter errors from on-line tick-data. In the first stage, a
Haar representation is used to identify spikes in the data. In
the next stage, clustering on Hoelder exponents and/or Haar representations
will be used to identify smaller-scale errors.
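The first stage can be pictured with the following sketch, which is one
possible reading of the idea and not the filter used in the experiments.
It computes finest-scale Haar details at every position (half the
difference between neighbouring ticks) and flags a tick as a spike when
the details on either side are both large and of opposite sign. The
function name `find_spikes`, the median-based threshold and the default
`factor` are all assumptions.

```python
from statistics import median


def find_spikes(prices, factor=5.0):
    """Return indices of ticks that look like isolated spikes."""
    # Finest-scale Haar detail at every position (undecimated).
    details = [(b - a) / 2 for a, b in zip(prices, prices[1:])]
    if len(details) < 2:
        return []
    # Typical detail size, robust against the spikes themselves.
    typical = median([abs(d) for d in details])
    threshold = factor * typical
    spikes = []
    for i in range(1, len(prices) - 1):
        before, after = details[i - 1], details[i]
        # A spike jumps away and straight back: two large opposite details.
        if abs(before) > threshold and abs(after) > threshold and before * after < 0:
            spikes.append(i)
    return spikes


ticks = [100.0, 100.1, 100.0, 100.2, 100.1, 150.0, 100.2, 100.1, 100.3, 100.2]
spike_indices = find_spikes(ticks)   # flags only the tick at index 5

level_shift = [1.0, 1.1, 1.0, 1.1, 5.0, 5.1, 5.0, 5.1]
shift_indices = find_spikes(level_shift)   # a lasting jump is not flagged
```

Requiring two opposite details separates a one-tick error from a genuine
price jump, which produces only a single large detail. Errors that are
small relative to normal tick-to-tick movement pass this test unnoticed.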
Please contact:
Arno Siebes - CWI
Tel: +31 20 592 4139
E-mail: Arno.Siebes@cwi.nl