Genetic Programming for Feature Extraction in Financial Forecasting
by József Hornyák and László Monostori
Artificial neural networks (ANNs) have received great attention in
the past few years because they can solve several difficult
problems involving complex, irrelevant, noisy or partial information,
as well as problems that are hardly manageable in other ways. The usual
inputs of ANNs are the time-series themselves or their simple
descendants, such as differences, moving averages or standard
deviations. The applicability of genetic programming to feature
extraction is being investigated at SZTAKI as part of a PhD project.
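As an illustration of such simple descendants, the following Python sketch derives first differences, a trailing moving average and a trailing standard deviation from a list of daily closes. The function names, the window length and the sample values are illustrative assumptions, not taken from the study.

```python
from statistics import pstdev


def differences(series):
    """First differences x[t] - x[t-1]; the result is one element shorter."""
    return [b - a for a, b in zip(series, series[1:])]


def moving_average(series, window):
    """Trailing simple moving average; the first value covers series[0:window]."""
    return [
        sum(series[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def moving_std(series, window):
    """Trailing population standard deviation over `window` values."""
    return [
        pstdev(series[i - window + 1:i + 1])
        for i in range(window - 1, len(series))
    ]


closes = [100.0, 101.0, 103.0, 102.0, 104.0]  # illustrative values, not S&P 500 data
print(differences(closes))                               # [1.0, 2.0, -1.0, 2.0]
print([round(v, 2) for v in moving_average(closes, 3)])  # [101.33, 102.0, 103.0]
print([round(v, 3) for v in moving_std(closes, 3)])
```

Each derived series is shorter than the input, so in practice the rows are aligned on the latest date before being fed to an ANN.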
During the training phase ANNs try to learn associations between
the inputs and the expected outputs. Although back propagation
(BP) ANNs are appropriate for non-linear mapping, they cannot
easily realise certain mathematical relationships. On the one
hand, appropriate feature extraction techniques can simplify the
mapping task; on the other hand, they can enhance the speed and
effectiveness of learning. On the basis of previous experience,
the user usually defines a large number of features, and automatic
feature selection methods (e.g. based on statistical measures) are
applied to reduce the number of features. A different technique for
feature creation is the genetic programming (GP) approach. Genetic
programming provides a way to search the space of all possible
functions composed of certain terminals and primitive functions
to find a function that satisfies the initial conditions.
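A minimal sketch of the search space GP works in, assuming a hypothetical function set (addition, subtraction, multiplication and protected division) and a hypothetical terminal set of daily High, Low and Close values plus a constant. Candidate features are expression trees stored as nested tuples; a full GP run would additionally need a fitness measure and selection, crossover and mutation operators, which are omitted here.

```python
import operator
import random


def protected_div(a, b):
    """Division returning 1.0 for a (near) zero denominator, a common GP convention."""
    return a / b if abs(b) > 1e-12 else 1.0


# Hypothetical primitive function set: name -> (callable, arity).
FUNCTIONS = {
    "add": (operator.add, 2),
    "sub": (operator.sub, 2),
    "mul": (operator.mul, 2),
    "div": (protected_div, 2),
}
# Hypothetical terminal set: daily price fields plus one numeric constant.
TERMINALS = ["high", "low", "close", 1.0]


def random_tree(depth, rng):
    """Grow a random tree: either a terminal or (function_name, child, child)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    name = rng.choice(list(FUNCTIONS))
    _, arity = FUNCTIONS[name]
    return (name, *[random_tree(depth - 1, rng) for _ in range(arity)])


def evaluate(tree, row):
    """Evaluate a tree on one data row, a dict mapping terminal names to values."""
    if isinstance(tree, tuple):
        func, _ = FUNCTIONS[tree[0]]
        return func(*(evaluate(child, row) for child in tree[1:]))
    if isinstance(tree, str):
        return row[tree]
    return tree  # numeric constant


def to_str(tree):
    """Render a tree as a readable prefix expression."""
    if isinstance(tree, tuple):
        return f"{tree[0]}({', '.join(to_str(c) for c in tree[1:])})"
    return str(tree)


rng = random.Random(0)
candidate = random_tree(3, rng)
row = {"high": 105.0, "low": 100.0, "close": 103.0}  # illustrative values
print(to_str(candidate), "=", evaluate(candidate, row))

# Hand-written tree: (close - low) / (high - low), a stochastic-oscillator-like feature.
stoch = ("div", ("sub", "close", "low"), ("sub", "high", "low"))
print(evaluate(stoch, row))  # 0.6
```

The hand-written tree is included only to show that familiar indicator-like formulas are points in the same search space that GP explores.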
The measurement of goodness of individual features or feature
sets plays a significant role in all kinds of feature extraction
techniques. Methods can be distinguished by whether the learning,
classification or estimation phase is incorporated in the feature
extraction method (wrapper approaches) or not (filter approaches).
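The exact form of the interclass distance measure (ICDM) used in the study is not given in this article. The sketch below assumes a simple filter-style score for a single feature: the mean absolute distance between samples of different classes divided by the mean absolute distance between samples of the same class. Because it needs no trained forecaster it illustrates the filter approach; a wrapper would instead retrain the ANN for each candidate feature set.

```python
from itertools import combinations, product


def interclass_distance_ratio(values, labels):
    """Mean between-class distance divided by mean within-class distance.

    Assumed, illustrative score for one feature. Requires at least two classes
    and at least one class with two or more samples. Larger means better separation.
    """
    classes = sorted(set(labels))
    groups = {c: [v for v, lab in zip(values, labels) if lab == c] for c in classes}

    # Distances between every pair of samples taken from two different classes.
    between = [abs(a - b)
               for c1, c2 in combinations(classes, 2)
               for a, b in product(groups[c1], groups[c2])]
    # Distances between every pair of samples inside the same class.
    within = [abs(a - b)
              for c in classes
              for a, b in combinations(groups[c], 2)]

    mean_between = sum(between) / len(between)
    mean_within = sum(within) / len(within)
    return mean_between / mean_within if mean_within > 0 else float("inf")


# Illustrative data: label 1 = "up", 0 = "down".
print(interclass_distance_ratio([1.0, 2.0, 10.0, 11.0], [0, 0, 1, 1]))  # 9.0
print(interclass_distance_ratio([1.0, 10.0, 2.0, 11.0], [0, 0, 1, 1]))  # below 1: poor feature
```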
In fact, most financial technical indicators (Average True
Range, Chaikin Oscillator, Demand Index, Directional Movement
Index, Relative Strength Index etc.) are features of time-series
in a certain sense. Feature extraction can lead to similar indicators.
An interesting question is, however, whether such an approach
can create new, better indicators.
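To make the point that such indicators are simply functions of the time-series, here is a sketch of the Relative Strength Index computed from daily closes. It uses plain averages of gains and losses over the window for brevity; Wilder's original definition uses a smoothed (exponential-style) average, so values will differ from charting packages. The default period of 14 is the conventional choice, assumed here.

```python
def rsi(closes, period=14):
    """Simple-average RSI; one value per window of `period` price changes.

    Simplified variant (not Wilder smoothing): average gain and average loss
    are plain means over the last `period` daily changes.
    """
    changes = [b - a for a, b in zip(closes, closes[1:])]
    values = []
    for i in range(period, len(changes) + 1):
        window = changes[i - period:i]
        avg_gain = sum(c for c in window if c > 0) / period
        avg_loss = sum(-c for c in window if c < 0) / period
        if avg_loss == 0:
            values.append(100.0)  # no losses in the window
        else:
            rs = avg_gain / avg_loss
            values.append(100.0 - 100.0 / (1.0 + rs))
    return values


print(round(rsi([10.0, 11.0, 10.0, 12.0], period=3)[0], 2))  # 75.0
```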
The techniques were demonstrated and compared on the problem of
predicting the direction of change in the next week's average
of daily closes of the S&P 500 Index. The fundamental data were the
daily S&P 500 High, Low and Close Indices, Dow Jones Industrial
Average, Dow Jones Transportation Average, Dow Jones 20 Bond Average,
Dow Jones Utility Average and NYSE Total Volume from 1993 to 1996.
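The forecasting target can be sketched as follows. The article does not state how weeks were delimited or what the next week's average was compared against; this sketch assumes fixed blocks of five trading days and compares each week's average close with the following week's. Real data would need calendar weeks to handle holidays.

```python
def weekly_averages(closes, days_per_week=5):
    """Average close over consecutive, non-overlapping blocks of trading days.

    Assumes every week has `days_per_week` trading days; a trailing partial
    week is dropped.
    """
    n_weeks = len(closes) // days_per_week
    return [
        sum(closes[w * days_per_week:(w + 1) * days_per_week]) / days_per_week
        for w in range(n_weeks)
    ]


def direction_labels(closes, days_per_week=5):
    """1 if the next week's average close is above this week's, else 0."""
    avgs = weekly_averages(closes, days_per_week)
    return [1 if nxt > cur else 0 for cur, nxt in zip(avgs, avgs[1:])]


# Three illustrative weeks of closes (not actual S&P 500 values).
closes = [100.0, 101.0, 102.0, 103.0, 104.0,
          105.0, 104.0, 103.0, 102.0, 101.0,
          99.0, 98.0, 97.0, 96.0, 95.0]
print(weekly_averages(closes))   # [102.0, 103.0, 97.0]
print(direction_labels(closes))  # [1, 0]
```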

Three ANN-based forecasting models have been compared. The first
one used ANNs trained on historical data and their simple descendants.
The second one was trained on historical data and technical indicators,
while the third model also used new features extracted by GP.
Plain ANN models did not provide the necessary generalization
power. The examined financial indicators showed interclass distance
measure (ICDM) values better than those of raw data and enhanced
the performance of ANN-based forecasting. By using GP, much better
inputs for ANNs could be created, improving their learning and
generalization abilities.
Further work on forecasting models is planned, for
example:
- extension of the function and terminal sets for GP
- direct application of GP to derive investment decisions
- committee forecasts, where several different forecasting systems
work on the same problem and their forecasts are merged.
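The article does not specify how committee forecasts would be merged. One simple possibility, assumed here for illustration, is a majority vote over the up/down predictions of the individual models, with a configurable default for ties.

```python
def committee_vote(forecasts, tie_break=0):
    """Merge up/down forecasts (1 = up, 0 = down) by majority vote.

    `forecasts` holds one list per model, all of the same length. Majority
    voting and the `tie_break` default are illustrative assumptions.
    """
    merged = []
    for votes in zip(*forecasts):  # one tuple of votes per forecast period
        ups = sum(votes)
        downs = len(votes) - ups
        if ups > downs:
            merged.append(1)
        elif downs > ups:
            merged.append(0)
        else:
            merged.append(tie_break)
    return merged


# Three hypothetical models forecasting four weeks.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 0]
model_c = [0, 0, 1, 1]
print(committee_vote([model_a, model_b, model_c]))  # [1, 0, 1, 1]
```

With an odd number of models and binary forecasts a tie cannot occur; weighted voting or averaging of network outputs would be natural alternatives.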
This project is partially supported by the Scientific Research
Fund OTKA, Hungary, Grant No. T023650.
Please contact:
László Monostori - SZTAKI
Tel: +36 1 466 5644
E-mail: laszlo.monostori@sztaki.hu