READ - Recognition and Document Analysis
by Liliane Peters and Ashutosh Malaviya
The READ project aims to increase the efficiency of current
"Recognition and Document Analysis" technology. The main goal
of READ is to combine and refine document analysis techniques, ranging
from low-level image processing through document structure analysis to linguistic
extraction, into a general framework.
The objects to be recognized in READ are examples from the three main
domains of document analysis: addresses, forms and documents. Combining
and integrating the expertise of the project partners in these domains
is both promising and challenging. READ is funded by the
German Federal Ministry of Education, Science, Research and Technology.
Achieving these goals requires the implementation of intelligent systems
that are robust to data errors and adaptive to unknown data. Progress in
document recognition and analysis methodologies is expected to be realised
through the following research activities:
- acquisition of documents
- object extraction and recognition
- modeling and interpretation.
The outcome of these cooperative research activities will be integrated
into a prototype application.
GMD is mainly involved in the activities related to object recognition,
benchmarking of object recognition systems, and modeling and interpretation.
The contribution of GMD Institute for System Design Technology is the
development of robust object recognition methodologies. Soft computing
methods, such as the combination of fuzzy logic and neural networks, are
used to recognize cursive handwriting, as well as isolated words. The cognitive
processing of segmented document objects is the focus of the proposed method.
Fuzzy grammars are used to represent the unconnected and incomplete feature
information of hand-written documents. 
The FOHDEL language, which was developed by GMD scientists for on-line
character recognition, is extended and applied to represent the fuzzy rules
for word recognition. Neural networks supplement the fuzzy set theoretic
techniques by generating the expert information automatically and thus deriving
the rule bases for the recognition process. This capability is also being
developed to adapt the recognition system on the fly to new document environments.
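As a rough illustration of how fuzzy rules can cope with incomplete feature information, the following minimal Python sketch scores character hypotheses from graded feature memberships. It is not FOHDEL syntax and not the GMD method: the feature names, the rule base and the `missing` parameter are hypothetical, and the fuzzy AND is taken as the minimum, which is only one common choice.

```python
from typing import Dict, List

# Hypothetical fuzzy feature memberships in [0, 1] for one segmented object.
# A feature that could not be extracted is simply absent from the dict.
Features = Dict[str, float]

# Hypothetical rule base: each character class lists the features it expects.
RULES: Dict[str, List[str]] = {
    "a": ["loop", "short_stem"],
    "d": ["loop", "ascender"],
    "l": ["ascender", "straight"],
}


def rule_strength(features: Features, required: List[str],
                  missing: float = 0.0) -> float:
    """Fuzzy AND (minimum) over the memberships of the required features.

    `missing` is the membership assumed for a feature that was not observed.
    """
    return min(features.get(name, missing) for name in required)


def classify(features: Features, missing: float = 0.0) -> Dict[str, float]:
    """Return the firing strength of every rule for the given features."""
    return {label: rule_strength(features, required, missing)
            for label, required in RULES.items()}


# "short_stem" was not extracted, so the observation is incomplete.
observed = {"loop": 0.8, "ascender": 0.6, "straight": 0.2}
scores = classify(observed)
best = max(scores, key=scores.get)
print(best, scores)
```

Raising `missing` above zero makes the rules more tolerant of features that could not be extracted, at the cost of more ambiguous scores. In the approach described above, such rules would be generated by neural networks rather than written by hand.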
A major activity of this work package will be the integration of the
various object recognition approaches developed by GMD, Siemens ElectroCom
and the University of Koblenz into a single toolbox.
The benchmarking activities of the project are divided into several
work packages, corresponding to the various levels of document recognition.
GMD Institute for System Design Technology will participate in the benchmarking
activities related to object recognition. For this purpose, a server with reference
data collected from the partners has been made available.
By defining a common data interface and common test and verification
data sets, all partners can test their algorithms, even during the development
and implementation phases. This should reduce the complexity of the final
integration phase, which comprises the combination of the various object recognition
approaches, and will offer a better overview of the advantages of each algorithmic
strategy. At the end of the project an improved recovery of classification
errors is expected.
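The idea of a common data interface with shared test sets can be sketched as follows. This is only an illustration under stated assumptions: the `Recognizer` signature, the sample format and the two toy recognizers are hypothetical and do not describe the actual READ interface or reference data.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical common interface: a recognizer maps one sample to a label.
# Samples are stand-in strings here; real samples would be image data.
Recognizer = Callable[[str], str]
Sample = Tuple[str, str]  # (input sample, reference label)


def benchmark(recognizers: Dict[str, Recognizer],
              test_set: List[Sample]) -> Dict[str, float]:
    """Return the recognition rate of each recognizer on the shared test set."""
    if not test_set:
        raise ValueError("test set must not be empty")
    rates = {}
    for name, recognize in recognizers.items():
        correct = sum(1 for sample, label in test_set
                      if recognize(sample) == label)
        rates[name] = correct / len(test_set)
    return rates


# Hypothetical shared reference data.
TEST_SET: List[Sample] = [
    ("img_bonn", "Bonn"),
    ("img_koeln", "Koeln"),
    ("img_koblenz", "Koblenz"),
    ("img_mainz", "Mainz"),
]


def partner_a(sample: str) -> str:
    # Toy recognizer: derives the label from the sample name.
    return sample[4:].capitalize()


def partner_b(sample: str) -> str:
    # Toy recognizer: always answers with the same label.
    return "Bonn"


rates = benchmark({"partner_a": partner_a, "partner_b": partner_b}, TEST_SET)
print(rates)
```

Because every recognizer is called through the same signature on the same data, results stay comparable across partners and across development stages.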
Please contact:
Liliane Peters - GMD
Tel: +49 2241 14 2332
E-mail: liliane.peters@gmd.de
Ashutosh Malaviya - GMD
Tel: +49 2241 14 2751
E-mail: ashutosh.malaviya@gmd.de