DML-CZ: Czech Digital Mathematics Library
by Jiří Rákosník
Mathematics, much more than any other area of science, depends on access to literature that may be tens or even hundreds of years old. The rapidly growing volume of this literature makes efficient searching and navigation difficult, especially when the majority of works remain accessible only in paper form. Contemporary scholarly literature is commonly available in electronic form online, which enables information to be stored, organized, searched and accessed in a digital environment. It would be highly advantageous if this were also possible for the older body of literature.
A number of recent projects worldwide - JSTOR and NUMDAM, for example - were set up with the aim of digitizing historical mathematical literature. Having different initiatives working on the same problem might result in many different formats and interfaces. To avoid such a mess, discussions were started to define common standards and best practices. In addition, conditions were set for interlinking the individual projects in an ambitious system called the World Digital Mathematical Library (WDML). The entire mathematical literature is estimated to consist of approximately 50 million pages.
Encouraged by these activities, the Czech Mathematical Society initiated a national digitization project called DML-CZ: Czech Digital Mathematics Library (see http://dml.cuni.cz for more details about the project and other digitization initiatives). Proposed for the period 2005–2009, it is supported by the Academy of Sciences of the Czech Republic within the framework of the national research programme Information Society.
The aim of the project is to investigate, develop and apply techniques, methods and tools that would allow the creation of a suitable infrastructure and conditions for establishing what will become the DML-CZ. It will consist of the historical mathematical literature published in the Czech lands, and upon completion it will be incorporated into the WDML. The project will involve launching the digitization process and providing end users with access to the digitized material. It will also involve research into advanced technologies for searching mathematical documents, and for including both existing and future 'born-digital' materials. In view of the common history and linguistic similarity, suitable Slovak mathematical literature will presumably also be included.
Creating an adequate digital library is a complex task and requires numerous problems to be solved. These include the following areas, which will be tackled within the project:
- Acquisition: technical preparation of materials to be digitized; intellectual property and copyright issues
- Digitization: setting technical parameters compatible with the WDML Best Practice Statements; setting the digitization workflow; selection and adaptation of software supporting the digitization process; OCR processing and post-processing; provision of metadata
- Digital documents: Digital Objects structure specification; defining standards for descriptive, structural and administrative metadata; global persistent identification; archiving and presentation formats; conversions between formats and generation of digital derivatives; inclusion of born-digital materials; automatic conversions of visually marked OCR data into logically structured documents
- Digital library: implementation of the Content Management System; providing access to the digitized material; interlinking the content with the reference databases ZMATH and MathSciNet; research and implementation of advanced search techniques; the DML-CZ administration including long-term preservation of the digital content
- Integration of the DML-CZ in the WDML.
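To make the 'Digital documents' item above more concrete, the following is a minimal sketch in Python of how a digital object might bundle a persistent identifier with descriptive, structural and administrative metadata. It is an illustration only: the class names, field names and the identifier format are hypothetical and are not taken from the DML-CZ specification or from the WDML Best Practice Statements.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DescriptiveMetadata:
    # What the document is (hypothetical field set)
    title: str
    authors: List[str]
    journal: str
    year: int


@dataclass
class StructuralMetadata:
    # How the document is composed: ordered page-image file names
    page_files: List[str] = field(default_factory=list)


@dataclass
class AdministrativeMetadata:
    # How the document is managed (hypothetical field set)
    rights_holder: str
    scan_resolution_dpi: int


@dataclass
class DigitalObject:
    # Persistent identifier; the format used below is a placeholder
    persistent_id: str
    descriptive: DescriptiveMetadata
    structural: StructuralMetadata
    administrative: AdministrativeMetadata

    def page_count(self) -> int:
        """Number of page images listed in the structural metadata."""
        return len(self.structural.page_files)


def make_example() -> DigitalObject:
    """Build a record from placeholder values (not real bibliographic data)."""
    return DigitalObject(
        persistent_id="example:article-0001",
        descriptive=DescriptiveMetadata(
            title="Placeholder title",
            authors=["A. Author"],
            journal="Placeholder Journal",
            year=1950,
        ),
        structural=StructuralMetadata(page_files=["p001.tif", "p002.tif"]),
        administrative=AdministrativeMetadata(
            rights_holder="Placeholder publisher",
            scan_resolution_dpi=600,
        ),
    )
```

Keeping the three metadata categories in separate structures mirrors the distinction drawn in the list; an actual implementation would more likely follow an established metadata schema than ad hoc classes like these.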
The testbed for the DML-CZ is being built upon digitized documents from the Czechoslovak Mathematical Journal. The electronic material created within the DIEPER project (Mathematica Bohemica and Commentationes Mathematicae Universitatis Carolinae) offers another possibility.
Figure: The proposed scheme of the DML-CZ.
The complexity of this task requires the expertise of specialists in distinct fields. The team therefore consists of five groups from different institutions:
- Mathematical Institute AS CR, Prague (project co-ordination, selection and preparation of materials for digitization, IPR and copyright issues)
- Institute of Computer Science, Masaryk University, Brno (technical integration, development of the digital library for the DML-CZ, metadata provision coordination, incorporation of the DML-CZ into the WDML)
- Faculty of Computer Science, Masaryk University, Brno (OCR post-processing, techniques for searching and presenting digital documents)
- Faculty of Mathematics and Physics, Charles University, Prague (user requirements, metadata specifications, links to ZMATH and MathSciNet)
- Library of the Academy of Sciences, Prague (digitization, OCR, storage and presentation of digitized content within the Kramerius digital library system).
Links:
http://www.jstor.org
http://www.numdam.org
http://gdz.sub.uni-goettingen.de/dieper
Please contact:
Jiří Rákosník, Mathematical Institute AS CR, Prague, Czech Republic
Tel: +420 221403446
E-mail: rakosnik
kav.cas.cz