Eleventh ERCIM Database Research Group Workshop: Metadata for Web Databases
by Brian J Read
The latest in the series of ERCIM Database Research Group workshops
was held at GMD Birlinghoven Castle near Bonn on 26 May 1998. The topic
'Metadata for Web Databases' attracted significant interest among members
of the working group, resulting in a very full day of presentations and
discussion. Karl Aberer of GMD-IPSI organised and chaired the workshop.
There were sixteen participants from ten institutes.
In the context of Web data management, database systems are mostly used
in an isolated way as data sinks or sources. Data management services that
exploit and support the connectivity of the Web require the interaction
and co-operation of different data management components on the Web. To
enable this, the Web needs to be equipped with metadata on the structure
and behaviour of Web data that these components require. Thus the workshop
was intended to address such questions as the extraction, modelling and
querying of metadata, so adding semantics to the use of web data.
Keith Jeffery (CLRC-RAL) introduced the workshop topic by presenting
an overview of the nature of metadata in databases, distinguishing its
various purposes, and classifying it into three main kinds: schema, navigational
and associative. Capturing metadata from the web presents problems, as virtual
pages generated from database queries are invisible to the large web crawlers.
The limitations of HTML, and indeed XML, in managing metadata were discussed
in this and several subsequent talks.
Yannis Stavrakas (NTU-Athens) expanded on the nature of metadata for
web-based information systems. He distinguished three perspectives corresponding
to the atomic level (information within a page or document), the local
level (the structure of a site and links between documents), and the global
information space of the whole web.
Terje Brasethvik (IDI/NTNU-Trondheim), currently in Paris, described
his work with Arne Sølvberg on a Referent Model of Documents classified
by semantic metadata. In this approach to sharing information on the web,
they are developing a modelling language and editor to capture the meaning
of documents.
Giuseppe Sindoni (Rome III University), currently visiting RAL, presented
work from Paolo Atzeni's Rome group on a logical model for metadata in
web bases. Their Araneus Data Model with the Penelope language embeds the
schema within HTML. Turning to XML is potentially attractive, but that
too has limitations for data modelling.
Three research projects were covered in the afternoon session. Menzo
Windhouwer (CWI) described his work with Martin Kersten on the Acoi project.
The project is developing a feature detector engine to classify multimedia objects,
especially images. The Acoi web robot has already stored in a database
details extracted from over two hundred thousand images.
Thomas Klement (GMD-IPSI) spoke about the ICE (Information Catalogue
Environment) project. This concerns metadata for multidimensional categorisation
and navigation support on multimedia documents. It includes an interesting
use of dynamic menus to explore hypercube structures stored in an object-relational
database.
The last presentation was from Donatella Castelli (CNR-Pisa) about supporting
retrieval by "relation among documents" in the ERCIM Technical
Reference Digital Library (ETRDL), based on the Dienst system and the Dublin Core.
This prompted an interesting discussion on the possible semantics of a
relationship defined between documents.
The workshop concluded with a lively panel and discussion session on
the future research direction of EDRG and also its role in the EC Fifth
Framework Programme. A relevant component of the latter is "Creating
a User Friendly Information Society", especially Key Actions relating
to application domains. This suggested that future workshops might be targeted
towards an application area (such as transport, environment or health)
instead of a technical topic. CWI proposed an ambitious research agenda
emphasising semantic indexing of the web, in particular by involving the end
user, and cautioned against being influenced too much by funding considerations.
The workshop papers will be published in the ERCIM reports series (http://www.ercim.org/publication/workshop_reports.html).
Please contact:
Brian J Read - CLRC
Tel: +44 1 235 44 6492
E-mail: b.j.read@rl.ac.uk