Semantic Querying of Scientific Data through a Context Meta-data Database
by Epaminondas Kapetanios and Moira C. Norrie
Posing queries to scientific data raises many problems that originate
in the difficulty of understanding the data models and/or values,
which usually refer to particular value domains and/or may be
expressed in specific measurement units. Moreover, these values
also have behaviour that is captured by statistical descriptive values.
None of these issues can be expressed in query languages relying
on well-established algebras such as relational or collection algebra,
which operate over a known schema. We have therefore developed a
context meta-data database which helps the user to pose semantically
enriched queries by navigating through semantic information spaces.
Such queries can be transformed into database-specific query
languages addressing database-specific schemas.
The development of the context meta-data database has been guided
by case studies concerning scientific data for avalanche prediction,
which are mainly measurement data, and nominal or categorical
data for quality management in medicine. The first case study
is being funded by the Swiss National Science Foundation in cooperation
with the Swiss Federal Institute for Avalanche Research (http://www.slf.ch/slf.html),
whereas the second case study has been funded by the Institute
for Social and Preventive Medicine of the University of Zurich.
Orientation and Techniques
Semantics are usually ignored in traditional query languages because
they are mainly designed to operate over a known schema on the
basis of a well-established algebra such as a relational or collection
algebra. They are not intended to express the meaning of a query
within particular contexts. Since we want to support users in querying
without detailed knowledge of the schema and/or data values, by
exploiting not only structural but also semantic information, we
add a context meta-data database to the information system
architecture.
Since our aim is to enable only meaningful queries on scientific
data, knowledge about the schema and data value domains must be
made explicit and incorporated in the intended query. This involves
not only the interpretation of schema elements such as relations or
attributes, but also their classification according to the semantics
of arithmetic operations; e.g., an attribute can be classified either
as a measurement variable or as a categorical one. Furthermore,
finite value sets such as [-50, 50], expressed in specific measurement
units, should be taken into account rather than infinite value domains
such as integer or real. The same holds for categorical data, where
the value domain is usually a finite set of string values such
as positive, negative, indeterminate, not done, which might also
be encoded by numerical values. Moreover, the behaviour of values
can be captured by statistical descriptive values.
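As an illustration of how such attribute knowledge can decide whether a query is meaningful, the following minimal Python sketch records, for each attribute, whether it is a measurement or a categorical variable, its unit, and its finite value domain. The class `AttributeMeta`, the functions `is_meaningful` and `value_allowed`, the operator sets, and the attribute names are all hypothetical and not part of the described system; the sketch also assumes categorical data are nominal, so ordering comparisons are rejected for them.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

# Hypothetical meta-data record for one schema attribute (names are illustrative).
@dataclass(frozen=True)
class AttributeMeta:
    name: str
    kind: str                                          # "measurement" or "categorical"
    unit: Optional[str] = None                         # measurement unit, if any
    value_range: Optional[Tuple[float, float]] = None  # finite range for measurements
    categories: FrozenSet[str] = frozenset()           # finite value set for categorical data

ARITHMETIC_OPS = {"+", "-", "*", "/", "sum", "avg"}
ORDER_OPS = {"<", ">", "<=", ">="}
EQUALITY_OPS = {"=", "!="}

def is_meaningful(attr: AttributeMeta, op: str) -> bool:
    """Equality is always allowed; arithmetic and ordering only for measurements.

    Assumption: categorical values are nominal, so they have no meaningful order."""
    if op in EQUALITY_OPS:
        return True
    if op in ARITHMETIC_OPS or op in ORDER_OPS:
        return attr.kind == "measurement"
    return False

def value_allowed(attr: AttributeMeta, value) -> bool:
    """Check a query constant against the attribute's finite value domain."""
    if attr.kind == "measurement":
        if attr.value_range is None:
            return True
        low, high = attr.value_range
        return low <= value <= high
    return value in attr.categories
```

For example, with `air = AttributeMeta("air_temperature", "measurement", unit="degC", value_range=(-50.0, 50.0))`, averaging is accepted and the constant 60 is rejected, whereas averaging a categorical test result with the values positive, negative, indeterminate and not done is rejected.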
This knowledge must be taken into account if only meaningful queries
are to be composed. The context meta-data (knowledge) base represents
this knowledge in terms of semantic information spaces. Therefore,
database-specific queries are replaced with semantically enriched
queries, which can be formulated implicitly by navigating through
semantic information spaces. Each selected information space can
be translated into an underlying database-specific query language.
This way of constructing meaningful queries also frees the end
user from the need to learn the syntax of a particular query language.
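To make the translation step concrete, here is a minimal Python sketch, assuming a hypothetical mapping from natural-language property names to (table, column) pairs of an SQL schema. The names `Selection`, `SCHEMA_MAP` and `to_sql`, and the table and column names, are invented for illustration; the sketch handles only conjunctive restrictions on a single table and does not reproduce the actual transformation mediators.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Selection:
    """One restriction chosen while navigating: a property and a value constraint."""
    prop: str      # natural-language property name
    op: str        # comparison operator
    value: object

# Hypothetical mapping from knowledge elements to schema elements (table, column).
SCHEMA_MAP: Dict[str, Tuple[str, str]] = {
    "snow temperature": ("snow_profile", "temperature"),
    "snow depth": ("snow_profile", "depth"),
}

ALLOWED_OPS = {"=", "<", ">", "<=", ">="}

def to_sql(selections: List[Selection]) -> Tuple[str, list]:
    """Translate navigated selections into a parameterised SQL query (one table only)."""
    tables = {SCHEMA_MAP[s.prop][0] for s in selections}
    if len(tables) != 1:
        raise ValueError("this sketch handles exactly one table")
    table = tables.pop()
    clauses, params = [], []
    for s in selections:
        if s.op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator {s.op!r}")
        _, column = SCHEMA_MAP[s.prop]
        clauses.append(f"{column} {s.op} ?")   # values are passed as parameters
        params.append(s.value)
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses), params
```

Calling `to_sql([Selection("snow temperature", "<", -5), Selection("snow depth", ">=", 100)])` yields `SELECT * FROM snow_profile WHERE temperature < ? AND depth >= ?` with parameters `[-5, 100]`. The `?` placeholders follow the qmark style of Python's `sqlite3` module, so the pair can be passed directly to `cursor.execute(sql, params)`.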
The context meta-data database is being developed with particular
attention to knowledge representation issues. Queries to scientific
data are posed interactively through a graphical user interface
which presents semantic information spaces made up of the elements
of intended queries. These elements are represented as information
objects within the context meta-data database. They stand for
semantically rich descriptions of a data model and of particular
value domains, and they are semantically associated with each other.
The underlying formalism for representing semantic information
spaces is a multi-layered directed graph, which may contain cycles,
where nodes and links are classified at various semantic levels.
The system is being implemented with the object-oriented DBMS OMS,
which supports both required aspects:
- rich classification constraints for both unary collections of
objects and binary associations,
- a directed association construct, since it is based on the
object-oriented data model OM.
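The graph structure described above can be sketched in Python as follows. This is not OMS or OM code; it only mimics, under stated assumptions, the idea of classified nodes and directed, classified links whose types constrain which node classes they may connect. The node classes are taken from the first-level classification named in the text, while the link types (`has_property`, `has_domain`, `measured_in`, `described_by`, `related_to`), their constraints, and the class `SemanticGraph` are hypothetical. Only the first classification level is modelled.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# First-level node classes from the text; link types and constraints are hypothetical.
NODE_CLASSES = {"Concept", "Property", "ValueDomain", "MeasurementUnit", "DescriptiveValue"}
LINK_TYPES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "has_property": ({"Concept"}, {"Property"}),
    "has_domain":   ({"Property"}, {"ValueDomain"}),
    "measured_in":  ({"ValueDomain"}, {"MeasurementUnit"}),
    "described_by": ({"Property"}, {"DescriptiveValue"}),
    "related_to":   ({"Concept"}, {"Concept"}),  # self-referential, may form cycles
}

@dataclass
class SemanticGraph:
    node_class: Dict[str, str] = field(default_factory=dict)
    links: Dict[str, Set[Tuple[str, str]]] = field(default_factory=lambda: defaultdict(set))

    def add_node(self, name: str, cls: str) -> None:
        """Register a node under one of the first-level classes."""
        if cls not in NODE_CLASSES:
            raise ValueError(f"unknown node class {cls!r}")
        self.node_class[name] = cls

    def add_link(self, link_type: str, src: str, dst: str) -> None:
        """Add a directed link, enforcing the classification constraint of its type."""
        sources, targets = LINK_TYPES[link_type]
        if self.node_class[src] not in sources or self.node_class[dst] not in targets:
            raise ValueError(f"{link_type} not allowed from {src!r} to {dst!r}")
        self.links[src].add((link_type, dst))

    def successors(self, src: str, link_type: str) -> Set[str]:
        """Targets reachable from src by one link of the given type."""
        return {dst for t, dst in self.links[src] if t == link_type}
```

Rejecting ill-typed links at insertion time plays the role that classification constraints on binary associations play in the DBMS; a link such as `has_property` from a Property to a Concept raises `ValueError`.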
The knowledge elements are expressed in natural language and are
classified, at a first level, mainly into subsets called Concepts,
Properties, Value domains, Measurement units and Descriptive values.
Directed binary associations hold among these elements; they can
also be self-referential, which allows recursive definitions. Knowledge
elements and their associations at different specification levels
specify the semantic information spaces. For example, in the case of
avalanche-related data values, a semantic query can be expressed as
a set of nodes and links to be navigated, as shown in the figure.
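A semantic query seen as a set of nodes and links can be collected by a breadth-first navigation that follows only the link types the user chooses. The following Python sketch assumes a hypothetical fragment of an avalanche information space stored as an adjacency list; the node names, link types and the function `navigate` are illustrative and are not taken from the figure. The visited set keeps the traversal finite even though self-referential associations make the graph cyclic.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical fragment of an information space: node -> [(link type, target)].
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "Measurement site": [("has_property", "Air temperature"),
                         ("related_to", "Avalanche zone")],
    "Avalanche zone": [("related_to", "Measurement site")],  # cycle back
    "Air temperature": [("has_domain", "[-50, 50]"), ("described_by", "Daily mean")],
    "[-50, 50]": [("measured_in", "degC")],
}

def navigate(start: str, follow: Set[str]) -> Tuple[Set[str], Set[Tuple[str, str, str]]]:
    """Collect the nodes and links reachable from start via the chosen link types."""
    nodes: Set[str] = {start}
    links: Set[Tuple[str, str, str]] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for link_type, target in GRAPH.get(node, []):
            if link_type not in follow:
                continue
            links.add((node, link_type, target))
            if target not in nodes:   # visit each node once, so cycles terminate
                nodes.add(target)
                queue.append(target)
    return nodes, links
```

Navigating from `"Measurement site"` along `has_property`, `has_domain` and `measured_in` gathers the property, its value domain and its unit (four nodes, three links); following `related_to` instead stops after visiting both concepts once.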
At the moment, we are implementing a historical database for the
collection of categorical data for quality management in medicine.
The data will be collected from various clinical and/or therapeutic
institutions in Switzerland. In addition, transformation mediators
for semantic queries are being implemented for historical databases
(SQL engines) holding measurement data from physical experiments in
avalanche research. A collaboration with the Institute for Theoretical
Computer Science at EPFL Lausanne will illustrate the possibilities
for interfacing the context meta-data database with SGML derivatives
for the dynamic construction of semantically enriched web documents.
Please contact:
Epaminondas Kapetanios and M.C. Norrie - SGFI / Swiss Federal
Institute of Technology (ETH) Zurich
Tel: +41 1 63 27261 (27242)
E-mail: {kapetani,norrie}@inf.ethz.ch