SARI - A System for Semantical Information Retrieval
by Kuldar Taveter
TILA (Tools for Information Retrieval and Organization) is a project
of the Finnish National Multimedia Research Programme whose goal
is to design and develop an agent-based system for retrieval and
organization of heterogeneous information that may take different
forms and reside in different locations. The SARI (Software Agents
for Retrieval of Information) system is intended to act as a broker
between human users or other computerized systems (i.e., applications)
needing information at one end, and heterogeneous information
sources with different search engines at the other.
SARI's architecture reflects the system's role as a broker between
its users and information sources. The figure above depicts SARI
agents of the following types:
- Application Agents represent the users (humans or other computerized
systems) to the SARI system. They send agent messages containing
information retrieval requests to Control Agents
- Search Agents mediate information sources. They compile queries
coming from Control Agents into the query languages of their information
sources, and send the results back to the Control Agents.
- Control Agents act as brokers in the SARI system. Each Control
Agent receives agent messages containing information retrieval
requests from Application Agents, decides to which Search Agents
it forwards the requests, sends messages containing the retrieval
requests to the appropriate Search Agents, receives messages containing
search results from the Search Agents, combines them into information
retrieval results, and sends the retrieval results back to the
Application Agents.
- The Ontology Agent contains metadata in the form of ontologies that
describe the conceptual structure of the information present in
the information sources used by SARI.
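The request/result flow among these agents can be sketched in a few lines of Python. This is a minimal, synchronous illustration under stated assumptions, not SARI's implementation: the class names, the `Message` fields, the performative strings and the keyword filter standing in for a source's query language are all hypothetical, and the Ontology and Content Provider Agents are left out.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical agent message: sender, receiver, performative, payload."""
    sender: str
    receiver: str
    performative: str  # assumed values: "request" or "inform"
    content: object


class SearchAgent:
    """Mediates one information source (here an in-memory list of records)."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def handle(self, msg):
        # "Compile" the request into the source's own query form; in this
        # sketch that is just a case-insensitive keyword filter.
        keyword = msg.content.lower()
        hits = [r for r in self.records if keyword in r.lower()]
        return Message(self.name, msg.sender, "inform", hits)


class ControlAgent:
    """Broker: forwards a request to its Search Agents and combines results."""

    def __init__(self, name, search_agents):
        self.name = name
        self.search_agents = search_agents

    def handle(self, msg):
        combined = []
        for agent in self.search_agents:
            reply = agent.handle(Message(self.name, agent.name, "request", msg.content))
            combined.extend(reply.content)
        return Message(self.name, msg.sender, "inform", combined)


class ApplicationAgent:
    """Represents a user (human or application) to the system."""

    def __init__(self, name, control_agent):
        self.name = name
        self.control_agent = control_agent

    def retrieve(self, keyword):
        request = Message(self.name, self.control_agent.name, "request", keyword)
        return self.control_agent.handle(request).content


# Usage with made-up sources and records.
trade = SearchAgent("trade-db", ["Paper exports 1997", "Timber imports 1997"])
web = SearchAgent("web", ["Paper industry news"])
control = ControlAgent("control", [trade, web])
app = ApplicationAgent("app", control)
print(app.retrieve("paper"))  # ['Paper exports 1997', 'Paper industry news']
```

A real deployment would exchange messages asynchronously in an agent communication language; direct method calls are used here only to keep the flow of requests and results visible.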
In addition, there are Content Provider Agents that represent
content providers to the SARI system. Content providers are organizations
or individuals who own one or more information sources that are
accessible to the SARI system. Content Provider Agents, for example,
take care of conveying metadata about the information in their
providers' information sources to SARI.
Control Agents form the heart of SARI. They make their brokering
decisions on the basis of user information stored in user
profiles and of metadata, stored in ontologies, about the information
to be retrieved. In principle, Control Agents can form federations
with one another, but the present pilot version of SARI has just
one Control Agent.
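A Control Agent's choice of Search Agents might look roughly like the sketch below. It assumes, hypothetically, that a user profile lists the sources the user may query and that ontology metadata records which concepts each source covers; apart from Ultika, the source names and concept sets are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    # Hypothetical profile content: the sources this user may query.
    user: str
    allowed_sources: set


# Hypothetical ontology metadata: concepts covered by each information source.
SOURCE_CONCEPTS = {
    "Ultika": {"Commodity", "ForeignTrade"},
    "ManufacturingDB": {"Product", "Company"},
    "WebPages": {"Company", "Product"},
}


def select_search_agents(concept, profile, source_concepts=SOURCE_CONCEPTS):
    """Return, sorted by name, the sources that cover the requested concept
    (according to the ontology metadata) and that the profile permits."""
    return sorted(
        source
        for source, concepts in source_concepts.items()
        if concept in concepts and source in profile.allowed_sources
    )


analyst = UserProfile("analyst", {"Ultika", "WebPages"})
print(select_search_agents("Product", analyst))    # ['WebPages']
print(select_search_agents("Commodity", analyst))  # ['Ultika']
```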
The content of any information retrieval request originating at
some Application Agent is translated into the internal query language
SAL (SAri query Language) before it is forwarded to the Control
Agent. The query is translated into the query language of an information
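source by its Search Agent. In this way, for n applications and
m information sources, only n+m compilers need to be built, rather
than the n*m that would be needed if each application had to address
each source in that source's own query language.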
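The n+m argument can be made concrete with a small sketch. The article does not give SAL's syntax, so the `(operator, argument)` tuple standing in for a SAL query, the application and source names, and the target query strings are all hypothetical. The point is only that any application/source pair is served by composing one translator into SAL with one translator out of it.

```python
# Hypothetical stand-in for a SAL query: an (operator, argument) tuple.

# One translator per application: native request -> SAL.
TO_SAL = {
    "form_app": lambda request: ("keyword", request["find"]),
    "cli_app": lambda request: ("keyword", request.strip()),
}

# One translator per information source: SAL -> the source's query language.
# (Illustration only: the argument is interpolated without any escaping.)
FROM_SAL = {
    "sql_source": lambda sal: f"SELECT * FROM items WHERE name LIKE '%{sal[1]}%'",
    "web_source": lambda sal: f"q={sal[1]}",
}


def translate(app, source, request):
    """Serve any application/source pair by composing two translators."""
    return FROM_SAL[source](TO_SAL[app](request))


def compilers_needed(n_apps, m_sources, via_sal=True):
    """n+m translators with a shared intermediate language, n*m without one."""
    return n_apps + m_sources if via_sal else n_apps * m_sources


print(translate("cli_app", "web_source", " paper "))  # q=paper
print(translate("form_app", "sql_source", {"find": "paper"}))
print(compilers_needed(5, 8), compilers_needed(5, 8, via_sal=False))  # 13 40
```

With 5 applications and 8 sources, the intermediate language needs 13 translators instead of 40, and adding a ninth source costs one new translator rather than five.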
The conceptual structure of the information contained in the information
sources available to SARI is described by ontologies. An ontology
is a description of the concepts and inter-concept relationships
of some problem domain. The ontologies for relational databases
used by SARI are derived from their schemas. An ontology can also
be a classification on which the information in an information source
is based. An example of this is the APL database Ultika, used by SARI,
which contains statistical information about Finnish foreign trade.
Since SARI includes an implementation of the Resource Description
Framework (RDF) proposed by the World Wide Web Consortium (W3C),
ontologies describing Web resources are specified for SARI as RDF
schemas and descriptions. Ontologies can be browsed graphically
in SARI.
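One way to picture a schema-derived ontology and a classification-based one is as RDF-style triples. The sketch below uses the standard RDF Schema terms `rdfs:Class`, `rdf:Property`, `rdfs:domain`, `rdfs:range` and `rdfs:subClassOf` as plain strings; the table names, the mapping rules and the commodity classes are assumptions made for illustration and do not reproduce SARI's own derivation or the actual Ultika classification.

```python
# Hypothetical relational schema: table -> (columns, {foreign-key column: referenced table})
SCHEMA = {
    "Country": (["code", "name"], {}),
    "TradeFlow": (["year", "value", "country"], {"country": "Country"}),
}


def schema_to_triples(schema):
    """Derive (subject, predicate, object) triples from a schema: each table
    becomes an rdfs:Class, each column an rdf:Property whose domain is that
    class, and a foreign-key column also gets an rdfs:range pointing at the
    referenced table's class."""
    triples = []
    for table, (columns, foreign_keys) in schema.items():
        triples.append((table, "rdf:type", "rdfs:Class"))
        for column in columns:
            prop = f"{table}.{column}"
            triples.append((prop, "rdf:type", "rdf:Property"))
            triples.append((prop, "rdfs:domain", table))
            if column in foreign_keys:
                triples.append((prop, "rdfs:range", foreign_keys[column]))
    return triples


# A classification-based ontology is instead a subclass hierarchy
# (hypothetical class names, not the real Ultika classification).
CLASSIFICATION = [
    ("PaperAndBoard", "rdfs:subClassOf", "Commodity"),
    ("Newsprint", "rdfs:subClassOf", "PaperAndBoard"),
]


def superclasses(cls, triples):
    """All direct and indirect superclasses of cls, found by following
    rdfs:subClassOf links."""
    result, frontier = [], [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "rdfs:subClassOf" and obj not in result:
                result.append(obj)
                frontier.append(obj)
    return result


for triple in schema_to_triples(SCHEMA):
    print(triple)
print(superclasses("Newsprint", CLASSIFICATION))  # ['PaperAndBoard', 'Commodity']
```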
One of the most important problems that has to be solved in semantical
information retrieval from heterogeneous sources is to reconcile
different conceptualizations of the world represented by different
information sources. In SARI the concepts of different ontologies
are linked to each other by making use of the notions of viewpoint
and bridge. Ontologies interlinked in this way form an
ontological structure that can be viewed from different perspectives.
For example, there is a bridge between the concepts Commodity
and Product, which are respectively the root classes of the classifications
under the foreign trade and manufacturing viewpoints.
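Viewpoints and bridges can be sketched as a small data structure. Only Commodity, Product and the two viewpoint names come from the text; the subclasses are hypothetical, and letting a concept inherit the bridges of its ancestors is an assumption made for this illustration rather than a documented SARI rule.

```python
# Each viewpoint holds one ontology as a child -> parent map (None = root).
# PaperAndBoard and PaperProduct are hypothetical subclasses.
VIEWPOINTS = {
    "foreign trade": {"Commodity": None, "PaperAndBoard": "Commodity"},
    "manufacturing": {"Product": None, "PaperProduct": "Product"},
}

# A bridge links a (viewpoint, concept) pair to a concept in another viewpoint.
BRIDGES = [
    (("foreign trade", "Commodity"), ("manufacturing", "Product")),
]


def ancestors(viewpoint, concept):
    """The concept itself followed by its ancestors within one viewpoint."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = VIEWPOINTS[viewpoint][concept]
    return chain


def bridged_concepts(viewpoint, concept):
    """Concepts in other viewpoints reachable through a bridge attached to the
    concept or to one of its ancestors (bridges are treated as symmetric)."""
    found = []
    for c in ancestors(viewpoint, concept):
        for a, b in BRIDGES:
            if a == (viewpoint, c):
                found.append(b)
            elif b == (viewpoint, c):
                found.append(a)
    return found


print(bridged_concepts("foreign trade", "PaperAndBoard"))  # [('manufacturing', 'Product')]
print(bridged_concepts("manufacturing", "Product"))        # [('foreign trade', 'Commodity')]
```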
Future goals for SARI include semiautomatic formation of bridges
between the concepts of different ontologies and semiautomatic
generation of RDF metadata from Web resources.
The SARI system is being developed jointly in Finland by VTT
Information Technology, Tampere University of Technology, and
Tampere University. The project started in March 1996, and will
continue until March 1999.
Please contact:
Kuldar Taveter - VTT
Tel: +358 9 456 6044
E-mail: Kuldar.Taveter@vtt.fi