[OAI-implementers] Re. Metadata Language Confusion

zubair@cs.odu.edu zubair@cs.odu.edu
Fri, 13 Jul 2001 09:11:21 -0400


I think multilingual discovery can be handled at the service level within
the existing OAI framework. Let me describe how we are handling the
multilingual issue in Arc (we recently finished the implementation and
tested it with English and German OAI-compliant collections).

There are two major approaches to building a multilingual service
provider. In the first approach, translations of the metadata are
maintained in all supported languages, which reduces cross-language
retrieval to its monolingual equivalent. In the second approach, the query
is translated into all the other languages. Both approaches have been
investigated by several researchers. The first approach requires
maintaining translated versions of all harvested metadata: for every
English metadata record there must be a corresponding German record, and
vice versa. High-quality translated metadata requires human translation,
which is prohibitively expensive. For this reason, we have taken the second
approach, which requires only query translation.
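
As a rough sketch of the second approach (this is not Arc's code), the
query is translated once per target language and then run as an ordinary
monolingual search against each language's records. The record titles, the
function names, and the toy translator below are all made up for
illustration.

```python
from typing import Callable, Dict, List

# Hypothetical harvested metadata (titles only), keyed by language code.
RECORDS: Dict[str, List[str]] = {
    "en": ["Mobile agents for digital libraries", "Metadata harvesting"],
    "de": ["Mobile Agenten in digitalen Bibliotheken", "Mobilfunk und Netze"],
}

# A translator takes (term, source_lang, target_lang) and returns a string.
Translator = Callable[[str, str, str], str]


def search_monolingual(lang: str, term: str) -> List[str]:
    """Case-insensitive substring match within one language's records."""
    needle = term.lower()
    return [title for title in RECORDS[lang] if needle in title.lower()]


def search_all_languages(
    term: str, source_lang: str, translate: Translator
) -> Dict[str, List[str]]:
    """Translate the query into each language, then search monolingually."""
    results: Dict[str, List[str]] = {}
    for lang in RECORDS:
        if lang == source_lang:
            query = term  # no translation needed for the user's own language
        else:
            query = translate(term, source_lang, lang)
        results[lang] = search_monolingual(lang, query)
    return results


def toy_translate(term: str, source: str, target: str) -> str:
    """Stand-in translator (assumption): one hard-coded English-German pair."""
    table = {("en", "de", "mobile"): "mobil"}
    # Unknown terms are passed through unchanged.
    return table.get((source, target, term.lower()), term)


hits = search_all_languages("Mobile", "en", toy_translate)
```

Only the query is translated here; the stored metadata stays in its
original language, which is the point of the second approach.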

For query translation we use a Web-based translation service along with
user-assisted translation (we keep track of user selections in a dynamic
translation table). For illustration, suppose a user has entered the
keyword "Mobile" in the English unified interface. Starting from this
query, we generate a query that will work against harvested metadata in
German. For translating the user-entered string, in this case "Mobile", we
support two options: user-assisted and default. In the user-assisted
option, the user either enters the German equivalent of the term "Mobile"
or selects from a list of equivalent German terms available in the dynamic
translation table. The dynamic translation table keeps track of all past
user-entered translations for English terms. Initially, the table may be
empty or initialized with a set of the most frequently used terms in the
given domain. Over time, the table grows and comes to contain the
collective knowledge of the users in that domain. The default option is
invoked when the user decides not to assist in translation. In this option,
the translated term is taken from the dynamic translation table if it
exists there. If the term is not in the table, we use a Web-based
translation service such as http://www.freetranslation.com/. Note that in
this case the quality of the translation will not be as good as what can be
obtained from the dynamic translation table.
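
The two options can be sketched roughly as follows. This is an
illustration under assumptions, not the Arc implementation: the class and
function names are hypothetical, the Web translation service is passed in
as a plain callable rather than a real HTTP call, and picking the first
recorded translation in the default option is my simplification.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


class DynamicTranslationTable:
    """Remembers user-supplied translations for each source term."""

    def __init__(self, seed: Optional[Dict[str, List[str]]] = None) -> None:
        self._entries: Dict[str, List[str]] = defaultdict(list)
        # Optionally start from frequently used terms in the domain.
        for term, translations in (seed or {}).items():
            for translation in translations:
                self.record(term, translation)

    def record(self, term: str, translation: str) -> None:
        """Store a translation for a term, ignoring exact duplicates."""
        key = term.lower()
        if translation not in self._entries[key]:
            self._entries[key].append(translation)

    def candidates(self, term: str) -> List[str]:
        """Return known translations, e.g. to show the user a pick list."""
        return list(self._entries.get(term.lower(), []))


def translate_term(
    term: str,
    table: DynamicTranslationTable,
    web_translate: Callable[[str], str],
    user_choice: Optional[str] = None,
) -> str:
    """Apply the user-assisted option if a choice is given, else the default."""
    if user_choice is not None:
        # User-assisted: trust the user and remember the selection.
        table.record(term, user_choice)
        return user_choice
    known = table.candidates(term)
    if known:
        # Default, table hit (assumption: take the first recorded entry).
        return known[0]
    # Default, table miss: fall back to the Web service (lower quality).
    return web_translate(term)


table = DynamicTranslationTable()
fake_web = lambda t: "WEB:" + t  # placeholder for a real translation service
```

Web-service results are deliberately not written back into the table in
this sketch, so the table holds only user-entered translations.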

The multilingual version of Arc is still a trial version, and we have not
made the URL public. However, if you are interested, I can email you the
URL so you can try out the system. Also, if you need more details on our
implementation, let me know.


Zubair (zubair@cs.odu.edu)
Department of Computer Science
Old Dominion University, Norfolk
VA 23529

Phone: 757-683-3917  Fax: 757-683-4900