[OAI-implementers] Metadata Language Confusion

Hussein Suleman hussein@vt.edu
Thu, 12 Jul 2001 20:22:45 -0700


hi

well, i'm not sure whether each of your records is in all three
languages or in just one of the three ...

anyway, here are some quick thoughts:

firstly, i believe that when you speak of your language problems you are
primarily referring to the language of the metadata (as opposed to the
language of the digital objects, which can be encoded using the
DC.language tag) ... in that case, what you really need is a way to
specify the language of the metadata record itself, and as far as i
remember that's the reason for the "about" section of the record:
information about the metadata ... you could possibly do something like:

<record>
  <header>
    <identifier>oai:dummy:12345</identifier>
    <datestamp>2000-01-01</datestamp>
  </header>
  <metadata>
    <dc>
      <title>Rekenaar-wiskunde vir kinders</title>
      <description>Hierdie woorde is net om te vys dat alles nie in
        Engels moet geskryf word</description>
      <creator>Hussein Suleman</creator>
    </dc>
  </metadata>
  <about>
    <dc>
      <language>af</language>
    </dc>
  </about>
</record>

some points on this concoction:
- af = Afrikaans
  (and don't pick on my Afrikaans - i'm badly out of practice :))
- i have left out the namespace/schema stuff since this is just
  illustrative
- english tag names with non-english text look strange - i am presuming
  this is valid - does anyone know of a reason why it would not be ?

this tagging model would handle the case where each record is in
exactly one language ... if the records are in all 3 languages, you
should choose one to be the normative version (presumably the one the
article was written in) and disseminate that in response to an oai_dc
request ...

now, in general, DC is only the basic requirement ... i strongly suggest
exchanging citation information in a metadata format which is
multilingual-aware ... as an example from another community, check out
the experimental ETD-MS format (Electronic Thesis and Dissertation
Metadata Set) ... in our case we have the same problem with documents
being authored in multiple languages, so we have created a new format
which is basically DC with some extensions, one of which is that any tag
may carry the "xml:lang" attribute and may appear multiple times (thus
supporting multiple translations for any or all fields in the metadata
set)

[schema at http://oai.dlib.vt.edu/OAI/etdms.xsd, description at
http://www.ndltd.org/standards/metadata/current.html and if you view the
VTETD collection in http://purl.org/net/oai_explorer you can check out
some live examples]
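
to make the "xml:lang" idea concrete, here is a rough sketch of what a
record with translated fields could look like ... this is only an
illustration: the "thesis" wrapper and the tag names are assumptions
modelled on the DC-style example above, the english text is my own
translation, the namespace/schema stuff is left out again, and it has
not been validated against etdms.xsd:

```xml
<!-- illustrative sketch only: assumed element names, no namespaces,
     not validated against the ETD-MS schema -->
<thesis>
  <!-- the same field repeated once per language, tagged with xml:lang -->
  <title xml:lang="af">Rekenaar-wiskunde vir kinders</title>
  <title xml:lang="en">Computer mathematics for children</title>
  <description xml:lang="af">Hierdie woorde is net om te vys dat alles
    nie in Engels moet geskryf word</description>
  <description xml:lang="en">These words are just to show that not
    everything has to be written in English</description>
  <!-- fields that do not need translating appear once, untagged -->
  <creator>Hussein Suleman</creator>
  <!-- language of the document itself, as opposed to the metadata -->
  <language>af</language>
</thesis>
```

a harvester that understands such a format can pick the instance whose
xml:lang matches what it wants and fall back to any other instance if
there is no match.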

hope these comments help ...

ttfn
----hussein

-- 
========================================================================
hussein suleman -- hussein@vt.edu -- vtcs -- http://purl.org/net/hussein
========================================================================