[OAI-implementers] ABOUT & Schemas & metadataPrefix

Tim Brody tdb198@soton.ac.uk
Sun, 1 Apr 2001 14:57:37 +0100


Hello one and all,

Some problems...

Validator:

I've been playing around with the validator and have found it fails with my
output, which contains an empty ABOUT section (i.e. just <about></about>).
From my understanding of the protocol I should be able to have an empty
about (my logic was that if I was going to make use of this in future I
would want any harvesters to be able to handle about correctly, so an empty
tag was better than no tag at all).

Could someone explain how the about works (or doesn't)?

(http://cite-base.ecs.soton.ac.uk/cgi-bin/oai/OAI-script)
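
To make the three cases concrete, here is a minimal sketch of how a
harvester might tell a missing about from an empty one. It does not say
which of them the protocol or the validator accepts. The function names
and the bare <record> wrapper are assumptions made for illustration, and
tags are matched by local name only, so no particular OAI namespace is
assumed.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def about_state(record_xml):
    """Classify a record's about container as 'absent', 'empty' or 'present'.

    Hypothetical helper: it only inspects direct children of the record.
    """
    record = ET.fromstring(record_xml)
    for child in record:
        if local_name(child.tag) == "about":
            has_children = len(child) > 0
            has_text = bool((child.text or "").strip())
            return "present" if (has_children or has_text) else "empty"
    return "absent"
```

A harvester written this way can treat "absent" and "empty" identically,
whatever the validator decides about <about></about>.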

Schemas:

Does anybody know of a "Howto" for writing OAI schemas? I really don't want
to spend hours trawling through the incomprehensibility of w3c just to make
a few tweaks.

(http://www.ecs.soton.ac.uk/~tdb198/oai/opcit_dc.xsd - fails on "relation"
type, but I can't see anything wrong ...)

metadataPrefix:

<metadata>
<dc/oai_dc xmlns=...>
</dc/oai_dc>
</metadata>

Inside the metadata tag is the metadata format tag. Should the name of this
tag be derived from the schema or as the harvester requested (i.e.
metadataPrefix)?

I wonder about this because my instinct was that this tag should have the
same name as was requested (so, if the harvester requested "oai_dc" it would
be reasonable to expect the relevant section of XML to be called "oai_dc"),
but the protocol examples have this as "dc" (the same as the schema, which, I
suppose, is "correct").

In my quick survey of repositories, arXiv and NACA currently use "oai_dc",
while women writers, california international, cogprints and Humboldt
University use "dc" (I haven't checked other places, but it looks as though
most go with the schema/protocol examples).

From a harvester's point of view (I don't believe there are many of us :-), I
would prefer to have "oai_dc" because that tells me explicitly what data I
can expect to find, rather than having to remember what I requested (as far
as I can tell, the metadataPrefix is the one thing that makes an otherwise
isolated OAI response stateful, a real pain if one is using caching or other
systems).

As I am currently only harvesting from arXiv and cogprints I have explicitly
written into my code handling of "dc" and "oai_dc" as meaning the same
thing, but if this is ambiguous within the OAI protocol, or is schema
dependent, I'll need to maintain the state information related to the
metadataPrefix (n.b. all records contain a datestamp, so date state
information is stored inside the XML).
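
The "dc"/"oai_dc" special-casing described above could look roughly like
the following. This is a sketch under stated assumptions, not the actual
harvester code: the alias table, the function names and the fallback to
the remembered metadataPrefix are all hypothetical, and tags are again
matched by local name only.

```python
import xml.etree.ElementTree as ET

# Hypothetical alias table: container element names that are treated as
# meaning the same metadata format.
CONTAINER_ALIASES = {"dc": "oai_dc", "oai_dc": "oai_dc"}


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def metadata_format(record_xml, requested_prefix=None):
    """Guess a record's metadata format from the tag inside <metadata>.

    Falls back to requested_prefix (the state the harvester would otherwise
    have to keep) when the container name is missing or not recognised.
    """
    record = ET.fromstring(record_xml)
    for child in record:
        if local_name(child.tag) == "metadata":
            if len(child) == 0:
                return requested_prefix
            container = local_name(child[0].tag)
            return CONTAINER_ALIASES.get(container, requested_prefix)
    return requested_prefix
```

With a response that uses "oai_dc" as the container, requested_prefix is
never consulted. With "dc", the answer depends on the alias table being
right for every repository harvested, which is exactly the ambiguity in
question.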

Dublin Core semantics within OAI:

I am currently working in the very restricted environments of arXiv and
cogprints. I have looked at some other repositories and noticed that some
archives differ from the "standard" formatting for authors and dates (other
fields aren't quite as simple). Will validation check that a repository does
something sensible with its data for Dublin Core, as well as making sure it
passes an XML validator?

All the best,
Tim.