[OAI-implementers] returning *data* (as opposed to metadata)

herbert van de sompel herbertv@CS.Cornell.EDU
Wed, 25 Jul 2001 12:37:28 -0400

Donna Bergmark wrote:
> Ben,
> I could not agree more with Mike Nelson's response to you.
> The Cornell Digital Library Research Group (Southampton
> partners) would also be interested in extracting references
> and citation data straight from the text of the papers.

But, obviously, if a data provider has full-content in a well-structured
format, he may as well consider exposing references in a preparsed
manner as yet another separate "information packet": doing so would lead
to 3 "metadata-formats" related to the same content: metadata, data,
references. As long as the xsd files that define those formats come with
some human readable information that explains what the content of
records rendered according to those formats is, this makes perfect
sense.
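
As a rough illustration of the three "information packets" idea, the
sketch below builds one GetRecord request per format for the same
oai-identifier. The base URL, the identifier, and the "fulltext" and
"references" prefix names are assumptions made up for the example; only
the verb and its identifier/metadataPrefix arguments come from the
protocol itself.

```python
from urllib.parse import urlencode

# Hypothetical repository base URL, used only for illustration.
BASE_URL = "http://example.org/oai"

# "oai_dc" is the protocol's Dublin Core prefix; "fulltext" and
# "references" are assumed names a data provider might choose.
PREFIXES = ("oai_dc", "fulltext", "references")


def get_record_url(base_url, identifier, metadata_prefix):
    """Build a GetRecord request URL for one record in one format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


def packet_urls(identifier):
    """Return one request URL per format for the same identifier."""
    return {p: get_record_url(BASE_URL, identifier, p) for p in PREFIXES}


if __name__ == "__main__":
    # Hypothetical identifier; the same one is reused for every format.
    for prefix, url in packet_urls("oai:example.org:0107001").items():
        print(prefix, url)
```

The point of the sketch is only that the identifier stays fixed while
the metadataPrefix selects which packet comes back.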

If a data provider does not have the full-content in such a
well-structured format (or doesn't want to go through the hassle of
exposing the references separately), another party can indeed extract
references, as Donna points out.  Moreover, the latter party is then in
a position to expose those extracted references (amongst others, to the
original data provider).  If it does so by maintaining the connection
with the original content from which the references were extracted (for
instance, by using the original oai-identifier as identifier for
harvesting the references) the door is open for some neat things to
happen.
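
A minimal sketch of why keeping the original oai-identifier matters:
if the extracting party keys its references on the same identifier the
data provider uses, a harvester can join the two sources with a plain
lookup. The function name, the record layout, and the sample values
are all hypothetical.

```python
def link_references(provider_records, extracted_references):
    """Join provider metadata with third-party references.

    Both arguments are dicts keyed by the original oai-identifier.
    Records with no extracted references get an empty list.
    """
    linked = {}
    for identifier, metadata in provider_records.items():
        linked[identifier] = {
            "metadata": metadata,
            "references": extracted_references.get(identifier, []),
        }
    return linked


if __name__ == "__main__":
    # Hypothetical sample data, harvested from two different parties.
    provider = {
        "oai:example.org:1": {"title": "Paper one"},
        "oai:example.org:2": {"title": "Paper two"},
    }
    extracted = {
        "oai:example.org:1": ["oai:example.org:2"],
    }
    for identifier, record in link_references(provider, extracted).items():
        print(identifier, record)
```

Had the extracting party minted its own identifiers, the same join
would need a separate mapping table between the two identifier spaces.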

Actually these ideas fit in well with the concept of using the OAI
protocol to establish an interoperable grid for a highly distributed
electronic scholarly communication system, which is a topic that I have
been speculating about at various conferences for some time now (see for
Unfortunately I haven't found the time to write these embryonic ideas
down.

herbert van de sompel