[OAI-implementers] hierarchical documents

Hussein Suleman hussein@cs.uct.ac.za
Mon, 12 May 2003 11:41:42 +0200


i haven't seen a "nice" solution to the problem of addressing hierarchy, 
views and versions all at once. but if we consider the problems 
independently, there are a few solutions that you can look at for ideas.

my favorite example of hierarchy-encoding is the IMS Content Packaging 
specification, which makes embedded use of the IMS Metadata 
Specification (see www.imsglobal.org for details). while the specs show 
how to encapsulate content hierarchically, alas i don't know of anyone 
actively exposing such packaged content through OAI-PMH - maybe someone 
else will comment on this.

for views (such as the slide formats in your example), you could use a 
metadata format that allowed for a list of simple objects. the CSTC 
project that i used to work on did this with its native metadata format. 
go to the following OAI record to see what we did:
(of course, here the issue is: does a list denote parts or options? if 
you want to be explicit, you should look at the IMS model.)
if you look at the same record in DC you will see that we used a "cover 
page" to bind together the various files to form a single resource.
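
to make the parts-versus-options question concrete, here is a minimal sketch of a data model that keeps the two apart. it is not the CSTC format or the IMS model; the class names (Resource, Part, Rendition), the example.org base URL and the MIME types are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Rendition:
    """One file: an interchangeable option for viewing a part (hypothetical name)."""
    url: str
    mime_type: str


@dataclass
class Part:
    """One component of the resource; all parts together make up the whole."""
    label: str
    renditions: List[Rendition] = field(default_factory=list)


@dataclass
class Resource:
    title: str
    creator: str
    parts: List[Part] = field(default_factory=list)

    def files(self) -> List[Tuple[str, str, str]]:
        """Flatten to (part label, MIME type, URL) rows, e.g. to render a cover page."""
        return [(p.label, r.mime_type, r.url) for p in self.parts for r in p.renditions]


BASE = "http://example.org/files/"  # hypothetical location, not from the original post

talk = Resource("All about nothing", "Mr. Smith", [
    Part("Slides", [
        Rendition(BASE + "all-about-nothing.slides.ppt", "application/vnd.ms-powerpoint"),
        Rendition(BASE + "all-about-nothing.slides.pdf", "application/pdf"),
    ]),
    Part("Paper", [
        Rendition(BASE + "all-about-nothing.paper.rtf", "application/rtf"),
        Rendition(BASE + "all-about-nothing.paper.pdf", "application/pdf"),
    ]),
])
```

here a list of Part objects always means "parts of the whole", and a list of Rendition objects always means "pick one", so a harvester never has to guess which is meant.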

lastly, for versions, the OAI has a provenance container that allows you 
to specify the relationship of a metadata record to an older version, 
from a harvesting perspective. while it doesn't directly apply to 
versions of the resource, the basic idea is similar ...
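
as a rough sketch of what such a provenance block looks like, the snippet below builds one with Python's standard library. the element and attribute names follow the OAI provenance schema as i understand it; the function name, the "prov" prefix and all the values in the example call are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

PROV = "http://www.openarchives.org/OAI/2.0/provenance"
ET.register_namespace("prov", PROV)  # prefix chosen for readability only


def provenance(base_url, identifier, datestamp, metadata_ns,
               harvest_date, altered=False):
    """Build a <provenance> element describing where a record was harvested from."""
    prov = ET.Element(f"{{{PROV}}}provenance")
    origin = ET.SubElement(prov, f"{{{PROV}}}originDescription", {
        "harvestDate": harvest_date,
        "altered": "true" if altered else "false",
    })
    # Child elements in schema order.
    for name, text in (("baseURL", base_url),
                       ("identifier", identifier),
                       ("datestamp", datestamp),
                       ("metadataNamespace", metadata_ns)):
        ET.SubElement(origin, f"{{{PROV}}}{name}").text = text
    return prov


# Hypothetical values: an earlier copy of the record at an example repository.
older = provenance(
    base_url="http://example.org/oai",
    identifier="oai:example.org:all-about-nothing.paper",
    datestamp="2003-05-01",
    metadata_ns="http://www.openarchives.org/OAI/2.0/oai_dc/",
    harvest_date="2003-05-12T09:00:00Z",
)
xml_text = ET.tostring(older, encoding="unicode")
```

in a real response this element would sit inside the record's "about" container; for versions of a resource you would need an analogous structure of your own.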

so, my take on your example is:
- the paper and slides could have separate identifiers as the resources 
are themselves different (or use cover pages to hold them together)
- you can use "relation" tags (in DC or IMS-MS) to connect them if they 
are separate
- instead of using multiple metadata formats (which differentiate among 
views of the metadata) or sets (which differentiate among categorisations 
of the resource/metadata), you should use a single metadata format with 
the inherent ability to differentiate among views of the same item
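
the first two points can be sketched as follows: two oai_dc records, one for the paper and one for the slides, each pointing at the other with dc:relation. the OAI identifiers, the example.org URLs and the helper name dc_record are assumptions for illustration, not anything from a real repository.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def dc_record(title, creator, files, related):
    """Build one oai_dc record.

    files   -- list of (url, mime_type) pairs for this item
    related -- list of OAI identifiers of connected records
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")

    def add(name, text):
        ET.SubElement(root, f"{{{DC}}}{name}").text = text

    add("title", title)
    add("creator", creator)
    for url, mime in files:
        # Unqualified DC cannot say which format belongs to which identifier;
        # the pairing here is only implied by element order.
        add("identifier", url)
        add("format", mime)
    for oai_id in related:
        add("relation", oai_id)
    return root


BASE = "http://example.org/files/"          # hypothetical
PAPER_ID = "oai:example.org:all-about-nothing.paper"    # hypothetical
SLIDES_ID = "oai:example.org:all-about-nothing.slides"  # hypothetical

paper = dc_record("All about nothing (paper)", "Mr. Smith",
                  [(BASE + "all-about-nothing.paper.pdf", "application/pdf"),
                   (BASE + "all-about-nothing.paper.rtf", "application/rtf")],
                  related=[SLIDES_ID])
slides = dc_record("All about nothing (slides)", "Mr. Smith",
                   [(BASE + "all-about-nothing.slides.pdf", "application/pdf"),
                    (BASE + "all-about-nothing.slides.ppt",
                     "application/vnd.ms-powerpoint")],
                   related=[PAPER_ID])
```

note the comment about identifier/format pairing: that loss of information is the reason for the third point, since plain DC cannot reliably express "these files are views of the same item".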

the book example is more complicated and i haven't seen a solution i 
like yet :) however, the principle i would apply is:
- expose the most useful granularity of data i.e. if a book is most 
often used as a single unit, expose it as such ... and similarly for 
separate chapters.

hope at least some of this makes sense :)


Jakob Voss wrote:
> Hi!
> I am relatively new to OAI and am going to set up a simple Data Provider
> for different kinds of publications. Many publications consist of
> different parts in different file formats and publication types but
> I do not know how to deal with them. For instance:
> Title: All about nothing
> Author: Mr. Smith
> Files:
>   Slides
>     PowerPoint Source: all-about-nothing.slides.ppt
>     PDF Output:        all-about-nothing.slides.pdf
>   Paper
>     OpenOffice Source: all-about-nothing.paper.ooo
>     RTF Exchange:      all-about-nothing.paper.rtf
>     PDF Output:        all-about-nothing.paper.pdf
> I thought about several possibilities:
> - All files are one document with several dc:identifier
>   for each file (information about type and format is lost)
> - Each file is one document (much duplication)
> - Create sets for each publication and type (more sets than
>   publications)
> How do you store information about hierarchical
> document relationships? For instance an article (with several
> versions, types and formats) can be part of a book that
> is part of a series and so on.
> Thanks for your comments!
> Jakob Voß

hussein suleman ~ hussein@cs.uct.ac.za ~ http://www.husseinsspace.com