[OAI-implementers] Sets in and subjects in OAI-PMH

Caroline Arms caar@loc.gov
Wed, 22 Oct 2003 09:52:49 -0400 (EDT)


On Tue, 21 Oct 2003, Hussein Suleman wrote:
> 
> p.s. is LCSH international? is it used in South Africa? Ethiopia?
> 

No. LCSH is not international.  It is used in many places, but its
policies and practices make no attempt to serve international needs
(to judge from discussions of the problems that required policy
decisions, which I followed while on a detail to the part of LC that
maintains LCSH).  LCSH is also enormous.  I remember an analysis by Old
Dominion, in the fairly early days of OAI, that inferred from the number
of distinct terms in our records that LC didn't use a controlled
vocabulary for the subject terms in its OAI records.

Back to sets in OAI.

I agree that discussion about what is useful for harvesters is valuable.
If there is good evidence that certain practices for sets work for
certain categories of service providers, data providers who care
about whether or how their stuff appears in those particular services
may take steps to use sets that aren't related to internal processes or
organization.  Absent any such evidence, you can hardly blame them for
doing what is easy.  And for many, there are no resources to do
anything more than a simple technical transformation from internal
records.

For what it's worth, the issue of how early implementers were using sets
was looked at in the technical discussions that led to the development of
version 2 of OAI-PMH.  

"Out of 49 repositories, 39 are using sets. Of these 13 appear to
partition their collection by subject area, 13 by genre, and 9 by source
of records."

See the table at the end of 
http://www.ukoln.ac.uk/distributed-systems/oai/collection-description/whitepaper.html
for more detail.  [Thanks to Andy Powell for being a pack-rat and still
having the file at the URL I found by mining my e-mail!]

It's not clear that anyone was using sets to provide fine-grained subject
breakdowns (which some repositories now are).  There may be communities of
interest where that is useful and feasible, but in the cultural heritage
area (personal papers, digitized manuscripts, photographs, museum
artifacts, etc.) there is plenty of evidence that agreement on "topical"
terminology and rules for application across heterogeneous domains is
neither feasible, nor indeed helpful.

For the content in the LCOA1 repository, LC has more interest in creating
sets that are useful for Kat Hagedorn (OAIster) and Sara Shreeves (UIUC)
and other service providers that focus on building services that are
likely to serve people to whom our digitized historical content is of
interest.  My sense is that a set breakdown that suits those service
providers (and this sort of content) would be very different from a
breakdown that is useful for current scientific scholarly communication.

Perhaps providers of production services can indicate if this list
is a useful forum for discussing what practices in relation to sets make
sense to them.  They may be having discussions within a community through
other lists or meetings.

   Caroline Arms                                      caar@loc.gov
   Library of Congress
   Office of Strategic Initiatives

**** All views expressed are personal *********