[OAI-implementers] Minor annoyance - what is the official name of an OAI site?

Simeon Warner simeon@cs.cornell.edu
Thu, 31 Oct 2002 09:55:45 -0500 (EST)


In the general case, the best identifier for a repository is the <baseURL>
(the <baseURL>, identifier, metadataPrefix and datestamp uniquely identify
a record). I would suggest that you use the baseURL internally and display
the <repositoryName> for human readability.
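
As a rough illustration of that suggestion, a harvester could pull both values out of the Identify response and keep the baseURL as its internal key. This is only a sketch: the sample response is hypothetical and trimmed, the baseURL is a placeholder, and the function name is made up for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical, trimmed Identify response; the baseURL is a placeholder.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Archive</repositoryName>
    <baseURL>http://example.org/oai</baseURL>
  </Identify>
</OAI-PMH>"""


def repository_key_and_label(identify_xml):
    """Return (baseURL, repositoryName): the key to store and the label to show."""
    root = ET.fromstring(identify_xml)
    identify = root.find(OAI_NS + "Identify")
    base_url = identify.findtext(OAI_NS + "baseURL").strip()
    name = identify.findtext(OAI_NS + "repositoryName").strip()
    return base_url, name


# Internal table keyed by baseURL; the name is kept only for display.
key, label = repository_key_and_label(SAMPLE_IDENTIFY)
repositories = {key: label}
```

Keying on the baseURL means two list entries that point at the same repository collapse into one, whatever name or id they were registered under.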

It is not mandatory that repositories support oai-identifier and it is
thus not mandatory that repositories have a <repositoryIdentifier>. I
think that, if anything, the OAI web site should make information about
repositoryIdentifiers less prominent to avoid this confusion.

In the case of AIM25, they appear to use the oai-identifier scheme
(identifiers like oai:aim25.ac.uk:69) but fail to include the appropriate
description block in the Identify response. This is broken. (The
description block should declare aim25.ac.uk to be the
repositoryIdentifier).
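
A harvester could check for this mismatch roughly as follows. This is a sketch under assumptions: the two sample responses are illustrative and trimmed (they are not AIM25's actual output), and the helper names are invented for the example.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
OAI_ID_NS = "{http://www.openarchives.org/OAI/2.0/oai-identifier}"

# Illustrative Identify response that includes the oai-identifier description.
WITH_DESCRIPTION = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <description>
      <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier">
        <scheme>oai</scheme>
        <repositoryIdentifier>aim25.ac.uk</repositoryIdentifier>
        <delimiter>:</delimiter>
        <sampleIdentifier>oai:aim25.ac.uk:69</sampleIdentifier>
      </oai-identifier>
    </description>
  </Identify>
</OAI-PMH>"""

# Illustrative response with no description block at all.
WITHOUT_DESCRIPTION = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify/>
</OAI-PMH>"""


def declared_repository_identifier(identify_xml):
    """Return the declared repositoryIdentifier, or None if none is declared."""
    root = ET.fromstring(identify_xml)
    elem = root.find(
        f"{OAI_NS}Identify/{OAI_NS}description/"
        f"{OAI_ID_NS}oai-identifier/{OAI_ID_NS}repositoryIdentifier"
    )
    return elem.text.strip() if elem is not None and elem.text else None


def oai_scheme_is_declared(identify_xml, record_identifier):
    """True if an oai: identifier matches the declared repositoryIdentifier."""
    parts = record_identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai":
        return False
    return parts[1] == declared_repository_identifier(identify_xml)
```

With these samples, oai:aim25.ac.uk:69 passes the check against the first response and fails against the second, which is the situation described above.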

The aggregator guidelines say:
"Agents which re-export harvested records should do so with different 
identifiers unless the metadata is unaltered and the original identifier 
corresponds to a recognized URI scheme. It is also recommended that all 
repositories re-exporting harvested records use the repeatable provenance 
containers to provide provenance information."

The case of AIM25 is unclear: one could decide that the URI scheme is
recognized (it starts with oai:) or not (it looks like oai-identifier
but the description is missing).

Deduplication is going to be an interesting problem as the number of OAI
services that re-export metadata grows. Both the use of globally
resolvable identifier schemes and provenance containers will help address
this problem.
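
To make the provenance idea concrete, here is a minimal sketch of deduplication by origin. The record shape is hypothetical: plain dicts in which "origin" holds the baseURL and identifier a harvester would have read from a provenance container's originDescription. Only one level of provenance is considered, and the URLs are placeholders.

```python
def origin_key(record):
    """Return the (baseURL, identifier) pair of the record's original source."""
    origin = record.get("origin")
    if origin:
        # Re-exported record: trust the provenance information.
        return (origin["baseURL"], origin["identifier"])
    # No provenance: treat the record as original to this repository.
    return (record["baseURL"], record["identifier"])


def deduplicate(records):
    """Keep the first record seen for each origin key."""
    seen = {}
    for record in records:
        seen.setdefault(origin_key(record), record)
    return list(seen.values())


# Hypothetical harvest: one original and one re-export of the same record.
harvested = [
    {"baseURL": "http://a.example.org/oai", "identifier": "oai:a.example.org:1"},
    {
        "baseURL": "http://b.example.org/oai",
        "identifier": "oai:b.example.org:77",
        "origin": {
            "baseURL": "http://a.example.org/oai",
            "identifier": "oai:a.example.org:1",
        },
    },
]
unique = deduplicate(harvested)
```

Without the provenance information, the second record would look like a distinct item, since the aggregator has given it a new identifier.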

At the moment, the situation is complicated by the presence of a number of
duplicate data providers for v1.1 and v2.0 listed on the OAI website. This
should be improved when v1.1 repositories are removed from the canonical
data-providers list (scheduled for 1 December 2002:
http://www.openarchives.org/news/oaiv2press020614.html).

I hope that the central registry of OAI sites will become less important
over time. I hope that the "friends" container
(http://www.openarchives.org/OAI/2.0/guidelines-friends.htm) will be used
to create a decentralized web allowing repository discovery.

Cheers,
Simeon



On Thu, 31 Oct 2002, Alan Kent wrote:
> Hi,
> 
> One minor annoyance (that I think I have reported before) is that the
> list of OAI data providers on the www.openarchives.org site does not
> list an 'id' for all bases in the XML document returned.
> 
> I then started thinking, what was the id attribute? Is it the repository
> name, repository identifier, or just some other id that people decided
> to type in when registering the repository? (I think it's the latter)
> 
> So I thought, well OAI 2.0 has the <oaiIdentifier> description stuff now,
> so I can go to a site and work out its identifier. The problem is not
> everyone supports it. I just went to AIM25 for example (because
> it was alphabetically at the start of the list). It returns a
> repository name with spaces
> 
>     <repositoryName>AIM25 - Archives in London</repositoryName>
> 
> but no repository identifier. Doing a ListRecords showed me the
> repository identifier I think is aim25.ac.uk, but I guessed this
> as a human by looking at the first record that came back.
> Records in a repository should keep the identifier of the original
> record in the case of an aggregator, so this is not a reliable
> approach to use.
> 
> Is it mandatory that all repositories have a 'repository identifier'?
> 
> Is it mandatory that Identify for OAI 2.0 make the identifier available?
> 
> Should the list on the open archives web site be updated to make sure
> it has the correct repository identifiers for all sites?
> 
> I know I can go look up the spec, but I am trying to be provocative
> here and elicit responses like "no, but it should be" or "don't be silly".
> Do aggregators (who just get other people's data) have repository
> identifiers even if they don't have any of their "own" content?
> 
> I guess my bottom line is that I think the page on the open archives
> web site would be better if it included the official repository identifiers
> for each registered data provider. I can write a script to generate my
> own XML document (get all the URLs, do an Identify - if not good enough,
> do a ListRecords).
> 
> I guess I am also encouraging people to go to the effort of including
> the <oaiIdentifier> description in their OAI data provider implementations
> too.
> 
> Maybe it's just me being pedantic. We have tried to automate the updating
> of our list of sites to harvest (for interop testing), but it keeps
> getting duplicates.
> 
> Alan
> _______________________________________________
> OAI-implementers mailing list
> OAI-implementers@oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers