[OAI-implementers] Handling digital duplicates of physical content

Xiaoming Liu liu_x at lanl.gov
Wed Aug 4 10:50:44 EDT 2004

On Wed, 4 Aug 2004, Sebastian Bossung wrote:

> Hi
> we have an application that stores metadata for physical documents. In
> most cases the documents are also digitized, but the originals are not
> under our institution's control. I am wondering whether it makes sense
> to expose the metadata via OAI-PMH.
> The problem I see is that there is a danger of duplicate metadata
> records as other people might have the same idea of creating a
> meta-library and exposing it via OAI-PMH. However, we think that most

I believe de-duplication has to be done on the service provider side. The
situation is similar to the Web: it is up to the search engine to detect
mirror sites and duplicate files.

The topic has been studied in the web community. Some references:

A. Broder, S. Glassman, M. Manasse, and G. Zweig. Syntactic clustering of
the web. In Proc. of the 6th Int'l World Wide Web Conf. (WWW), pages
391-404, 1997. http://decweb.ethz.ch/WWW6/Technical/Paper205/Paper205.html

N. Shivakumar and H. Garcia-Molina. Building a scalable and accurate
copy detection mechanism. In Proceedings of the First ACM International
Conference on Digital Libraries, 1996.
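
As a rough illustration of what service-provider-side de-duplication
could look like, here is a minimal sketch in the spirit of the shingling
idea from the Broder et al. paper: each record is reduced to a set of
w-word "shingles", and two records are compared by the Jaccard
coefficient of their shingle sets. The function names, the 0.8
threshold, and the assumption that each harvested record has already
been flattened to one text string are all hypothetical. The all-pairs
loop is quadratic, so it is only meant for small collections; the paper
itself uses sampled sketches of the shingle sets to scale further.

```python
import re


def shingles(text, w=3):
    """Return the set of w-word shingles (contiguous word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < w:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a, b, w=3):
    """Jaccard coefficient of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0  # two empty texts are considered identical
    return len(sa & sb) / len(sa | sb)


def find_duplicates(records, threshold=0.8, w=3):
    """Return identifier pairs whose resemblance meets the threshold.

    records: dict mapping a record identifier to its flattened metadata
    text (hypothetical input format, e.g. title + creator + date).
    """
    ids = sorted(records)
    pairs = []
    for i, x in enumerate(ids):
        for y in ids[i + 1:]:
            if resemblance(records[x], records[y], w) >= threshold:
                pairs.append((x, y))
    return pairs
```

A service provider could run something like find_duplicates() over the
records harvested from several repositories and then merge or cluster
the reported pairs; how to pick the threshold and which metadata fields
to compare would depend on the collection.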


> of the content will not be digitized in the near future anywhere
> else.
> In this scenario does it make sense to implement OAI-PMH in our
> system? I am especially interested in any pointers to possible
> solutions to the duplicate-metadata problem.
> Thanks
> Sebastian
> --
> Sebastian Bossung - sb at jbib.de - http://www.jbib.de/sb
> A banker is a fellow who lends you his umbrella when the sun
> is shining and wants it back the minute it begins to rain.
> _______________________________________________
> OAI-implementers mailing list
> List information, archives, preferences and to unsubscribe:
> http://openarchives.org/mailman/listinfo/oai-implementers
