From lming@vt.edu Mon Mar 8 17:47:04 2004
From: lming@vt.edu (Ming Luo)
Date: Mon, 08 Mar 2004 12:47:04 -0500
Subject: [OAI-implementers] upgrading machine hosting OAI Explorer
Message-ID: <404CB198.3020703@vt.edu>
Hi all,
I'm upgrading the machine hosting the Virginia Tech OAI Explorer.
Things may be a little messy over the next few days.
I will get back to you after the upgrade finishes.
Thanks,
Ming Luo
From orient_lo@163.com Thu Mar 4 01:08:44 2004
From: orient_lo@163.com (罗时辉)
Date: Thu, 4 Mar 2004 9:8:44 +0800
Subject: [OAI-implementers] resumptionToken cursor
Message-ID: <200403040108.i2418iC16494@nsdlib.nsdl.cornell.edu>
Why is the value of the cursor attribute of resumptionToken always "0" in the first incomplete list response? It seems odd. According to its definition (a count of the number of elements of the complete list thus far returned), it should be the number of records
returned in the first incomplete list response, because by the time the first resumptionToken is returned, a certain number (e.g. 1000) of records have already been returned. Right?
Very Respectfully,
Steve Luo
orient_lo@163.com
From tanderson@collegis.com Thu Mar 4 23:19:15 2004
From: tanderson@collegis.com (Thor Anderson)
Date: Thu, 4 Mar 2004 18:19:15 -0500
Subject: [OAI-implementers] Knowing you have harvested the whole collection
Message-ID: <74365BBB0B30774182643C124412FCB842CC1F@EXCHCLUS.collegis.com>
Greetings OAI-implementers,
I was wondering if someone could tell me the most accurate way to ensure
my OAI harvesting program has harvested all of the records of a
repository. Is there any way to get some sort of collection or
repository metadata that holds a "total number of records" value? Or,
because oai_dc metadata records are the most common denominator (and
required for minimal OAI compliance?), can I assume that a request like
this:
http://services.nsdl.org:8080/nsdloai/OAI?verb=ListRecords&metadataPrefix=oai_dc
will give me the most complete set of records possible (once no more
resumptionTokens are available)?
TIA for any help. Hope this wasn't in a FAQ somewhere that I missed.
Thor
----------------------------------------
Thor Anderson, Ph.D.
Collegis, Inc.
tanderson@collegis.com
From simeon@cs.cornell.edu Mon Mar 8 18:11:11 2004
From: simeon@cs.cornell.edu (Simeon Warner)
Date: Mon, 8 Mar 2004 13:11:11 -0500 (EST)
Subject: [OAI-implementers] ADMIN NOTE
Message-ID:
I'm afraid there has been a problem with the OAI-implementers mail
server which resulted in a number of messages being delayed. The
queued messages should now have been sent out.
Cheers,
Simeon
From simeon@cs.cornell.edu Mon Mar 8 18:13:09 2004
From: simeon@cs.cornell.edu (Simeon Warner)
Date: Mon, 8 Mar 2004 13:13:09 -0500 (EST)
Subject: [OAI-implementers] well-known port
In-Reply-To: <5.2.0.9.0.20040209085933.025d61e0@popserv.ucop.edu>
References: <5.2.0.9.0.20040209085933.025d61e0@popserv.ucop.edu>
Message-ID:
Since OAI-PMH works over HTTP, port 80 is the norm. The baseURL can
include a port so anything else can be used. I don't see a need for
agreement on any particular port.
Cheers,
Simeon
On Mon, 9 Feb 2004, David Loy wrote:
> Question: has a well-known port been adopted for OAI and if not have people
> settled on a specific port (80?)
>
> Thanks
> David Loy
>
> _______________________________________________
> OAI-implementers mailing list
> List information, archives, preferences and to unsubscribe:
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
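A minimal sketch of the point about ports, assuming a harvester that simply appends the query string to whatever baseURL it is given. The helper names `build_request` and `effective_port` are hypothetical; the two baseURLs are ones quoted elsewhere on this list.

```python
from urllib.parse import urlencode, urlsplit

def build_request(base_url, verb, **args):
    """Append an OAI-PMH query string to a baseURL, leaving any port untouched."""
    query = urlencode({"verb": verb, **args})
    return f"{base_url}?{query}"

def effective_port(base_url):
    """Port a client would connect to: the explicit one, else the scheme default."""
    parts = urlsplit(base_url)
    if parts.port is not None:
        return parts.port
    return 443 if parts.scheme == "https" else 80

url = build_request("http://services.nsdl.org:8080/nsdloai/OAI",
                    "ListRecords", metadataPrefix="oai_dc")
print(url)
print(effective_port("http://services.nsdl.org:8080/nsdloai/OAI"))
print(effective_port("http://memory.loc.gov/cgi-bin/oai2_0"))
```

Because the port travels inside the baseURL, nothing in the harvester needs to know about it separately.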
From simeon@cs.cornell.edu Mon Mar 8 18:38:55 2004
From: simeon@cs.cornell.edu (Simeon Warner)
Date: Mon, 8 Mar 2004 13:38:55 -0500 (EST)
Subject: [OAI-implementers] resumptionToken cursor
In-Reply-To: <200403040108.i2418iC16494@nsdlib.nsdl.cornell.edu>
References: <200403040108.i2418iC16494@nsdlib.nsdl.cornell.edu>
Message-ID:
On Thu, 4 Mar 2004, 罗时辉 wrote:
> Why the value of cursor of resumptionToken is alway "0" in the first incomplete list response? It seems odd. In terms of its definition(a count of the number of elements of the complete list thus far returned ), it should be the number of records
> returned in the first incomplete list response, because when the first resumptionToken returned,a certain number(i.e. 1000) of records was already returned. Right?
See the example in:
http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm#FlowControl
The cursor is the number of records or headers returned up to the start of
the incomplete list response. Thus the first response always has cursor=0
if it is specified.
Cheers,
Simeon
> Very Respectfully,
>
> Steve Luo
> orient_lo@163.com
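A small sketch of the cursor rule described above: the cursor of each partial response is the number of items returned before that response begins. The function name `cursors` and the page sizes are illustrative assumptions, not taken from any repository.

```python
from itertools import accumulate

def cursors(page_sizes):
    """Cursor for each partial response: items returned *before* that response."""
    # accumulate(..., initial=0) yields 0, s1, s1+s2, ...; the last value is the
    # grand total, which is not the cursor of any response, so it is dropped.
    totals = list(accumulate(page_sizes, initial=0))
    return totals[:-1]

print(cursors([1000, 1000, 500]))
```

With pages of 1000, 1000 and 500 records, the first response carries cursor 0 even though it delivers 1000 records.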
From caar@loc.gov Mon Mar 8 18:42:56 2004
From: caar@loc.gov (Caroline Arms)
Date: Mon, 8 Mar 2004 13:42:56 -0500 (EST)
Subject: [OAI-implementers] set description
In-Reply-To: <1063.160.36.192.134.1076705626.squirrel@kiva.lib.utk.edu>
Message-ID:
The Library of Congress is providing set descriptions, see
http://memory.loc.gov/cgi-bin/oai2_0?verb=ListSets
The descriptions use the oai_dc schema. They are essentially
collection-level records for the underlying collection of content EXCEPT
that I use "Records for " at the beginning of the title. When I wanted to
do this, I could not find another pattern; you should not look at the
practice you infer from them as authoritative in any way.
I would be interested in knowing who (if anyone) is finding them useful.
Caroline Arms caar@loc.gov
Office of Strategic Initiatives
On Fri, 13 Feb 2004, Jody DeRidder wrote:
> Is anyone out there using set descriptions (optional tag in header)?
> If so, would you please send us a link to your repository/service
> provider so we could see it applied?
>
> We'd be grateful...
>
> --jody (for Anthony Smith)
>
>
> --
> Jody DeRidder
> IT Administrator II
> Digital Library Center
> 648A John C. Hodges Library
> University of Tennessee
> Knoxville, TN 37996
>
> Phone: (865) 974-4796
> Email: deridder@aztec.lib.utk.edu
>
From khage@umich.edu Mon Mar 8 18:59:55 2004
From: khage@umich.edu (Kat Hagedorn)
Date: Mon, 8 Mar 2004 13:59:55 -0500
Subject: [OAI-implementers] set description
In-Reply-To:
Message-ID:
Any further description of sets is useful for service providers. The
set name, being short, can't always completely describe the contents of
the set. More description helps us decide whether we want to harvest a
particular set.
- Kat
-------------------
Kat Hagedorn
OAIster/Metadata Harvesting Librarian
DLXS Bibliographic Class Coordinator
DLXS Text Class Co-coordinator
Digital Library Production Service
University of Michigan
http://www.oaister.org/
http://www.dlxs.org/
email: khage@umich.edu
phone: 734-615-7618
From liu_x@lanl.gov Mon Mar 8 19:57:25 2004
From: liu_x@lanl.gov (Xiaoming Liu)
Date: Mon, 8 Mar 2004 12:57:25 -0700 (MST)
Subject: [OAI-implementers] set description
In-Reply-To:
References:
Message-ID:
Using a small program to survey the data providers listed on the OAI website and in the UIUC
registry, I generated the following list of baseURLs that provide set descriptions.
http://129.252.51.52/OAI/WebOAI.aspx
http://alcme.oclc.org/ndltd/servlet/OAIHandler
http://csc000.cscaustria.at/oai/OAI.ASP
http://conferences.arts.usyd.edu.au/oai/
http://cgi.vtt.fi/progs/inf/OAI
http://dataprovider.ibict.br/mypoai/oai2.php
http://gita.grainger.uiuc.edu/registry/px/oai.asp
http://hbllmedia.lib.byu.edu/test/PhpOai2/oai/oai2.php
http://memory.loc.gov/cgi-bin/oai2_0
http://oai.lib.duke.edu:8081/smc/servlet/OAIHandler
http://oai.lib.msu.edu/oai/oai.cfm
http://infomine.ucr.edu/cgi-bin/OAI-PMH-server
http://ibiblio.org/oaibiblio/data/software/app/oai2.php
http://infsearch.cs.cmu.edu/cgi-bin/oai.pl
http://pkp.ubc.ca/harvester/oai/
http://rea.uninet.edu/ojs/oai/
http://publications.uu.se/portal/OAI
http://services.nsdl.org:8080/nsdloai/OAI
http://oai.ub.rub.de/oai/oai2.php
http://wo.uio.no/as/WebObjects/theses.woa/wa/oai
http://www.hcu.ox.ac.uk/ocs/oai/
http://www.aim25.ac.uk/cgi-bin/oai/OAI2.0
http://www.cis.unisa.edu.au/aiwsc03/ocs/ocs/oai/
http://www.entomotropica.org/ojs/oai/
http://www.husseinsspace.com/cgi-bin/VTOAI/hspics/hspics/oai.pl
http://www.math.washington.edu/~ejpecp/oai/
http://www.math.washington.edu/~ejpecp/ECP/oai/
http://www.pkp.ubc.ca/harvester/oai/
http://www.pubmedcentral.gov/oai/oai.cgi
regards,
Xiaoming
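A check of this kind might look roughly like the sketch below. This is not the survey program itself, only an assumption about how a ListSets response could be tested for set descriptions; the function name and the abbreviated sample response are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def sets_with_description(list_sets_xml):
    """Return the setSpec of every set that carries a setDescription."""
    root = ET.fromstring(list_sets_xml)
    found = []
    for s in root.iter(f"{OAI}set"):
        if s.find(f"{OAI}setDescription") is not None:
            found.append(s.findtext(f"{OAI}setSpec"))
    return found

# Abbreviated, made-up ListSets response used only to exercise the function.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>maps</setSpec><setName>Maps</setName>
      <setDescription><note>placeholder description</note></setDescription>
    </set>
    <set><setSpec>music</setSpec><setName>Music</setName></set>
  </ListSets>
</OAI-PMH>"""

print(sets_with_description(SAMPLE))
```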
From hussein@cs.uct.ac.za Tue Mar 9 05:10:26 2004
From: hussein@cs.uct.ac.za (Hussein Suleman)
Date: Tue, 09 Mar 2004 07:10:26 +0200
Subject: [OAI-implementers] set description
In-Reply-To:
References:
Message-ID: <404D51C2.60705@cs.uct.ac.za>
following up on Kat's comment, and seeing that my personal website is on
Xiaoming's listing ...
try
http://www.husseinsspace.com/cgi-bin/VTOAI/hspics/hspics/oai.pl?verb=ListSets
i have a description for each set in my source metadata so i encode that
in a simple DC record with only a description tag. the idea being that
set descriptions are meant to help people understand the contents of
sets (for the purposes of selective harvesting) so any
human-understandable information you can provide is useful while at the
same time there is probably no point in automatically generating
non-textual/descriptive fields such as date.
of course this is all arguable so feel free to vociferously disagree
with me about the date or the human-understanding bits :)
ttfn,
----hussein
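As a rough sketch of the approach described above (a set description holding a DC record with only a description element), assuming Python's standard ElementTree. The function name, set name and description text are hypothetical, and for brevity the outer elements are left without the OAI-PMH namespace that a real ListSets response would declare.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

def build_set(spec, name, description):
    """Build a <set> whose setDescription is a DC record with one dc:description."""
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    set_el = ET.Element("set")
    ET.SubElement(set_el, "setSpec").text = spec
    ET.SubElement(set_el, "setName").text = name
    desc = ET.SubElement(set_el, "setDescription")
    record = ET.SubElement(desc, f"{{{OAI_DC}}}dc")
    ET.SubElement(record, f"{{{DC}}}description").text = description
    return set_el

el = build_set("hspics", "Pictures", "Photographs from a personal website")
print(ET.tostring(el, encoding="unicode"))
```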
--
=====================================================================
hussein suleman ~ hussein@cs.uct.ac.za ~ http://www.husseinsspace.com
=====================================================================
From tajoli@cilea.it Tue Mar 9 09:24:28 2004
From: tajoli@cilea.it (Zeno Tajoli)
Date: Tue, 09 Mar 2004 10:24:28 +0100
Subject: [OAI-implementers] Knowing you have harvested the whole collection
Message-ID: <6.0.3.0.0.20040309102420.02488510@mail.cilea.it>
Hi,
>I was wondering if someone could tell me the most accurate way to ensure
>my OAI harvesting program has harvested all of the records of a
>repository. Is there any way to get some sort of collection or repository
>metadata that holds a "total number of records" value?
As far as I know, there isn't a specific request for this.
>Or, because oai_dc metadata records are the most common denominator (and
>required for minimal OAI compliance?), can I assume that a request like
>this:
>http://services.nsdl.org:8080/nsdloai/OAI?verb=ListRecords&metadataPrefix=oai_dc
> will give me the most complete set of records possible (once no more
> resumptionTokens are available)?
In my opinion yes.
Bye
Zeno Tajoli
CILEA - Segrate (MI)
tajoliAT_SPAM_no_prendiATcilea.it
(Address masked against spam; replace everything between the ATs with @)
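A harvesting loop along these lines could be sketched as follows. OAI-PMH 2.0 does allow an optional completeListSize attribute on resumptionToken, so a harvester can compare its own count against that value when a repository supplies it. Everything else here is an assumption for illustration: `fetch` is a hypothetical callable that returns response XML for a dict of arguments, and the two canned pages are made up so the example runs without a network.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Follow resumptionTokens to the end; return (identifiers, reported size or None)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    identifiers, reported_size = [], None
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(f"{OAI}header"):
            identifiers.append(header.findtext(f"{OAI}identifier"))
        token = root.find(f".//{OAI}resumptionToken")
        if token is not None and token.get("completeListSize"):
            reported_size = int(token.get("completeListSize"))
        # No token element, or an empty one, marks the end of the list.
        if token is None or not (token.text or "").strip():
            return identifiers, reported_size
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

def _page(ids, token_xml):
    """Build a tiny, made-up ListRecords response (headers only)."""
    records = "".join(
        f"<record><header><identifier>{i}</identifier></header></record>" for i in ids)
    return ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
            f"<ListRecords>{records}{token_xml}</ListRecords></OAI-PMH>")

PAGES = {
    None: _page(["oai:x:1", "oai:x:2"],
                '<resumptionToken completeListSize="3" cursor="0">t1</resumptionToken>'),
    "t1": _page(["oai:x:3"],
                '<resumptionToken completeListSize="3" cursor="2"/>'),
}

def fake_fetch(params):
    """Stand-in for an HTTP request: pick the canned page by resumptionToken."""
    return PAGES[params.get("resumptionToken")]

ids, size = harvest_identifiers(fake_fetch)
print(len(ids), size)
```

When the repository omits completeListSize, the returned size is None and the only completion signal is the missing or empty resumptionToken.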
From deridder@aztec.lib.utk.edu Tue Mar 9 20:25:44 2004
From: deridder@aztec.lib.utk.edu (Jody DeRidder)
Date: Tue, 9 Mar 2004 15:25:44 -0500 (EST)
Subject: [OAI-implementers] set description
In-Reply-To: <404D51C2.60705@cs.uct.ac.za>
References:
<404D51C2.60705@cs.uct.ac.za>
Message-ID: <1640.160.36.192.134.1078863944.squirrel@kiva.lib.utk.edu>
Thank you all for your help!!
Xiaoming, we are checking out all those you listed-- and
Caroline, we especially love what you are doing. We want to be
able to harvest sets based on subject content, and then make
those available to (first) our subject librarians. Eventually,
we hope to set up a portal for use by students and researchers
that will provide access to different collections (sets) based on
general topical areas.
I think I will bring this up at the next DLF developer's forum
(as well as my hopes for authority control on subjects) where the
topic is to be "improving the harvestability of digital content
and metadata".
Thanks again!
--jody
--
Jody DeRidder
IT Administrator II
Digital Library Center
648A John C. Hodges Library
University of Tennessee
Knoxville, TN 37996
Phone: (865) 974-4796
Email: deridder@aztec.lib.utk.edu
From sshreeve@uiuc.edu Tue Mar 9 20:43:32 2004
From: sshreeve@uiuc.edu (Sarah L. Shreeves)
Date: Tue, 09 Mar 2004 14:43:32 -0600
Subject: [OAI-implementers] set description
In-Reply-To: <1640.160.36.192.134.1078863944.squirrel@kiva.lib.utk.edu>
References:
<404D51C2.60705@cs.uct.ac.za>
<1640.160.36.192.134.1078863944.squirrel@kiva.lib.utk.edu>
Message-ID: <6.0.1.1.2.20040309143732.02685040@express.cites.uiuc.edu>
This has been mentioned before, but it might be useful to take a look at
what the DC Collection Description Working Group is doing around collection
description. I think that this work could have great applicability for set
descriptions. See http://dublincore.org/groups/collections/.
Sarah
-----------------------------------------------------------------------------------------------
Sarah L. Shreeves
Visiting Project Coordinator, IMLS Digital Collections and Content
University of Illinois Library at Urbana-Champaign
Phone: 217-244-7809
Fax: 217-244-7764
Email: sshreeve@uiuc.edu
Web: http://imlsdcc.grainger.uiuc.edu
From khage@umich.edu Wed Mar 10 20:16:43 2004
From: khage@umich.edu (Kat Hagedorn)
Date: Wed, 10 Mar 2004 15:16:43 -0500
Subject: [OAI-implementers] static repository status
Message-ID:
Hello,
Could someone tell me the status of the OAI Static Repository? I get a
couple requests each week from people who don't have the resources to
become full-fledged data providers. I would like to point them to a
place where they can drop their XML files, but I gather this doesn't
officially exist yet.
Thanks,
- Kat
-------------------
Kat Hagedorn
OAIster/Metadata Harvesting Librarian
DLXS Bibliographic Class Coordinator
DLXS Text Class Co-coordinator
Digital Library Production Service
University of Michigan
http://www.oaister.org/
http://www.dlxs.org/
email: khage@umich.edu
phone: 734-615-7618
From herbertv@lanl.gov Wed Mar 10 20:28:19 2004
From: herbertv@lanl.gov (herbert van de sompel)
Date: Wed, 10 Mar 2004 13:28:19 -0700
Subject: [OAI-implementers] static repository status
In-Reply-To:
References:
Message-ID: <404F7A63.60706@lanl.gov>
Kat Hagedorn wrote:
> Hello,
>
> Could someone tell me the status of the OAI Static Repository? I get a
> couple requests each week from people who don't have the resources to
> become full-fledged data providers. I would like to point them to a
> place where they can drop their XML files, but I gather this doesn't
> officially exist yet.
>
We are finalizing the Static Repository specification at this very moment.
Feedback from the NSDL community suggested the inclusion of a mechanism to
"unregister" a Static Repository from a Static Repository Gateway. We are
looking into accommodating that need.
When it comes to a "place to drop" XML files (we refer to that spot as a Static
Repository Gateway):
(1) There is LANL-created software available to create such spots (see
http://srepod.sourceforge.net/ )
(2) LANL operates a demo version of such a spot (see
http://libtest.lanl.gov/cgi-bin/gateway.cgi ). However, this is for demo
purposes only; we have no intention of running a production version.
I invite you to install and operate our Static Repository Gateway software at
OAIster.
many greetings
herbert
--
Herbert Van de Sompel
digital library research & prototyping
Los Alamos National Laboratory - Research Library
+ 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/
"met gestreken jeans de dansvloer penetreren"
From thabing@uiuc.edu Wed Mar 10 21:46:30 2004
From: thabing@uiuc.edu (Thomas G. Habing)
Date: Wed, 10 Mar 2004 15:46:30 -0600
Subject: [OAI-implementers] static repository status
In-Reply-To:
References:
Message-ID: <404F8CB6.3090508@uiuc.edu>
Kat Hagedorn wrote:
> Hello,
>
> Could someone tell me the status of the OAI Static Repository? I get a
> couple requests each week from people who don't have the resources to
> become full-fledged data providers. I would like to point them to a
> place where they can drop their XML files, but I gather this doesn't
> officially exist yet.
>
Hi Kat,
We also have an alpha/beta level implementation of an OAI Static Repository
Gateway which is available on SourceForge:
http://uilib-oai.sourceforge.net/
http://sourceforge.net/project/showfiles.php?group_id=47963&package_id=85826
It is implemented as an IIS Active Server Page (ASP) script.
Currently it is being used to gateway one repository:
http://imlsdcc.grainger.uiuc.edu/gateway/oai.asp/www.acnatsci.org/library/collections/imls/nlg/AcadNatSciStatic.xml?verb=Identify
However, we would be willing to act as a gateway for other collections. We
are not currently responding to the 'initiate' request (Section 3.3 of the
spec.), so if you have a collection you would like to be added, you should
send me an email with the URL to the static XML file, and I will make sure
it validates and add it to the gateway if it does.
This is still an experimental implementation, so the usual caveats apply,
but we do intend to keep it running for the foreseeable future, probably
with some downtime to introduce changes or fixes as the spec evolves.
--
Thomas Habing
Research Programmer, Digital Library Projects
University of Illinois at Urbana-Champaign
155 Grainger Engineering Library Information Center, MC-274
thabing@uiuc.edu, (217) 244-4425
http://dli.grainger.uiuc.edu
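Judging from the gatewayed URL above, the baseURL for a static repository appears to be the gateway URL followed by the static file's URL with its scheme removed. A minimal sketch under that assumption follows; the function name is hypothetical, and the `http://` form of the static file's address is inferred rather than stated in the message.

```python
from urllib.parse import urlsplit

def gateway_base_url(gateway_url, static_repo_url):
    """Join a gateway URL and a static repository URL with its scheme removed."""
    parts = urlsplit(static_repo_url)
    tail = parts.netloc + parts.path
    return f"{gateway_url.rstrip('/')}/{tail}"

base = gateway_base_url(
    "http://imlsdcc.grainger.uiuc.edu/gateway/oai.asp",
    # Assumed address of the static XML file (scheme inferred).
    "http://www.acnatsci.org/library/collections/imls/nlg/AcadNatSciStatic.xml")
print(base + "?verb=Identify")
```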
From chrish@athabascau.ca Wed Mar 10 22:58:38 2004
From: chrish@athabascau.ca (Chris Hubick)
Date: Wed, 10 Mar 2004 15:58:38 -0700
Subject: [OAI-implementers] OAI-PMH + IEEE LTSC LOM
Message-ID: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca>
Hi.
I have a preliminary implementation of OAI-PMH around our native IEEE
LTSC LOM repository:
http://adlibx.athabascau.ca/ADLib/OAI/?verb=Identify
I provide both the required Dublin Core XML, as well as IEEE LOM XML:
http://adlibx.athabascau.ca/ADLib/OAI/?verb=ListMetadataFormats
You can retrieve Erik Duval's sample LOM record at:
DC: http://adlibx.athabascau.ca/ADLib/OAI/?verb=GetRecord&metadataPrefix=oai_dc&identifier=urn:x-ims-plirid-v0:urn%3Ax-ims-plirid-v0%3Aadlib.athabascau.ca%3Asys%3Aadlibx%3A1
LOM: http://adlibx.athabascau.ca/ADLib/OAI/?verb=GetRecord&metadataPrefix=lom&identifier=urn:x-ims-plirid-v0:urn%3Ax-ims-plirid-v0%3Aadlib.athabascau.ca%3Asys%3Aadlibx%3A1
All responses validate against the appropriate schemas, and the
repository passes all the tests on Virginia Tech's OAI Repository
Explorer.
I didn't know of any other OAI+LOM implementations to compare against,
so I hope my implementation is sane (?).
--
FYI:
The human oriented interface to our repository is at:
Athabasca Digital Library: http://adlib.athabascau.ca/
The implementation is written in Java, and provided as Free Software
under the LGPL. This work is part of a larger project which includes
Java interfaces for representing a LOM record, and JAXB Marshallers
for serializing to LOM, DC, and OAI XML. There is also a Java interface
for a whole Repository, and service implementations (GUI, SOAP, OAI,
HTTP, RSS, etc) built around that. It's a work in progress and
documentation is currently minimal. You can find details at:
http://adlib.athabascau.ca/~hubick/
Feedback/Comments appreciated. Thanks.
--
Chris Hubick
mailto:chrish@athabascau.ca
mailto:chris@hubick.com
phone:1-780-421-2533 (work)
phone:1-780-721-9932 (cell)
http://www.hubick.com/
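For anyone wanting to check from a harvester whether a repository offers LOM alongside oai_dc, a minimal sketch is to read the metadataPrefix values out of a ListMetadataFormats response. The function name is hypothetical and the sample response is a made-up, abbreviated one (the schema and metadataNamespace elements are omitted); only the `lom` and `oai_dc` prefixes come from the URLs above.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def metadata_prefixes(xml_text):
    """List every metadataPrefix advertised in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{OAI}metadataPrefix")]

# Abbreviated sample response; a real one also carries schema and namespace.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>lom</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

prefixes = metadata_prefixes(SAMPLE)
print("lom" in prefixes)
```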
From hussein@cs.uct.ac.za Thu Mar 11 05:52:44 2004
From: hussein@cs.uct.ac.za (Hussein Suleman)
Date: Thu, 11 Mar 2004 07:52:44 +0200
Subject: [OAI-implementers] OAI-PMH + IEEE LTSC LOM
In-Reply-To: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca>
References: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca>
Message-ID: <404FFEAC.7020706@cs.uct.ac.za>
hi
for not exactly LOM, but the IMS metadata set (LOM + minor
modifications) ... check the CSTC archive with baseURL
http://www.cstc.org/cgi-bin/OAI/CSTC.pl
it's the older v1.1 PMH, but the XML encoding of records and IMS standard
have not changed since then (i think). this mapping was set up and
tested with the iLumina project, that uses the IMS metadata set internally.
for an example record, try:
http://www.cstc.org/cgi-bin/OAI/CSTC.pl?verb=GetRecord&metadataPrefix=ims1_2_1&identifier=oai:CSTC:60
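A GetRecord request like the one above can be assembled from its three parts (baseURL, identifier, metadataPrefix). The following is a minimal sketch in Python; the helper name get_record_url is hypothetical and not part of any OAI toolkit. Note that urlencode percent-encodes the colons in the identifier, which should be equivalent to the literal form shown above.

```python
from urllib.parse import urlencode


def get_record_url(base_url, identifier, metadata_prefix):
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    query = urlencode({
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    })
    return base_url + "?" + query


# Values taken from the example record in this message.
url = get_record_url(
    "http://www.cstc.org/cgi-bin/OAI/CSTC.pl",
    "oai:CSTC:60",
    "ims1_2_1",
)
print(url)
```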
in general, to find other implementations for a metadata standard, you
can also use the UIUC registry (which is apparently not linked into the
OAI website yet). if you go to:
http://gita.grainger.uiuc.edu/registry/ListSchemas.asp
you will be able to find all archives that support a particular metadata
format.
ttfn,
----hussein
--
=====================================================================
hussein suleman ~ hussein@cs.uct.ac.za ~ http://www.husseinsspace.com
=====================================================================
From khage@umich.edu Thu Mar 11 14:49:56 2004
From: khage@umich.edu (Kat Hagedorn)
Date: Thu, 11 Mar 2004 09:49:56 -0500
Subject: [OAI-implementers] static repository status
In-Reply-To: <404F8CB6.3090508@uiuc.edu>
Message-ID: <5CC61CAF-736B-11D8-B59D-0003934CA344@umich.edu>
Thank you for the information, Herbert and Tom. Implementing a Static
Repository Gateway at UM is a great idea. We'll
bring it up in discussions here soon.
- Kat
On Wednesday, Mar 10, 2004, at 16:46 America/Detroit, Thomas G. Habing
wrote:
> Kat Hagedorn wrote:
>
>> Hello,
>> Could someone tell me the status of the OAI Static Repository? I get
>> a couple requests each week from people who don't have the resources
>> to become full-fledged data providers. I would like to point them to
>> a place where they can drop their XML files, but I gather this
>> doesn't officially exist yet.
>> Thanks,
>> - Kat
>> -------------------
>> Kat Hagedorn
>> OAIster/Metadata Harvesting Librarian
>> DLXS Bibliographic Class Coordinator
>> DLXS Text Class Co-coordinator
>> Digital Library Production Service
>> University of Michigan
>> http://www.oaister.org/
>> http://www.dlxs.org/
>> email: khage@umich.edu
>> phone: 734-615-7618
>> _______________________________________________
>> OAI-implementers mailing list
>> List information, archives, preferences and to unsubscribe:
>> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
> Hi Kat,
>
> We also have an alpha/beta level implementation of an OAI Static
> Repository Gateway which is available on SourceForge:
>
> http://uilib-oai.sourceforge.net/
> http://sourceforge.net/project/showfiles.php?group_id=47963&package_id=85826
>
> It is implemented as an IIS Active Server Page (ASP) script.
>
> Currently it is being used to gateway one repository:
>
> http://imlsdcc.grainger.uiuc.edu/gateway/oai.asp/www.acnatsci.org/library/collections/imls/nlg/AcadNatSciStatic.xml?verb=Identify
>
> However, we would be willing to act as a gateway for other
> collections. We are not currently responding to the 'initiate'
> request (Section 3.3 of the spec.), so if you have a collection you
> would like to be added, you should send me an email with the URL to
> the static XML file, and I will make sure it validates and add it to
> the gateway if it does.
>
> This is still an experimental implementation, so the usual caveats
> apply, but we do intend on keeping it running for the foreseeable
> future, probably with some downtime to introduce changes or fixes as
> the spec evolves.
>
> --
> Thomas Habing
> Research Programmer, Digital Library Projects
> University of Illinois at Urbana-Champaign
> 155 Grainger Engineering Library Information Center, MC-274
> thabing@uiuc.edu, (217) 244-4425
> http://dli.grainger.uiuc.edu
From philb@icbl.hw.ac.uk Thu Mar 11 17:58:31 2004
From: philb@icbl.hw.ac.uk (Phil Barker)
Date: Thu, 11 Mar 2004 17:58:31 +0000
Subject: [OAI-implementers] OAI-PMH + IEEE LTSC LOM
In-Reply-To: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca>
References: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca>
Message-ID: <4050A8C7.2080802@icbl.hw.ac.uk>
Chris Hubick wrote:
> I didn't know of any other OAI+LOM implementations to compare against,
> so I hope my implementation is sane (?).
>
There's a fair amount of work going on in the UK related to this; it goes
under the catchy title of the RDN/LTSN Interoperability Project,
http://www.ltsn.ac.uk/genericcentre/interop/ . I don't think anyone has yet
gone public with a harvester, but I know folk are working on them and I've
forwarded your message to them.
An article about a loosely related earlier implementation can be found at
http://www.ariadne.ac.uk/issue34/powell/intro.html , and there'll be
another in the next issue of Ariadne.
Phil
--
Phil Barker Learning Technology Adviser
ICBL, School of Mathematical and Computer Sciences
Mountbatten Building, Heriot-Watt University,
Edinburgh, EH14 4AS
Tel: 0131 451 3278 Fax: 0131 451 3327
Web: http://www.icbl.hw.ac.uk/~philb/
From chrish@athabascau.ca Thu Mar 11 19:05:31 2004
From: chrish@athabascau.ca (Chris Hubick)
Date: Thu, 11 Mar 2004 12:05:31 -0700
Subject: [OAI-implementers] Identifiers [was: Re: OAI-PMH + IEEE LTSC LOM]
In-Reply-To: <404FFEAC.7020706@cs.uct.ac.za>
References: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca>
<404FFEAC.7020706@cs.uct.ac.za>
Message-ID: <1079031930.22737.56.camel@edtech25.edtech.athabascau.ca>
On Wed, 2004-03-10 at 22:52, Hussein Suleman wrote:
> for not exactly LOM, but the IMS metadata set (LOM + minor
> modifications) ... check the CSTC archive with baseURL
> http://www.cstc.org/cgi-bin/OAI/CSTC.pl
Hi again :)
This, and an email Kat Hagedorn sent me off list (hi Kat), reminds me of
a big question about identifiers...
As you may know, LOM identifiers are catalog/entry *pairs*. That is to
say, the entry is namespaced by its catalog - the repository could
conceivably have two different records with the same identifier entry in
different catalogs. However, OAI and Dublin Core, and RSS, etc, use a
*single* string as an identifier.
In a repository that harvests from a number of different systems through
a variety of protocols, and has identifiers from many catalog types (not
necessarily URIs)...
How does one map an arbitrary catalog/entry *pair*, to a *single*
identifier string?
My answer was to use a URN:
'urn:' + <catalog> + ':' + <entry>
Note:
1) Identifiers used in OAI messages must be URIs.
2) In the OAI Identifier format ('oai:'), the namespace ID must be a
domain name. The repository Hussein linked violates this.
3) The LOM/RDF stuff seems to expect all people to use 'URI' as a
catalog in their LOM data (?).
My runner up was a Universal Name:
'{' + <catalog> + '}' + <entry>
That notation was invented by James Clark
(http://www.jclark.com/xml/xmlns.htm), but the URI requirement killed that idea.
Ideally, the LOM to Dublin Core mapping in Appendix B of the IEEE LTSC
LOM spec would have set up a practice for this, but alas, it does not.
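The URN mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name pair_to_urn is hypothetical, the entry is percent-encoded so that its own colons cannot be confused with the URN delimiters, and the sample catalog/entry values are inferred from the GetRecord identifier given earlier in the thread rather than confirmed. The sketch does not check that the catalog is a syntactically valid URN namespace identifier.

```python
from urllib.parse import quote


def pair_to_urn(catalog, entry):
    """Map a LOM catalog/entry pair to a single URN string.

    Assumption: the catalog is used as-is in the namespace position,
    and the entry is fully percent-encoded (no characters kept "safe").
    """
    return "urn:" + catalog + ":" + quote(entry, safe="")


# Sample values inferred from the identifier used earlier in this thread.
urn = pair_to_urn(
    "x-ims-plirid-v0",
    "urn:x-ims-plirid-v0:adlib.athabascau.ca:sys:adlibx:1",
)
print(urn)
```

Percent-encoding the entry keeps the mapping reversible: splitting the result on the second colon and decoding the remainder should recover the original pair, provided the catalog itself contains no colon.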
Has anyone else tackled this problem?
Thanks.
--
Chris Hubick
mailto:chrish@athabascau.ca
mailto:chris@hubick.com
phone:1-780-421-2533 (work)
phone:1-780-721-9932 (cell)
http://www.hubick.com/
From a.powell@ukoln.ac.uk Thu Mar 11 23:58:03 2004
From: a.powell@ukoln.ac.uk (Andy Powell)
Date: Thu, 11 Mar 2004 23:58:03 +0000 (GMT Standard Time)
Subject: [OAI-implementers] Identifiers [was: Re: OAI-PMH + IEEE LTSC
LOM]
In-Reply-To: <1079031930.22737.56.camel@edtech25.edtech.athabascau.ca>
References: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca>
<404FFEAC.7020706@cs.uct.ac.za> <1079031930.22737.56.camel@edtech25.edtech.athabascau.ca>
Message-ID:
On Thu, 11 Mar 2004, Chris Hubick wrote:
> In a repository that harvests from a number of different systems through
> a variety of protocols, and has identifiers from many catalog types (not
> necessarily URIs)...
>
> How does one map an arbitrary catalog/entry *pair*, to a *single*
> identifier string?
>
> My answer was to use a URN:
>
> 'urn:' + <catalog> + ':' + <entry>
One problem with this approach is that there is presumably very little
consistency across services in the way that 'catalog' is assigned - i.e.
the 'catalog' is not taken from a controlled vocabulary. So although you
end up with a single string identifier (the URN) you don't really
have a mechanism for reliably comparing URNs from different sources.
It seems to me that the 'catalog'/'entry' pairing in LOM is a bit broken
- because it really requires a global registry of 'catalog' names to work
properly. (At least, without a global registry I can have no way of
knowing if your 'catalog' is the same as my 'catalog'). URIs already
provide a global space within which new identifier schemes can be created
- why not use it, rather than building a LOM-specific registry.
In particular, the proposed 'info' URI scheme
http://info-uri.info/registry/docs/misc/faq.html
provides an open mechanism for assigning URIs to information assets that
have identifiers in public namespaces but have no representation within
URI space.
> Has anyone else tackled this problem?
Not really, but you might be interested in
Guidelines for encoding identifiers in Dublin Core and IEEE LOM metadata
http://www.ukoln.ac.uk/metadata/dcmi-ieee/identifiers/
which basically suggests that URIs should *always* be used.
Andy
--
Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
http://www.ukoln.ac.uk/ukoln/staff/a.powell +44 1225 383933
Resource Discovery Network http://www.rdn.ac.uk/
From chrish@athabascau.ca Fri Mar 12 00:40:19 2004
From: chrish@athabascau.ca (Chris Hubick)
Date: Thu, 11 Mar 2004 17:40:19 -0700
Subject: [OAI-implementers] Re: Identifiers
In-Reply-To:
References: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca>
<404FFEAC.7020706@cs.uct.ac.za>
<1079031930.22737.56.camel@edtech25.edtech.athabascau.ca>
Message-ID: <1079052018.22737.286.camel@edtech25.edtech.athabascau.ca>
On Thu, 2004-03-11 at 16:58, Andy Powell wrote:
> On Thu, 11 Mar 2004, Chris Hubick wrote:
>
> > In a repository that harvests from a number of different systems through
> > a variety of protocols, and has identifiers from many catalog types (not
> > necessarily URIs)...
> >
> > How does one map an arbitrary catalog/entry *pair*, to a *single*
> > identifier string?
> >
> > My answer was to use a URN:
> >
> > 'urn:' + <catalog> + ':' + <entry>
>
> It seems to me that the 'catalog'/'entry' pairing in LOM is a bit broken
> - because it really requires a global registry of 'catalog' names to work
> properly. (At least, without a global registry I can have no way of
> knowing if your 'catalog' is the same as my 'catalog'). URIs already
> provide a global space within which new identifier schemes can be created
> - why not use it, rather than building a LOM-specific registry.
First, can we assume, for the sake of my problem discussion, that all
those who create some new identifier format do in fact manage to choose
a truly unique catalog name, just as if there were in fact a registry
(that's a tangential discussion :)?
> In particular, the proposed 'info' URI scheme
>
> http://info-uri.info/registry/docs/misc/faq.html
>
> provides an open mechanism for assigning URIs to information assets that
> have identifiers in public namespaces but have no representation within
> URI space.
Oooh, that's new, thanks for that :)
Ok, so, if I use that, then the algorithm would be:
IF (LOM.Identifier.Catalog == 'URI')
THEN export LOM.Identifier.Entry unmodified
ELSE export an info URI as:
    'info:' + <catalog> + '/' + <entry>
I will read more about these info URIs to see if that's a valid use
(?).
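The export rule above might look like this in Python. It is a sketch, not a statement of how 'info' URIs must be formed: the function name export_identifier is hypothetical, and using an arbitrary LOM catalog as the 'info' namespace is an assumption, since the scheme expects namespaces to be registered. The entry is percent-encoded so that a '/' inside it cannot be mistaken for the namespace separator.

```python
from urllib.parse import quote


def export_identifier(catalog, entry):
    """Export a LOM catalog/entry pair as a single identifier string."""
    if catalog == "URI":
        # The entry is already a URI: export it unmodified.
        return entry
    # Assumption: the catalog name doubles as the 'info' namespace.
    return "info:" + catalog + "/" + quote(entry, safe="")


print(export_identifier("URI", "http://example.org/record/1"))
print(export_identifier("example-catalog", "rec 42/a"))
```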
> > Has anyone else tackled this problem?
>
> Not really, but you might be interested in
>
> Guidelines for encoding identifiers in Dublin Core and IEEE LOM metadata
> http://www.ukoln.ac.uk/metadata/dcmi-ieee/identifiers/
>
> which basically suggests that URIs should *always* be used.
Hrm, that's interesting too, thanks. Though it's basically the reverse
problem.
One thing that does bring to light is using 'URI' as your LOM Catalog
whenever your entries are in URI format. I have been using our URN's
Namespace Identifier (NID) as our Catalog, when I perhaps should be
using 'URI' instead (or the more specific 'URN')?
Thanks!
--
Chris Hubick
mailto:chrish@athabascau.ca
mailto:chris@hubick.com
phone:1-780-421-2533 (work)
phone:1-780-721-9932 (cell)
http://www.hubick.com/
From adam.cooper@fdlearning.com
[relates to OAI-implementers digest, Vol 1 #426 - 5 msgs]
A number of IMS specs have played with this 2 part/1part problem. RDCEO
(Reusable Definition of Competences and Educational Objectives) talks about
a URN scheme, and I think we had assumed that the catalog would map to the
NID and the entry to the NSS. We also considered URL#fragment identifier,
where the fragment identifier was the entry part. The RDCEO binding
actually used a single identifier string, whereas the information model DID
follow LOM practice.
Exactly what the significance of LOM catalog might be is probably another
question, and one that is intentionally open I think. Is it necessarily
more than an _indicator_ of the creator of the identifier?
Adam
From a.powell@ukoln.ac.uk Wed Mar 17 13:28:29 2004
From: a.powell@ukoln.ac.uk (Andy Powell)
Date: Wed, 17 Mar 2004 13:28:29 +0000 (GMT Standard Time)
Subject: [OAI-implementers] Automatically gathering the full-text of eprints
Message-ID:
The JISC-funded ePrints UK project has a requirement to automatically
harvest both metadata and full-text from the eprint archives within UK
academia (and potentially elsewhere). This is so that we can pass both
metadata and full-text to the various 'enhancement' Web services offered
by our partners.
http://www.rdn.ac.uk/projects/eprints-uk/
In order for our harvesting robot to be able to do this, it must be able
to reliably (and automatically) determine the correct URL(s) for the
various full-text manifestation(s) (HTML, PDF, RTF, etc.) of each eprint.
Our "Using simple Dublin Core to describe eprints" guidelines are intended
to encourage greater consistency in the metadata that is exposed by eprint
archives using the 'oai_dc' format within the OAI Protocol for Metadata
Harvesting (OAI-PMH). Somewhat perversely, because we stick rigidly to
the semantics of the DC element set, our guidelines make determining the
URL of each manifestation that is available quite difficult. (This is
largely a consequence of the 'simple' nature of 'simple DC'!). In
general, the URL in the <dc:identifier> element of the oai_dc record is
the URL of a jump-off page, rather than a direct link to the full-text.
We would like to suggest a new proposal for unambiguously embedding the
URL for each manifestation of an eprint into the (X)HTML jump-off page for
that eprint. Since the jump-off page is generated automatically by the
eprint archive software, doing this shouldn't be too difficult (in fact,
we would hope that archive software, such as eprints.org, will be
configured to do this out of the box).
If this proposal is adopted, it will make it much easier to write OAI
service provider software that can reliably gather the full-text of an
eprint, given only the oai_dc record for that eprint.
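To make the idea concrete, here is a rough sketch of how a harvester might pull manifestation URLs out of a jump-off page. The markup convention used here (a link element with rel="alternate" plus a MIME type) is purely an assumption for illustration; the actual encoding is whatever the proposal document specifies. The class name and the example.org URLs are hypothetical.

```python
from html.parser import HTMLParser


class ManifestationLinkParser(HTMLParser):
    """Collect (MIME type, URL) pairs from <link rel="alternate"> elements.

    Assumption: full-text manifestations are advertised this way; the
    real proposal may use a different element or rel value.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        if attributes.get("rel") == "alternate" and "href" in attributes:
            self.links.append((attributes.get("type"), attributes["href"]))


# A made-up jump-off page fragment.
page = """<html><head>
<link rel="alternate" type="application/pdf" href="http://eprints.example.org/1/paper.pdf" />
<link rel="stylesheet" href="/style.css" />
</head><body>...</body></html>"""

parser = ManifestationLinkParser()
parser.feed(page)
print(parser.links)
```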
The proposal is at
http://www.rdn.ac.uk/projects/eprints-uk/docs/encoding-fulltext-links/
Comments are welcome,
Andy
--
Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933
Resource Discovery Network http://www.rdn.ac.uk/
ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/
From herbertv@lanl.gov Wed Mar 17 19:19:39 2004
From: herbertv@lanl.gov (herbert van de sompel)
Date: Wed, 17 Mar 2004 12:19:39 -0700
Subject: [OAI-implementers] Automatically gathering the full-text of eprints
In-Reply-To:
References:
Message-ID: <4058A4CB.6040200@lanl.gov>
Dear Andy,
The problem of service providers needing access to content in addition to
metadata has come up in many discussions, lately, including in the realm of the
DARE, DINI, JISC, DSpace, Fedora, etc work. It so happens that my team in Los
Alamos has recently done quite some work in this realm, as is illustrated by the
most recent papers listed on my personal web site.
Here is some initial feedback to the proposal. The proposal relies on:
a. The assumption that a harvester knows that something that is in the
dc.identifier element of oai_dc points to a - compliant - jump-off page. There
are two problems with this assumption:
- lots of things can be in the dc.identifier element both resolvable and
unresolvable
- lots of things at the end of the thing identified by the content of
dc.identifier (if resolvable) will not be compliant jump-off pages
This means harvesters never really know when they are facing the scenario that
you target, and hence will do a lot of meaningless dereferencing and parsing.
One could think of addressing this to some extent by a special-purpose
Descriptor in the Identify response to indicate that a repository actually is
'compliant' but that would still leave the harvester guessing about which of the
dc.identifiers (if there are multiple) is the magic one.
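The ambiguity described in point a. can be illustrated with a small Python sketch. The record below is invented (the DOI and URLs are not real), and the function name candidate_urls is hypothetical. The code can only filter for values that look like HTTP URLs; it has no way of telling whether such a URL leads to a compliant jump-off page, which is exactly the problem.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# An invented oai_dc record with three different kinds of identifier.
record = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example eprint</dc:title>
  <dc:identifier>http://eprints.example.org/archive/00000001/</dc:identifier>
  <dc:identifier>doi:10.9999/example.1</dc:identifier>
  <dc:identifier>Example Journal 12(3), pp. 45-67</dc:identifier>
</oai_dc:dc>"""


def candidate_urls(oai_dc_xml):
    """Return the dc:identifier values that look like HTTP(S) URLs."""
    root = ET.fromstring(oai_dc_xml)
    values = [(el.text or "").strip()
              for el in root.iter("{%s}identifier" % DC_NS)]
    return [v for v in values if v.startswith(("http://", "https://"))]


print(candidate_urls(record))
```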
b. The actual existence of a 'jump-off' page. This is something that - in the
context of the OAI-PMH (with its disconnection of DP and SP) we can not just
take for granted or assume.
There are other problems related to obtaining content which are not covered by
the solution:
* How does a harvester know when to go after an update to content? The OAI-PMH
indicates that the datestamp of a record only changes when the metadata has
changed, it doesn't say anything about the content. I suggest it should stay
that way. So, in the proposed solution, content in a repo can change without
the harvester ever knowing about it.
* The scenario as described in the proposal, in which a single metadata record
corresponds with a single "preprint" is only a special case of - future -
reality. Increasingly, objects held in and described by repositories will be
"compound" or "complex", i.e. consisting of multiple datastreams, not just a
single "preprint". I find that it would be desirable that a solution to get to
the content would be able to handle such situations. The proposed solution
could actually accommodate such 'compound' objects, because the multiple
datastreams are linked off the jump-off page. There is, however, a problem.
Let's presume we have a situation in which an object is deposited in an
institutional repository that has 2 datastreams, each of which actually has a
unique identifier, say a doi or something. Thinking of a - future -
self-archiving scenario and the trend to accord identifiers at finer levels of
granularity, this is not unlikely at all. Now we get 3 things in dc.identifier
(2 doi's and a link to a jump-off page), and 2 things in the jump-off page
(links to the 2 datastreams). How do I know which doi goes with which
datastream? Information that - I hope we will all agree - is rather significant.
OK. The point I am trying to make is that the described scenario and its more
general problem domain (beyond eprints, and into the realm of objects with
multiple datastreams) may call for another approach. Our research has shown
that such an approach can remain 100% OAI-PMH-based if a complex object format
such as METS, MPEG-21 DIDL or SCORM is used. These formats can be "parallel"
OAI-PMH "metadata formats" through which harvesters can get to the content
without running into issues such as the ones mentioned above. Content can be
embedded in the XML wrappers or pointed at by them. Identifiers can be
unambiguously connected to content. If content changes, the datestamp of the
"complex" record changes.
I anticipate concerns re the overhead of introducing a solution based on a
complex object format. At this point, I would like to say 2 things in this
respect:
* It took 2 people on my team about 2 days to create a prototype plug-in that
enables OAI-PMH harvesting of content from DSpace repositories. Our plug-in
rendered content using the MPEG-21 DIDL XML wrapper format. Most of the time
invested in this plug-in was spent figuring out the DSpace API and a sensible
way to map the DSpace data model to the DIDL data model. The prototype was
demonstrated at the DSpace federation meeting, last week. Although
questions/issues did arise in the course of our work, none seemed unsolvable.
But it is my impression that the very fast delivery of a prototype indicates the
feasibility of the complex format approach.
* I would personally be very willing to spend time with the appropriate
representatives of the community - including yourself - to work towards a
solution that is future-proof and provides adequate guarantees regarding
perceived requirements of a content-harvesting solution. I would actually
prefer that over going for a solution which is attractive at first glance
because of its obvious simplicity, but which seems to raise some relevant
questions upon closer inspection.
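Under the complex-object approach described above, a harvester would need nothing beyond an ordinary ListRecords request with a different metadataPrefix, and could use the standard "from" argument to pick up records whose datestamp changed because their content changed. A minimal Python sketch follows; the base URL, the prefix "didl", and the helper name list_records_url are assumptions for illustration, since a repository advertises its actual prefixes through ListMetadataFormats.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Selective harvesting: only records changed since from_date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical repository exposing a complex-object format as "didl".
url = list_records_url("http://repo.example.org/oai", "didl", "2004-03-01")
print(url)
```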
To end, I would like to thank you for bringing this topic to the list. I have
had many private email exchanges over the last few months especially with
representatives from DARE and DINI about this and related problem domains. I
hope that your mail can be another impulse towards a joint action in this realm.
The problem is very real, and I would love our community to jointly create a
really good solution to it.
many greetings
herbert
--
Herbert Van de Sompel
digital library research & prototyping
Los Alamos National Laboratory - Research Library
+ 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/
"met gestreken jeans de dansvloer penetreren"
From tdb01r@ecs.soton.ac.uk Wed Mar 17 21:17:54 2004
From: tdb01r@ecs.soton.ac.uk (Tim Brody)
Date: Wed, 17 Mar 2004 21:17:54 +0000
Subject: [OAI-implementers] Automatically gathering the full-text of eprints
In-Reply-To:
References:
Message-ID: <4058C082.3000001@ecs.soton.ac.uk>
We've done a preliminary implementation of this at Southampton for:
eprints.ecs.soton.ac.uk
and
eprints.soton.ac.uk
It took me about an hour to do, I suspect Chris did it in much less time :-)
All the best,
Tim.
From herbertv@lanl.gov Wed Mar 17 22:54:55 2004
From: herbertv@lanl.gov (herbert van de sompel)
Date: Wed, 17 Mar 2004 15:54:55 -0700
Subject: [OAI-implementers] Automatically gathering the full-text of eprints
In-Reply-To: <4058C082.3000001@ecs.soton.ac.uk>
References: <4058C082.3000001@ecs.soton.ac.uk>
Message-ID: <4058D73F.9030203@lanl.gov>
Tim Brody wrote:
> We've done a preliminary implementation of this at Southampton for:
> eprints.ecs.soton.ac.uk
> and
> eprints.soton.ac.uk
>
> It took me about an hour to do, I suspect Chris did it in much less time
> :-)
>
I trust that the number of seconds it takes to implement a solution is not the
only evaluation criterion. I very much agree it is an important one, and it is
one that has always played a significant role in designing the OAI-PMH and
related specifications. But it seems to me that other criteria, such as meeting
functional requirements, also come into play. I have, obviously, not seen the
list of requirements. I do understand the goals, however. And, as described in
my previous mail, I can think of some possible requirements related to those
goals that may not be met by the proposed solution.
This consideration clearly allows for solutions to the problem other than
the one based on complex objects, which I described in my previous mail. I
suggested the complex object path because we have done quite some work in that
realm, and because that work has urged us to think in a general way about the
content-harvesting problem.
cheers
herbert
> All the best,
> Tim.
>
> Andy Powell wrote:
>
>> The JISC-funded ePrints UK project has a requirement to automatically
>> harvest both metadata and full-text from the eprint archives within UK
>> academia (and potentially elsewhere). This is so that we can pass both
>> metadata and full-text to the various 'enhancement' Web services offered
>> by our partners.
>>
>> http://www.rdn.ac.uk/projects/eprints-uk/
>>
>> In order for our harvesting robot to be able to do this, it must be able
>> to reliably (and automatically) determine the correct URL(s) for the
>> various full-text manifestation(s) (HTML, PDF, RTF, etc.) of each eprint.
>>
>> Our "Using simple Dublin Core to describe eprints" guidelines are
>> intended
>> to encourage greater consistency in the metadata that is exposed by
>> eprint
>> archives using the 'oai_dc' format within the OAI Protocol for Metadata
>> Harvesting (OAI-PMH). Somewhat perversely, because we stick rigidly to
>> the semantics of the DC element set, our guidelines make determining the
>> URL of each manifestation that is available quite difficult. (This is
>> largely a consequence of the 'simple' nature of 'simple DC'!). In
>> general, the URL in the <dc:identifier> element of the oai_dc record is
>> the URL of a jump-off page, rather than a direct link to the full-text.
>>
>> We would like to suggest a new proposal for unambiguously embedding the
>> URL for each manifestation of an eprint into the (X)HTML jump-off page for
>> that eprint. Since the jump-off page is generated automatically by the
>> eprint archive software, doing this shouldn't be too difficult (in fact,
>> we would hope that archive software, such as eprints.org, will be
>> configured to do this out of the box).
>>
>> If this proposal is adopted, it will make it much easier to write OAI
>> service provider software that can reliably gather the full-text of an
>> eprint, given only the oai_dc record for that eprint.
>>
>> The proposal is at
>>
>> http://www.rdn.ac.uk/projects/eprints-uk/docs/encoding-fulltext-links/
>>
>> Comments are welcome,
>>
>> Andy
>> --
>> Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
>> http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933
>> Resource Discovery Network http://www.rdn.ac.uk/
>> ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/
>> _______________________________________________
>> OAI-implementers mailing list
>> List information, archives, preferences and to unsubscribe:
>> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
>>
>>
--
Herbert Van de Sompel
digital library research & prototyping
Los Alamos National Laboratory - Research Library
+ 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/
"met gestreken jeans de dansvloer penetreren"
From chrish@athabascau.ca Thu Mar 18 21:28:18 2004
From: chrish@athabascau.ca (Chris Hubick)
Date: Thu, 18 Mar 2004 14:28:18 -0700
Subject: [OAI-implementers] Identifiers (catalog/entry)
In-Reply-To: <01C40B5B.2898AB80.adam.cooper@fdlearning.com>
References: <01C40B5B.2898AB80.adam.cooper@fdlearning.com>
Message-ID: <1079645297.8376.55.camel@edtech25.edtech.athabascau.ca>
On Tue, 2004-03-16 at 06:32, Adam Cooper wrote:
> A number of IMS specs have played with this 2 part/1part problem. RDCEO
> (Reusable Definition of Competences and Educational Objectives) talks about
> URN scheme, and I think we had assumed that the catalog would map to the
> NSS and the entry to the NID.
Hrm, that is what I have been doing up until now, but I am coming to
think I was wrong...
> Exactly what the significance of LOM catalog might be is probably another
> question, and one that is intentionally open I think. Is it necessarily
> more than an _indicator_ of the creator of the identifier?
I don't think it is open; the LOM spec says the catalog is "A namespace
scheme". This would, at least to me, clearly indicate that the entry is
namespaced by its catalog. Any other interpretation would lead to much
greater problems, in that many people simply use increasing integer
numbers to identify their metadata records, and without those numbers
being namespaced by the catalog, we would have *many* collisions.
We basically have a 'three level' system. It is up to all those people
sharing any particular LOM catalog to guarantee uniqueness within *that*
catalog. For those (most) of us who share the 'URI' catalog, we have
the same uniqueness requirement, which we use the NID to satisfy. This
pushes it down a level, where all those people sharing any particular
NID must also guarantee uniqueness within that NID. A system like 'oai'
uses DNS names to do this.
IMHO, three levels is overly complex. Yes, we could all run off and use
whatever format entries we like, namespaced by our catalog (which is
pretty much what people have done up until recently). In an ideal
world, however, we would remove this extra level by all using the same
catalog, and partition within that ourselves. The URI system gives us
that catalog, partitioned by NID. All those using URIs have done the
extra work of agreeing to share a common syntax and associated
facilities for partitioning the 'URI' namespace. By not using a 'URI'
catalog, you basically negate that effort by making the fact that you are
actually using URI-format entries *opaque* to others from a LOM
perspective.
I think LOM might have been better off to do as OAI has done and just
force everyone to encode their IDs as URIs, rather than having a
separate catalog field, but they didn't, so here we are.
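A minimal Python sketch of the namespacing argument above, under stated
assumptions: the class name LomIdentifier, the function levels() and the
repo-a.example / repo-b.example names are all made up for illustration, and
the only real syntax relied on is the oai-identifier form
"oai:<domain>:<local id>". It shows that bare integer entries collide unless
the catalog namespaces them, and that an entry in the shared 'URI' catalog
can be partitioned further by anyone who knows the URI syntax.

```python
from typing import NamedTuple, Tuple


class LomIdentifier(NamedTuple):
    catalog: str  # the namespace scheme, e.g. "URI" or a private catalog name
    entry: str    # the value, unique only within its catalog


def levels(ident: LomIdentifier) -> Tuple[str, str, str]:
    """Return (catalog, partition, remainder) for an entry in the 'URI' catalog.

    'partition' is the leading scheme-like component of the entry (the level
    the mail above calls the NID); 'remainder' is whatever that partition's
    own rules govern (for 'oai', a DNS name plus a local id).
    """
    if ident.catalog != "URI":
        # Outside the shared catalog the entry syntax is opaque to us.
        raise ValueError("entry syntax is opaque outside the 'URI' catalog")
    partition, sep, remainder = ident.entry.partition(":")
    if not sep or not partition or not remainder:
        raise ValueError(f"not in scheme:rest form: {ident.entry!r}")
    return ident.catalog, partition, remainder


# Two repositories that both number their records 1, 2, 3, ...
local_a = LomIdentifier("repo-a.example", "42")
local_b = LomIdentifier("repo-b.example", "42")

# The same record expressed in the shared 'URI' catalog.
shared = LomIdentifier("URI", "oai:repo-a.example:42")

print(local_a.entry == local_b.entry)  # same entry text ...
print(local_a == local_b)              # ... but different identifiers
print(levels(shared))
```

With a private catalog, a consumer can only compare (catalog, entry) pairs;
with the 'URI' catalog it can also reason about the levels inside the entry.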
--
Chris Hubick
mailto:chrish@athabascau.ca
mailto:chris@hubick.com
phone:1-780-421-2533 (work)
phone:1-780-721-9932 (cell)
http://www.hubick.com/
From herbertv@lanl.gov Thu Mar 18 21:51:45 2004
From: herbertv@lanl.gov (herbert van de sompel)
Date: Thu, 18 Mar 2004 14:51:45 -0700
Subject: [OAI-implementers] Identifiers (catalog/entry)
In-Reply-To: <1079645297.8376.55.camel@edtech25.edtech.athabascau.ca>
References: <01C40B5B.2898AB80.adam.cooper@fdlearning.com> <1079645297.8376.55.camel@edtech25.edtech.athabascau.ca>
Message-ID: <405A19F1.6090100@lanl.gov>
Chris Hubick wrote:
> I think LOM might have been better off to do as OAI has done and just
> force everyone to encode their IDs as URIs, rather than having a
> separate catalog field, but they didn't, so here we are.
>
You may still be able to talk in URI terms about objects with these ids by using
the info URI scheme. See http://info-uri.info/ .
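As a rough illustration of that suggestion, the Python sketch below builds
a string of the general shape "info:<namespace>/<identifier>". The function
name to_info_uri and the namespace "example-catalog" are hypothetical; info
namespaces have to be registered, and this sketch does not claim that any
namespace for LOM catalogs exists.

```python
from urllib.parse import quote


def to_info_uri(namespace: str, identifier: str) -> str:
    """Build 'info:<namespace>/<identifier>' from a two-part identifier.

    The namespace is lower-cased; characters in the identifier other than
    letters, digits, '_', '.', '-', '~', '/' and ':' are percent-encoded.
    This is an illustrative normalisation, not the registered rules of any
    particular info namespace.
    """
    return f"info:{namespace.lower()}/{quote(identifier, safe='/:')}"


# Hypothetical namespace and entry, for illustration only.
print(to_info_uri("Example-Catalog", "record 42"))
```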
herbert
--
Herbert Van de Sompel
digital library research & prototyping
Los Alamos National Laboratory - Research Library
+ 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/
"met gestreken jeans de dansvloer penetreren"
From a.powell@ukoln.ac.uk Fri Mar 19 16:51:02 2004
From: a.powell@ukoln.ac.uk (Andy Powell)
Date: Fri, 19 Mar 2004 16:51:02 +0000 (GMT Standard Time)
Subject: [OAI-implementers] Automatically gathering the full-text of eprints
In-Reply-To: <4058A4CB.6040200@lanl.gov>
References: <4058A4CB.6040200@lanl.gov>
Message-ID:
On Wed, 17 Mar 2004, herbert van de sompel wrote:
> a. The assumption that a harvester knows that something that is in the
> dc.identifier element of oai_dc points to a - compliant - jump-off page. There
> are two problems with this assumption:
> - lots of things can be in the dc.identifier element both resolvable and
> unresolvable
> - lots of things at the end of the thing identified by the content of
> dc.identifier (if resolvable) will not be compliant jump-off pages
Herbert,
thanks for the email. Yes, I completely agree with this analysis. The
proposal is a bit of a hack, and we should perhaps have made this clearer
in the document!
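To make the first of those two problems concrete, here is a minimal Python
sketch of the filtering a harvester ends up doing on dc:identifier values.
The sample record, the example.org URL and the function name candidate_urls
are made up for illustration; only the two XML namespace URIs are real. Even
the URL that survives the filter is only a candidate, since nothing in the
record says whether it leads to a compliant jump-off page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

DC = "{http://purl.org/dc/elements/1.1/}"

# Made-up oai_dc record with three different kinds of identifier.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example eprint</dc:title>
  <dc:identifier>http://eprints.example.org/42/</dc:identifier>
  <dc:identifier>doi:10.9999/example.42</dc:identifier>
  <dc:identifier>Technical Report 2004-17</dc:identifier>
</oai_dc:dc>"""


def candidate_urls(oai_dc_xml: str) -> list:
    """Return the dc:identifier values that look like HTTP(S) URLs."""
    root = ET.fromstring(oai_dc_xml)
    values = [(e.text or "").strip() for e in root.iter(DC + "identifier")]
    return [v for v in values if urlparse(v).scheme in ("http", "https")]


print(candidate_urls(SAMPLE))
```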
However, I think it is a useful hack :-). Particularly so in the context
of our other recommendations for using simple DC to describe eprints.
Furthermore, I would see it as good practice to embed XHTML
elements into jump-off pages anyway - irrespective of whether the
intention is to ease harvesting by robots or not. So I certainly don't
see our proposal as causing any harm.
The rest of your email raises some quite significant issues - some of
which I suspect are not very easy to discuss by email. I don't propose
giving a detailed response here, but I would like to note a few issues for
consideration...
Firstly, your comments about the complexity of the objects being described
only go part-way to describing the problem. The OAI-PMH specification,
rightly, says very little about the nature of the resources that are
described by the records exchanged using the protocol. However,
particular applications of the protocol do need to be clear about the
nature of the resources being described. Furthermore, the complexity of
the problem is not just about whether the resources being described are
aggregations of multiple objects. Part of the complexity arises because
those resources/objects fit into a model of the real world that
spans both 'conceptual' works and specific digital or physical
'manifestations' of those conceptual works.
Does the oai_dc record that I allow you to harvest describe a conceptual
work (or expression of a work), an article for example, or does it
describe one of the particular manifestations of that work, the PDF copy
of the article for example?
You'll note that I am intentionally using terms from the IFLA FRBR
(Functional Requirements for Bibliographic Records) model here.
In our guidelines for using simple DC to describe eprints we made the
explicit decision to reflect the fact that most implementations of eprint
archives (that we looked at) appeared to be configured to expose oai_dc
metadata about the 'work' rather than about the particular manifestations
of the work (though actually, in many cases (even in our own guidelines
to a certain extent) there is a certain amount of fuzziness going on!).
Unfortunately, there is no real way of indicating in a simple DC record
that the work (as opposed to the manifestation) is being described - this
would be difficult even in qualified DC currently, because the current
DCMI Type vocabulary doesn't allow us to make those distinctions. But, in
principle, the DC model is rich enough to handle this complexity - if
we are prepared to put the effort in to agree how to do it.
But the situation is even more complex than that because it is not clear
to me where OAI resources and records sit within the Web architectural
model of 'resources' and 'representations'. My suspicion is that the FRBR
'manifestation' is the equivalent of the Web architecture
'representation' of the FRBR 'work' (if you see what I mean!). The
oai_dc record (and indeed the jump-off page) is a 'representation' of
the 'work' (assuming that is what is being described). But at this point
we almost certainly need a diagram or two! :-(
OK, so on then to the question about whether the protocol can and/or
should be used to exchange 'resources' as well as 'metadata' about
'resources'.
The protocol spec is very explicit in differentiating 'resources' from
'items' and 'records' and makes it very clear that the protocol is to be used to
exchange 'metadata' between services - I'm thinking of section 2.2 in
particular. Now, with hindsight, I really wish we'd talked instead about
'resources' and 'representations' rather than resources, items, records
and metadata, because that would have given us much more flexibility about
what we do with the protocol. But we didn't - and therefore, I think we
are constrained in terms of what we can do within the semantics of the
protocol spec.
This is not just to do with the words being used in the spec. It has to
do with the entities in the model used by the protocol and the identifiers
that are assigned to those entities. An oai-identifier, for example, is
an identifier of an 'item', not of a 'resource' (in terms of the protocol
usage of those words). It seems to me that things are likely to become
very fuzzy if the 'item' or 'record' suddenly becomes the 'resource' and
vice versa.
So, based on this, it seems to me that the protocol will 'break' if we
start using it to carry the 'resource' where the protocol expects to see
the 'record about the resource'.
Now, your complex example of the METS package or the MPEG-21 DIDL is an
interesting case - because those things can be used to carry both the
metadata and the object. Is a METS package the 'resource' or the 'record'
in OAI terms? The answer is that it is somewhere in-between. I certainly
accept that the METS package is a 'representation' of a 'resource' - but,
as I mentioned above, unfortunately we didn't use the words 'resource' and
'representation' in the protocol spec. Yes, the complex package can be
viewed as metadata - but metadata about what - about the 'work' that the
objects in the package 'represent', or about the particular manifestations
contained in the package??!
All in all, I think I'm happy with the case where OAI is used to carry the
METS or DIDL packages that contain objects - but I would be much less happy
with a situation where the OAI-PMH is used to carry individual
manifestations (an XHTML document for example). But the fuzziness between
the package and the item worries me and I'm not sure that we are going to
be able to tell them apart very easily in all cases.
Enough for now... I agree with you that much more discussion and thinking
about these issues is required. I'm certainly happy (and indeed
expecting) to be told I'm wrong about any or all of the above! :-)
Regards,
Andy.
> * The scenario as described in the proposal, in which a single metadata record
> corresponds with a single "preprint" is only a special case of - future -
> reality. Increasingly, objects held in and described by repositories will be
> "compound" or "complex", i.e. consisting of multiple datastreams, not just a
> single "preprint". I find that it would be desirable that a solution to get to
> the content would be able to handle such situations. The proposed solution
> could actually accommodate such 'compound' objects, because the multiple
> datastreams are linked off the jump-off page. There is, however, a problem.
> Let's presume we have a situation in which an object is deposited in an
> institutional repository that has 2 datastreams, each of which actually has a
> unique identifier, say a doi or something. Thinking of a - future -
> self-archiving scenario and the trend to accord identifiers at finer levels of
> granularity, this is not unlikely at all. Now we get 3 things in dc.identifier
> (2 doi's and a link to a jump-off page), and 2 things in the jump-off page
> (links to the 2 datastreams). How do I know which doi goes with which
> datastream? Information that - I hope we will all agree - is rather significant.
>
> OK. The point I am trying to make is that the described scenario and its more
> general problem domain (beyond eprints, and into the realm of objects with
> multiple datastreams) may call for another approach. Our research has shown
> that such an approach can remain 100% OAI-PMH-based if a complex object format
> such as METS, MPEG-21 DIDL or SCORM is used. These formats can be "parallel"
> OAI-PMH "metadata formats" through which harvesters can get to the content
> without running into issues such as the ones mentioned above. Content can be
> embedded in the XML wrappers or pointed at by them. Identifiers can be
> unambiguously connected to content. If content changes, the datestamp of the
> "complex" record changes.
>
> I anticipate concerns re the overhead of introducing a solution based on a
> complex object format. At this point, I would like to say 2 things in this
> respect:
>
> * It took 2 people on my team about 2 days to create a prototype plug-in that
> enables OAI-PMH harvesting of content from DSpace repositories. Our plug-in
> rendered content using the MPEG-21 DIDL XML wrapper format. Most of the time
> invested in this plug-in was spent figuring out the DSpace API and a sensible
> way to map the DSpace data model to the DIDL data model. The prototype was
> demonstrated at the DSpace federation meeting, last week. Although
> questions/issues did arise in the course of our work, none seemed unsolvable.
> But it is my impression that the very fast delivery of a prototype indicates the
> feasibility of the complex format approach.
>
> * I would personally be very willing to spend time with the appropriate
> representatives of the community - including yourself - to work towards a
> solution that is future-proof and provides adequate guarantees regarding
> perceived requirements of a content-harvesting solution. I would actually
> prefer that over going for a solution which is attractive at first glance
> because of its obvious simplicity, but which seems to raise some relevant
> questions upon closer inspection.
>
> To end, I would like to thank you for bringing this topic to the list. I have
> had many private email exchanges over the last few months especially with
> representatives from DARE and DINI about this and related problem domains. I
> hope that your mail can be another impulse towards a joint action in this realm.
> The problem is very real, and I would love our community to jointly create a
> really good solution to it.
>
> many greetings
>
> herbert
>
>
Andy
--
Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933
Resource Discovery Network http://www.rdn.ac.uk/
ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/
From herbertv@lanl.gov Fri Mar 19 19:47:19 2004
From: herbertv@lanl.gov (herbert van de sompel)
Date: Fri, 19 Mar 2004 12:47:19 -0700
Subject: [OAI-implementers] Automatically gathering the full-text of eprints
In-Reply-To:
References: <4058A4CB.6040200@lanl.gov>
Message-ID: <405B4E47.5090309@lanl.gov>
Dear Andy,
Thanks a lot for your thoughtful comments. I provide some feedback, here,
hoping that we can find some time to discuss all of this with representatives
from the community in front of a much needed blackboard (whiteboard?).
First, let me say that this mail isn't at all about trying to prove you wrong.
Quite to the contrary. This is about conveying my perception of matters,
hoping to further our joint insights in this rather complicated domain.
Second, I feel that your FRBR-related comments, while very legitimate, are at
quite the opposite end of the scale from your pragmatic, useful hack. I need to
take some time to try and think about how much or how little a discussion
related to making content accessible to service providers in the OAI-PMH
framework should get involved with this. At this point I am puzzled. For
example, I am not sure how much Google cares about the work/manifestation issue.
Google wants a FRBR Item.
Third, I must emphasize that I am very pleased to hear that - in principle - you
find the notion of shipping modelled representations (DIDL, METS, ...) of
resources through the OAI-PMH acceptable. Below, I hope to give some more
indications as to why I feel that is indeed acceptable/appropriate in the
context of the OAI-PMH.
=> Your W3C TAG resource/representation perspective is very helpful. Building
on your insights, and with some stretching, one could distinguish the following
levels:
level 1: W3C.resource ~ FRBR.work ~ OAI-PMH.resource
level 2: W3C.representation ~ FRBR.manifestation ~ OAI-PMH.record
=> Comments I want to make at this point:
* The OAI-PMH doesn't really say that an OAI-PMH.resource sits at level 1, and I
am not even sure it matters a lot for this discussion because the action we are
interested in is at level 2 at which the OAI-PMH.resource clearly does not sit.
* Putting OAI-PMH.record at the level of W3C.representation makes a lot of
sense, especially since v.2.0 of the protocol in which OAI-PMH.records have
gained autonomy by getting their own datestamp.
* I dare to suggest that OAI-PMH.record ~ OAI-PMH.metadata ~ structured data
pertaining to an OAI-PMH.resource. I dare to make this statement because the
OAI-PMH has this built-in notion that equates metadata with XML. So I feel it is
actually constructive for our reasoning to get rid of the term 'metadata' with
its numerous and in many cases vague/loaded interpretations, and consider an
OAI-PMH.record to be structured data pertaining to an OAI-PMH.resource. That is
a quite unambiguous definition.
* I think OAI-PMH.item doesn't matter in this discussion as it merely is a
gateway to OAI-PMH.records. I feel we can lose that term too for our discussion.
=> I really don't think that OAI-PMH.identifier matters in any of this. While
the OAI-PMH.identifier is a crucial key for harvesting, it doesn't need to have
anything to do with any 'real' identifiers, nor with the real world data. Agreed
that in some implementations - for practical reasons - it does, but there is no
reason for it to. So we should not be distracted by it. If we really need to
accord meaning to OAI-PMH.identifier, we could consider it to be the identifier
shared by all W3C.representations of a W3C.resource (~OAI-PMH.resource), as it
acts as the gateway to all OAI-PMH.records pertaining to a OAI-PMH.resource.
Even in this interpretation, the OAI-PMH.identifier doesn't come close to
becoming the identifier of the OAI-PMH.resource, irrespective of what the exact
nature of the OAI-PMH.record is.
- Cf. the W3C TAG distinction between URI for resource and URI for representation.
- Cf. URI for resource == doi / URI for representation is an OAI-PMH request using
an unrelated OAI-PMH.identifier
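The "gateway" reading can be sketched in a few lines of Python. This is an
illustration under assumptions, not part of any existing harvester: the base
URL, the identifier and the "didl" metadataPrefix are made up (a repository
chooses its own prefixes apart from oai_dc), and get_record_url is a
hypothetical helper. The request parameters verb, identifier and
metadataPrefix are those of the OAI-PMH GetRecord request.

```python
from urllib.parse import urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build a GetRecord request URL for one record of one item."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


BASE = "http://repo.example.org/oai"   # made-up repository
ITEM = "oai:repo.example.org:42"       # says nothing about any DOI of the resource

# One OAI-PMH.identifier, several OAI-PMH.records pertaining to the same resource.
for prefix in ("oai_dc", "didl"):
    print(get_record_url(BASE, ITEM, prefix))
```

Each resulting URL plays the role of a URI for a representation, while the
identifier of the resource itself (a doi, say) can live inside the record.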
=> So, I think we lost quite some overhead in the above. We are down to
OAI-PMH.resource (rather undefined, how nice) and OAI-PMH.record (well defined,
as being structured data pertaining to OAI-PMH.resource) to play with. I think
we can all go along with an interpretation that an oai_dc record is a
W3C.representation of an OAI-PMH.resource. I trust we would also agree this is
the case for a special-purpose QDC record that 'models' the OAI-PMH.resource,
and in doing so includes some links to datastreams of which that
OAI-PMH.resource consists. The step to a complex object solution (METS, DIDL,
...) is really small from here, as those indeed provide such a by-reference
technique to include datastreams, as well as by-value techniques to do so. In
addition, some complex object approaches actually have a data model so that the
required 'modelling' boils down to mapping a specific world view to the existing
data model.
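A minimal Python sketch of what such "structured data pertaining to a
resource" buys us. It is not METS or DIDL syntax; the class names
ComplexObject and Datastream, the DOIs and the URL are made up for
illustration. The point is only that each datastream carries its own
identifier and is included either by reference or by value, so the question
"which doi goes with which datastream?" has an answer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Datastream:
    identifier: str                   # e.g. a DOI assigned to this datastream
    mime_type: str
    ref: Optional[str] = None         # by-reference: where to fetch the bytes
    content: Optional[bytes] = None   # by-value: the bytes themselves

    def __post_init__(self) -> None:
        # Exactly one of the two inclusion techniques must be used.
        if (self.ref is None) == (self.content is None):
            raise ValueError("give exactly one of ref or content")


@dataclass
class ComplexObject:
    identifier: str
    datastreams: List[Datastream] = field(default_factory=list)

    def locate(self, identifier: str) -> Optional[Datastream]:
        """Return the datastream that carries the given identifier, if any."""
        for ds in self.datastreams:
            if ds.identifier == identifier:
                return ds
        return None


obj = ComplexObject(
    identifier="doi:10.9999/example.42",
    datastreams=[
        Datastream("doi:10.9999/example.42.a", "application/pdf",
                   ref="http://repo.example.org/42/paper.pdf"),
        Datastream("doi:10.9999/example.42.b", "text/plain",
                   content=b"supplementary notes"),
    ],
)

print(obj.locate("doi:10.9999/example.42.a").ref)
print(obj.locate("doi:10.9999/example.42.b").content)
```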
=> I very much share your opinion that directly shipping a
datastream/representation of the resource in an 'unmodelled' manner smells
really fishy, as it makes us lose the 'structured data pertaining to the
resource' life buoy.
Jeez, this took me ages to write. And now it is my turn to be proven wrong ;-)
cheers
herbert
Andy Powell wrote:
> On Wed, 17 Mar 2004, herbert van de sompel wrote:
>
>
>>a. The assumption that a harvester knows that something that is in the
>>dc.identifier element of oai_dc points to a - compliant - jump-off page. There
>>are two problems with this assumption:
>>- lots of things can be in the dc.identifier element both resolvable and
>>unresolvable
>>- lots of things at the end of the thing identified by the content of
>>dc.identifier (if resolvable) will not be compliant jump-off pages
>
>
> Herbert,
> thanks for the email. Yes, I completely agree with this analysis. The
> proposal is a bit of a hack, and we should perhaps have made this clearer
> in the document!
>
> However, I think it is a useful hack :-). Particularly so in the context
> of our other recommendations for using simple DC to describe eprints.
> Furthermore, I would see it as good practice to embed XHTML
> elements into jump-off pages anyway - irrespective of whether the
> intention is to ease harvesting by robots or not. So I certainly don't
> see our proposal as causing any harm.
>
> The rest of your email raises some quite significant issues - some of
> which I suspect are not very easy to discuss by email. I don't propose
> giving a detailed response here, but I would like to note a few issues for
> consideration...
>
> Firstly, your comments about the complexity of the objects being described
> only go part-way to describing the problem. The OAI-PMH specification,
> rightly, says very little about the nature of the resources that are
> described by the records exchanged using the protocol. However,
> particular applications of the protocol do need to be clear about the
> nature of the resources being described. Furthermore, the complexity of
> the problem is not just about whether the resources being described are
> aggregations of multiple objects. Part of the complexity arises because
> those resources/objects fit into a model of the real world that
> spans both 'conceptual' works and specific digital or physical
> 'manifestations' of those conceptual works.
>
> Does the oai_dc record that I allow you to harvest describe a conceptual
> work (or expression of a work), an article for example, or does it
> describe one of the particular manifestations of that work, the PDF copy
> of the article for example?
>
> You'll note that I am intentionally using terms from the IFLA FRBR
> (Functional Requirements for Bibliographic Records) model here.
>
> In our guidelines for using simple DC to describe eprints we made the
> explicit decision to reflect the fact that most implementations of eprint
> archives (that we looked at) appeared to be configured to expose oai_dc
> metadata about the 'work' rather than about the particular manifestations
> of the work (though actually, in many cases (even in our own guidelines
> to a certain extent) there is a certain amount of fuzziness going on!).
>
> Unfortunately, there is no real way of indicating in a simple DC record
> that the work (as opposed to the manifestation) is being described - this
> would be difficult even in qualified DC currently, because the current
> DCMI Type vocabulary doesn't allow us to make those distinctions. But, in
> principle, the DC model is rich enough to handle this complexity - if
> we are prepared to put the effort in to agree how to do it.
>
> But the situation is even more complex than that because it is not clear
> to me where OAI resources and records sit within the Web architectural
> model of 'resources' and 'representations'. My suspicion is that the FRBR
> 'manifestation' is the equivalent of the Web architecture
> 'representation' of the FRBR 'work' (if you see what I mean!). The
> oai_dc record (and indeed the jump-off page) is a 'representation' of
> the 'work' (assuming that is what is being described). But at this point
> we almost certainly need a diagram or two! :-(
>
> OK, so on then to the question about whether the protocol can and/or
> should be used to exchange 'resources' as well as 'metadata' about
> 'resources'.
>
> The protocol spec is very explicit in differentiating 'resources' from
> 'items' and 'records' and makes it very clear that the protocol is to be used to
> exchange 'metadata' between services - I'm thinking of section 2.2 in
> particular. Now, with hindsight, I really wish we'd talked instead about
> 'resources' and 'representations' rather than resources, items, records
> and metadata, because that would have given us much more flexibility about
> what we do with the protocol. But we didn't - and therefore, I think we
> are constrained in terms of what we can do within the semantics of the
> protocol spec.
>
> This is not just to do with the words being used in the spec. It has to
> do with the entities in the model used by the protocol and the identifiers
> that are assigned to those entities. An oai-identifier, for example, is
> an identifier of an 'item', not of a 'resource' (in terms of the protocol
> usage of those words). It seems to me that things are likely to become
> very fuzzy if the 'item' or 'record' suddenly becomes the 'resource' and
> vice versa.
>
> So, based on this, it seems to me that the protocol will 'break' if we
> start using it to carry the 'resource' where the protocol expects to see
> the 'record about the resource'.
>
> Now, your complex example of the METS package or the MPEG-21 DIDL is an
> interesting case - because those things can be used to carry both the
> metadata and the object. Is a METS package the 'resource' or the 'record'
> in OAI terms? The answer is that it is somewhere in-between. I certainly
> accept that the METS package is a 'representation' of a 'resource' - but,
> as I mentioned above, unfortunately we didn't use the words 'resource' and
> 'representation' in the protocol spec. Yes, the complex package can be
> viewed as metadata - but metadata about what - about the 'work' that the
> objects in the package 'represent', or about the particular manifestations
> contained in the package??!
>
> All in all, I think I'm happy with the case where OAI is used to carry the
> METS or DIDL packages that contain objects - but I would be much less happy
> with a situation where the OAI-PMH is used to carry individual
> manifestations (an XHTML document for example). But the fuzziness between
> the package and the item worries me and I'm not sure that we are going to
> be able to tell them apart very easily in all cases.
>
> Enough for now... I agree with you that much more discussion and thinking
> about these issues is required. I'm certainly happy (and indeed
> expecting) to be told I'm wrong about any or all of the above! :-)
>
> Regards,
>
> Andy.
>
>
>>* The scenario as described in the proposal, in which a single metadata record
>>corresponds with a single "preprint" is only a special case of - future -
>>reality. Increasingly, objects held in and described by repositories will be
>>"compound" or "complex", i.e. consisting of multiple datastreams, not just a
>>single "preprint". I find that it would be desirable that a solution to get to
>>the content would be able to handle such situations. The proposed solution
>>could actually accommodate such 'compound' objects, because the multiple
>>datastreams are linked off the jump-off page. There is, however, a problem.
>>Let's presume we have a situation in which an object is deposited in an
>>institutional repository that has 2 datastreams, each of which actually has a
>>unique identifier, say a doi or something. Thinking of a - future -
>>self-archiving scenario and the trend to accord identifiers at finer levels of
>>granularity, this is not unlikely at all. Now we get 3 things in dc.identifier
>>(2 doi's and a link to a jump-off page), and 2 things in the jump-off page
>>(links to the 2 datastreams). How do I know which doi goes with which
>>datastream? Information that - I hope we will all agree - is rather significant.
>>
>>OK. The point I am trying to make is that the described scenario and its more
>>general problem domain (beyond eprints, and into the realm of objects with
>>multiple datastreams) may call for another approach. Our research has shown
>>that such an approach can remain 100% OAI-PMH-based if a complex object format
>>such as METS, MPEG-21 DIDL or SCORM is used. These formats can be "parallel"
>>OAI-PMH "metadata formats" through which harvesters can get to the content
>>without running into issues such as the ones mentioned above. Content can be
>>embedded in the XML wrappers or pointed at by them. Identifiers can be
>>unambiguously connected to content. If content changes, the datestamp of the
>>"complex" record changes.
>>
>>I anticipate concerns re the overhead of introducing a solution based on a
>>complex object format. At this point, I would like to say 2 things in this
>>respect:
>>
>>* It took 2 people on my team about 2 days to create a prototype plug-in that
>>enables OAI-PMH harvesting of content from DSpace repositories. Our plug-in
>>rendered content using the MPEG-21 DIDL XML wrapper format. Most of the time
>>invested in this plug-in was spent figuring out the DSpace API and a sensible
>>way to map the DSpace data model to the DIDL data model. The prototype was
>>demonstrated at the DSpace federation meeting, last week. Although
>>questions/issues did arise in the course of our work, none seemed unsolvable.
>>But it is my impression that the very fast delivery of a prototype indicates the
>>feasibility of the complex format approach.
>>
>>* I would personally be very willing to spend time with the appropriate
>>representatives of the community - including yourself - to work towards a
>>solution that is future-proof and provides adequate guarantees regarding
>>perceived requirements of a content-harvesting solution. I would actually
>>prefer that over going for a solution which is attractive at first glance
>>because of its obvious simplicity, but which seems to raise some relevant
>>questions upon closer inspection.
>>
>>To end, I would like to thank you for bringing this topic to the list. I have
>>had many private email exchanges over the last few months especially with
>>representatives from DARE and DINI about this and related problem domains. I
>>hope that your mail can be another impulse towards a joint action in this realm.
>> The problem is very real, and I would love our community to jointly create a
>>really good solution to it.
>>
>>many greetings
>>
>>herbert
>>
>>
>>Andy Powell wrote:
>>
>>
>>>The JISC-funded ePrints UK project has a requirement to automatically
>>>harvest both metadata and full-text from the eprint archives within UK
>>>academia (and potentially elsewhere). This is so that we can pass both
>>>metadata and full-text to the various 'enhancement' Web services offered
>>>by our partners.
>>>
>>>http://www.rdn.ac.uk/projects/eprints-uk/
>>>
>>>In order for our harvesting robot to be able to do this, it must be able
>>>to reliably (and automatically) determine the correct URL(s) for the
>>>various full-text manifestation(s) (HTML, PDF, RTF, etc.) of each eprint.
>>>
>>>Our "Using simple Dublin Core to describe eprints" guidelines are intended
>>>to encourage greater consistency in the metadata that is exposed by eprint
>>>archives using the 'oai_dc' format within the OAI Protocol for Metadata
>>>Harvesting (OAI-PMH). Somewhat perversely, because we stick rigidly to
>>>the semantics of the DC element set, our guidelines make determining the
>>>URL of each manifestation that is available quite difficult. (This is
>>>largely a consequence of the 'simple' nature of 'simple DC'!). In
>>>general, the URL in the <dc:identifier> element of the oai_dc record is
>>>the URL of a jump-off page, rather than a direct link to the full-text.
>>>
>>>We would like to suggest a new proposal for unambiguously embedding the
>>>URL for each manifestation of an eprint into the (X)HTML jump-off page for
>>>that eprint. Since the jump-off page is generated automatically by the
>>>eprint archive software, doing this shouldn't be too difficult (in fact,
>>>we would hope that archive software, such as eprints.org, will be
>>>configured to do this out of the box).
>>>
>>>If this proposal is adopted, it will make it much easier to write OAI
>>>service provider software that can reliably gather the full-text of an
>>>eprint, given only the oai_dc record for that eprint.
>>>
>>>The proposal is at
>>>
>>>http://www.rdn.ac.uk/projects/eprints-uk/docs/encoding-fulltext-links/
>>>
>>>Comments are welcome,
>>>
>>>Andy
>>>--
>>>Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
>>>http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933
>>>Resource Discovery Network http://www.rdn.ac.uk/
>>>ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/
>>>_______________________________________________
>>>OAI-implementers mailing list
>>>List information, archives, preferences and to unsubscribe:
>>>http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
>>>
>>>
>>
>>--
>>Herbert Van de Sompel
>>digital library research & prototyping
>>Los Alamos National Laboratory - Research Library
>>+ 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/
>>
>>"met gestreken jeans de dansvloer penetreren"
>>
>>
>>
>
>
> Andy
> --
> Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
> http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933
> Resource Discovery Network http://www.rdn.ac.uk/
> ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/
>
--
Herbert Van de Sompel
digital library research & prototyping
Los Alamos National Laboratory - Research Library
+ 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/
"met gestreken jeans de dansvloer penetreren"
From evelyn@ime.usp.br Fri Mar 26 16:06:39 2004
From: evelyn@ime.usp.br (Evelyn Cristina Pinto)
Date: Fri, 26 Mar 2004 13:06:39 -0300 (EST)
Subject: [OAI-implementers] Publication date and Datestamps
Message-ID:
I noticed that the datestamp field is usually used for harvesting date,
but some repositories use it for submission date. Is the dc:date field
used for publication date and submission date too? How could I
differentiate them? Could anyone give me more information about it?
Thanks,
Evelyn.
=================================================================
master student in Computer Science at USP-Brazil
From simeon@cs.cornell.edu Fri Mar 26 16:49:27 2004
From: simeon@cs.cornell.edu (Simeon Warner)
Date: Fri, 26 Mar 2004 11:49:27 -0500 (EST)
Subject: [OAI-implementers] Publication date and Datestamps
In-Reply-To:
References:
Message-ID:
Within OAI the record datestamp MUST be the date of last update of
the metadata record. Otherwise incremental harvesting will not work.
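To illustrate why the datestamp has to track the last update, here is a minimal sketch of how a harvester might build an incremental request. The function names and the repository base URL are hypothetical; only the ListRecords verb and the metadataPrefix and from arguments come from the protocol. The harvester remembers the date of its previous run and asks only for records whose datestamp is on or after that date, so a record whose datestamp did not move when its metadata changed would be missed.

```python
from urllib.parse import urlencode


def incremental_request_url(base_url, last_harvest_date, metadata_prefix="oai_dc"):
    """Build a ListRecords URL limited to records whose datestamp is on or
    after last_harvest_date (a YYYY-MM-DD string in UTC)."""
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": last_harvest_date,
    }
    return base_url + "?" + urlencode(params)


def latest_datestamp(datestamps):
    """Return the most recent datestamp seen in a harvest.
    Day-granularity ISO 8601 strings sort correctly as plain text."""
    return max(datestamps)


# Hypothetical repository address, used for illustration only.
url = incremental_request_url("http://repository.example.org/oai", "2004-03-01")
print(url)
```

The value returned by latest_datestamp could be stored and used as the from argument of the next run.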
Within simple dc metadata there is no (accepted) way to differentiate
types of date in the dc:date fields. For e-prints, a good recommendation
is that in the RDN guidelines given by Andy Powell, Michael Day and Peter
Cliff:
http://www.rdn.ac.uk/projects/eprints-uk/docs/simpledc-guidelines/#date
dc:date (*) Eprint-specific Recommendation:
The 'last-modified' date of the eprint and/or the date of its accession
into the archive.
The date should be formatted according to the W3C encoding rules for
dates and times [9] (a profile based on ISO 8601 known as W3C-DTF), for
example:
2000-12-25
1999
2003-01
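As a rough sketch of how a harvester might read such values, the following accepts only the three date granularities shown in the examples (year, year and month, complete date). The function name and the tuple it returns are assumptions made for illustration; W3C-DTF also defines granularities with a time of day, which this sketch does not handle.

```python
import re
from datetime import date

# Year, year-month, or complete date: the three granularities in the examples.
W3CDTF_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_w3cdtf(value):
    """Return (year, month, day), with None for the parts that were omitted.
    Raise ValueError if the value is not a date in one of the three forms."""
    match = W3CDTF_DATE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a W3C-DTF date: {value!r}")
    year, month, day = (int(p) if p is not None else None for p in match.groups())
    # Constructing a real date rejects impossible values such as month 13.
    date(year, month or 1, day or 1)
    return year, month, day


for text in ("2000-12-25", "1999", "2003-01"):
    print(text, parse_w3cdtf(text))
```

Keeping the omitted parts as None, rather than defaulting them to 1, preserves the fact that the repository only stated a year or a month.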
If necessary, repeat this element to provide both the last-modified date
and the date of accession. The last-modified date will be assumed to be
the more recent of the two dates. If only one date is provided, it will
be assumed that the last-modified date and the date of accession are the
same.
--
Simeon
On Fri, 26 Mar 2004, Evelyn Cristina Pinto wrote:
> I noticed that the datestamp field is usually used for harvesting date,
> but some repositories use it for submission date. Is the dc:date field
> used for publication date and submission date too? How could I
> differentiate them? Could anyone give me more information about it?
>
> Thanks,
> Evelyn.
>
> =================================================================
> master student in Computer Science at USP-Brazil
>
From evelyn@ime.usp.br Sat Mar 27 13:35:34 2004
From: evelyn@ime.usp.br (Evelyn Cristina Pinto)
Date: Sat, 27 Mar 2004 10:35:34 -0300 (EST)
Subject: [OAI-implementers] Publication date and Datestamps
In-Reply-To:
Message-ID:
First of all, thanks a lot for the information. I also have a suggestion.
I know that OAI-PMH is a general protocol, not only for eprints. However, I
think that the publication date is a *very* important piece of information
about an eprint's life cycle, so I think that it should be shown explicitly
through the protocol.
Regards,
Evelyn.
On Fri, 26 Mar 2004, Simeon Warner wrote:
>
> Within OAI the record datestamp MUST be the date of last update of
> the metadata record. Otherwise incremental harvesting will not work.
>
> Within simple dc metadata there is no (accepted) way to differentiate
> types of date in the dc:date fields. For e-prints, a good recommendation
> is that in the RDN guidelines given by Andy Powell, Michael Day and Peter
> Cliff:
> http://www.rdn.ac.uk/projects/eprints-uk/docs/simpledc-guidelines/#date
>
> dc:date (*) Eprint-specific Recommendation:
>
> The 'last-modified' date of the eprint and/or the date of its accession
> into the archive.
>
> The date should be formatted according to the W3C encoding rules for
> dates and times [9] (a profile based on ISO 8601 known as W3C-DTF), for
> example:
>
> 2000-12-25
> 1999
> 2003-01
>
> If necessary, repeat this element to provide both the last-modified date
> and the date of accession. The last-modified date will be assumed to be
> the more recent of the two dates. If only one date is provided, it will
> be assumed that the last-modified date and the date of accession are the
> same.
>
> --
> Simeon
>
>
> On Fri, 26 Mar 2004, Evelyn Cristina Pinto wrote:
> > I noticed that the datestamp field is usually used for harvesting date,
> > but some repositories use it for submission date. Is the dc:date field
> > used for publication date and submission date too? How could I
> > differentiate them? Could anyone give me more information about it?
> >
> > Thanks,
> > Evelyn.
> >
> > =================================================================
> > master student in Computer Science at USP-Brazil
> >
>
From herbertv@lanl.gov Sun Mar 28 16:25:16 2004
From: herbertv@lanl.gov (herbert van de sompel)
Date: Sun, 28 Mar 2004 09:25:16 -0700
Subject: [OAI-implementers] Carl Lagoze receives 2004 Kilgour Award
Message-ID: <4066FC6C.7020109@lanl.gov>
Please join me in congratulating Carl Lagoze on receiving the 2004 Kilgour Award
for Research in Library and Information Technology.
More information at
http://www.lita.org/ala/lita/litaresources/litascholarships/04fred.htm .
herbert van de sompel
--
Herbert Van de Sompel
digital library research & prototyping
Los Alamos National Laboratory - Research Library
+ 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/