From lming@vt.edu Mon Mar 8 17:47:04 2004
From: lming@vt.edu (Ming Luo)
Date: Mon, 08 Mar 2004 12:47:04 -0500
Subject: [OAI-implementers] upgrading machine hosting OAI Explorer
Message-ID: <404CB198.3020703@vt.edu>

Hi all,

I'm upgrading the machine hosting the Virginia Tech OAI Explorer. Things may be a little messy over the next few days. I will get back to you once the upgrade is finished.

Thanks,
Ming Luo

From orient_lo@163.com Thu Mar 4 01:08:44 2004
From: orient_lo@163.com (罗时辉)
Date: Thu, 4 Mar 2004 9:8:44 +0800
Subject: [OAI-implementers] resumptionToken cursor
Message-ID: <200403040108.i2418iC16494@nsdlib.nsdl.cornell.edu>

Why is the value of the resumptionToken cursor always "0" in the first incomplete list response? It seems odd. Going by its definition (a count of the number of elements of the complete list thus far returned), it should be the number of records returned in the first incomplete list response, because by the time the first resumptionToken is returned, a certain number of records (e.g. 1000) has already been returned. Right?

Very Respectfully,

Steve Luo
orient_lo@163.com

From tanderson@collegis.com Thu Mar 4 23:19:15 2004
From: tanderson@collegis.com (Thor Anderson)
Date: Thu, 4 Mar 2004 18:19:15 -0500
Subject: [OAI-implementers] Knowing you have harvested the whole collection
Message-ID: <74365BBB0B30774182643C124412FCB842CC1F@EXCHCLUS.collegis.com>

Greetings OAI-implementers,

I was wondering if someone could tell me the most accurate way to ensure my OAI harvesting program has harvested all of the records of a repository. Is there any way to get some sort of collection or repository metadata that holds a "total number of records" value?
Or, because oai_dc metadata records are the most common denominator (and required for minimal OAI compliance?), can I assume that a request like this:

http://services.nsdl.org:8080/nsdloai/OAI?verb=ListRecords&metadataPrefix=oai_dc

will give me the most complete set of records possible (once no more resumptionTokens are available)?

TIA for any help. Hope this wasn't in a FAQ somewhere that I missed.

Thor

----------------------------------------
Thor Anderson, Ph.D.
Collegis, Inc.
tanderson@collegis.com
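
A minimal sketch of the harvesting loop this question describes, assuming Python's standard library and an OAI-PMH 2.0 repository. The names `harvest_all` and `http_fetch`, and the `fetch` parameter, are hypothetical. The protocol has no dedicated "total number of records" request; a list is complete once a response carries no resumptionToken, or an empty one. The optional completeListSize attribute on resumptionToken can serve as a cross-check when a repository supplies it, though the protocol document notes that it may be only an estimate.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Default fetcher (hypothetical helper): return the body of an HTTP GET as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest_all(base_url, metadata_prefix="oai_dc", fetch=http_fetch):
    """Follow resumptionTokens until the list is complete.

    Returns (identifiers, complete_list_size). The size is None when the
    repository does not supply the optional completeListSize attribute.
    """
    identifiers = []
    complete_list_size = None
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        url = base_url + "?" + urllib.parse.urlencode(params)
        root = ET.fromstring(fetch(url))
        error = root.find(OAI_NS + "error")
        if error is not None:
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        # Every record, including deleted ones, carries a header with an identifier.
        for header in root.iter(OAI_NS + "header"):
            identifiers.append(header.findtext(OAI_NS + "identifier"))
        token = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        if token is not None and token.get("completeListSize"):
            complete_list_size = int(token.get("completeListSize"))
        # An absent or empty resumptionToken marks the final chunk.
        if token is None or not (token.text or "").strip():
            return identifiers, complete_list_size
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Comparing `len(identifiers)` with the reported size (when present) gives a rough completeness check; a mismatch may simply mean the repository changed during the harvest.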
 
 
From simeon@cs.cornell.edu Mon Mar 8 18:11:11 2004
From: simeon@cs.cornell.edu (Simeon Warner)
Date: Mon, 8 Mar 2004 13:11:11 -0500 (EST)
Subject: [OAI-implementers] ADMIN NOTE
Message-ID:

I'm afraid there has been a problem with the OAI-implementers mail server which resulted in a number of messages being delayed. The queued messages should now have been sent out.

Cheers,
Simeon

From simeon@cs.cornell.edu Mon Mar 8 18:13:09 2004
From: simeon@cs.cornell.edu (Simeon Warner)
Date: Mon, 8 Mar 2004 13:13:09 -0500 (EST)
Subject: [OAI-implementers] well-known port
In-Reply-To: <5.2.0.9.0.20040209085933.025d61e0@popserv.ucop.edu>
References: <5.2.0.9.0.20040209085933.025d61e0@popserv.ucop.edu>
Message-ID:

Since OAI-PMH works over HTTP, port 80 is the norm. The baseURL can include a port, so anything else can be used. I don't see a need for agreement on any particular port.

Cheers,
Simeon

On Mon, 9 Feb 2004, David Loy wrote:

> Question: has a well-known port been adopted for OAI and if not have people
> settled on a specific port (80?)
>
> Thanks
> David Loy
>
> _______________________________________________
> OAI-implementers mailing list
> List information, archives, preferences and to unsubscribe:
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers

From simeon@cs.cornell.edu Mon Mar 8 18:38:55 2004
From: simeon@cs.cornell.edu (Simeon Warner)
Date: Mon, 8 Mar 2004 13:38:55 -0500 (EST)
Subject: [OAI-implementers] resumptionToken cursor
In-Reply-To: <200403040108.i2418iC16494@nsdlib.nsdl.cornell.edu>
References: <200403040108.i2418iC16494@nsdlib.nsdl.cornell.edu>
Message-ID:

On Thu, 4 Mar 2004, 罗时辉 wrote:

> Why the value of cursor of resumptionToken is always "0" in the first incomplete list response? It seems odd.
> In terms of its definition (a count of the number of elements of the complete list thus far returned), it should be the number of records
> returned in the first incomplete list response, because when the first resumptionToken returned, a certain number (i.e. 1000) of records was already returned. Right?

See the example in:
http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm#FlowControl

The cursor is the number of records or headers returned up to the start of the incomplete list response. Thus the first response always has cursor=0 if it is specified.

Cheers,
Simeon

> Very Respectfully,
>
> Steve Luo
> orient_lo@163.com

From caar@loc.gov Mon Mar 8 18:42:56 2004
From: caar@loc.gov (Caroline Arms)
Date: Mon, 8 Mar 2004 13:42:56 -0500 (EST)
Subject: [OAI-implementers] set description
In-Reply-To: <1063.160.36.192.134.1076705626.squirrel@kiva.lib.utk.edu>
Message-ID:

The Library of Congress is providing set descriptions, see
http://memory.loc.gov/cgi-bin/oai2_0?verb=ListSets

The descriptions use the oai_dc schema. They are essentially collection-level records for the underlying collection of content EXCEPT that I use "Records for " at the beginning of the title. When I wanted to do this, I could not find another pattern; you should not look at the practice you infer from them as authoritative in any way.

I would be interested in knowing who (if anyone) is finding them useful.

Caroline Arms caar@loc.gov
Office of Strategic Initiatives

On Fri, 13 Feb 2004, Jody DeRidder wrote:

> Is anyone out there using set descriptions (optional tag in header)?
> If so, would you please send us a link to your repository/service
> provider so we could see it applied?
>
> We'd be grateful...
>
> --jody (for Anthony Smith)
>
> --
> Jody DeRidder
> IT Administrator II
> Digital Library Center
> 648A John C. Hodges Library
> University of Tennessee
> Knoxville, TN 37996
>
> Phone: (865) 974-4796
> Email: deridder@aztec.lib.utk.edu

From khage@umich.edu Mon Mar 8 18:59:55 2004
From: khage@umich.edu (Kat Hagedorn)
Date: Mon, 8 Mar 2004 13:59:55 -0500
Subject: [OAI-implementers] set description
In-Reply-To:
Message-ID:

Any further description of sets is useful for service providers. The set name, being short, can't always completely describe the contents of the set. More description helps us decide whether we want to harvest a particular set.

- Kat

-------------------
Kat Hagedorn
OAIster/Metadata Harvesting Librarian
DLXS Bibliographic Class Coordinator
DLXS Text Class Co-coordinator
Digital Library Production Service
University of Michigan

http://www.oaister.org/
http://www.dlxs.org/
email: khage@umich.edu
phone: 734-615-7618

From liu_x@lanl.gov Mon Mar 8 19:57:25 2004
From: liu_x@lanl.gov (Xiaoming Liu)
Date: Mon, 8 Mar 2004 12:57:25 -0700 (MST)
Subject: [OAI-implementers] set description
In-Reply-To:
References:
Message-ID:

Using a small program to survey the data providers listed on the OAI website and in the UIUC registry, I was able to generate the following list of baseURLs that provide set descriptions.

http://129.252.51.52/OAI/WebOAI.aspx
http://alcme.oclc.org/ndltd/servlet/OAIHandler
http://csc000.cscaustria.at/oai/OAI.ASP
http://conferences.arts.usyd.edu.au/oai/
http://cgi.vtt.fi/progs/inf/OAI
http://dataprovider.ibict.br/mypoai/oai2.php
http://gita.grainger.uiuc.edu/registry/px/oai.asp
http://hbllmedia.lib.byu.edu/test/PhpOai2/oai/oai2.php
http://memory.loc.gov/cgi-bin/oai2_0
http://oai.lib.duke.edu:8081/smc/servlet/OAIHandler
http://oai.lib.msu.edu/oai/oai.cfm
http://infomine.ucr.edu/cgi-bin/OAI-PMH-server
http://ibiblio.org/oaibiblio/data/software/app/oai2.php
http://infsearch.cs.cmu.edu/cgi-bin/oai.pl
http://pkp.ubc.ca/harvester/oai/
http://rea.uninet.edu/ojs/oai/
http://publications.uu.se/portal/OAI
http://services.nsdl.org:8080/nsdloai/OAI
http://oai.ub.rub.de/oai/oai2.php
http://wo.uio.no/as/WebObjects/theses.woa/wa/oai
http://www.hcu.ox.ac.uk/ocs/oai/
http://www.aim25.ac.uk/cgi-bin/oai/OAI2.0
http://www.cis.unisa.edu.au/aiwsc03/ocs/ocs/oai/
http://www.entomotropica.org/ojs/oai/
http://www.husseinsspace.com/cgi-bin/VTOAI/hspics/hspics/oai.pl
http://www.math.washington.edu/~ejpecp/oai/
http://www.math.washington.edu/~ejpecp/ECP/oai/
http://www.pkp.ubc.ca/harvester/oai/
http://www.pubmedcentral.gov/oai/oai.cgi

regards,
Xiaoming

From hussein@cs.uct.ac.za Tue Mar 9 05:10:26 2004
From: hussein@cs.uct.ac.za (Hussein Suleman)
Date: Tue, 09 Mar 2004 07:10:26 +0200
Subject: [OAI-implementers] set description
In-Reply-To:
References:
Message-ID: <404D51C2.60705@cs.uct.ac.za>

following up on Kat's comment, and seeing that my personal website is on Xiaoming's listing ... try
http://www.husseinsspace.com/cgi-bin/VTOAI/hspics/hspics/oai.pl?verb=ListSets

i have a description for each set in my source metadata, so i encode that in a simple DC record with only a description tag. the idea is that set descriptions are meant to help people understand the contents of sets (for the purposes of selective harvesting), so any human-understandable information you can provide is useful. at the same time, there is probably no point in automatically generating non-textual/descriptive fields such as date.

of course this is all arguable, so feel free to vociferously disagree with me about the date or the human-understanding bits :)

ttfn,
----hussein

--
=====================================================================
hussein suleman ~ hussein@cs.uct.ac.za ~ http://www.husseinsspace.com
=====================================================================

From tajoli@cilea.it Tue Mar 9 09:24:28 2004
From: tajoli@cilea.it (Zeno Tajoli)
Date: Tue, 09 Mar 2004 10:24:28 +0100
Subject: [OAI-implementers] Knowing you have harvested the whole collection
Message-ID: <6.0.3.0.0.20040309102420.02488510@mail.cilea.it>

Hi,

>I was wondering if someone could tell me the most accurate way to ensure
>my OAI harvesting program has harvested all of the records of a
>repository. Is there any way to get some sort of collection or repository
>metadata that holds a "total number of records" value?

As far as I know, there is no specific instruction for this.

>Or, because oai_dc metadata records are the most common denominator (and
>required for minimal OAI compliance?), can I assume that a request like
>this:
>http://services.nsdl.org:8080/nsdloai/OAI?verb=ListRecords&metadataPrefix=oai_dc
> will give me the most complete set of records possible (once no more
> resumptionTokens are available)?

In my opinion, yes.

Bye

Zeno Tajoli
CILEA - Segrate (MI)
tajoliAT_SPAM_no_prendiATcilea.it
(Address masked against spam; replace everything between the two ATs with @)

From deridder@aztec.lib.utk.edu Tue Mar 9 20:25:44 2004
From: deridder@aztec.lib.utk.edu (Jody DeRidder)
Date: Tue, 9 Mar 2004 15:25:44 -0500 (EST)
Subject: [OAI-implementers] set description
In-Reply-To: <404D51C2.60705@cs.uct.ac.za>
References: <404D51C2.60705@cs.uct.ac.za>
Message-ID: <1640.160.36.192.134.1078863944.squirrel@kiva.lib.utk.edu>

Thank you all for your help!!
Xiaoming, we are checking out all those you listed -- and Caroline, we especially love what you are doing. We want to be able to harvest sets based on subject content, and then make those available to (first) our subject librarians. Eventually, we hope to set up a portal for use by students and researchers that will provide access to different collections (sets) based on general topical areas.

I think I will bring this up at the next DLF developer's forum (as well as my hopes for authority control on subjects) where the topic is to be "improving the harvestability of digital content and metadata".

Thanks again!

--jody

--
Jody DeRidder
IT Administrator II
Digital Library Center
648A John C. Hodges Library
University of Tennessee
Knoxville, TN 37996

Phone: (865) 974-4796
Email: deridder@aztec.lib.utk.edu

From sshreeve@uiuc.edu Tue Mar 9 20:43:32 2004
From: sshreeve@uiuc.edu (Sarah L. Shreeves)
Date: Tue, 09 Mar 2004 14:43:32 -0600
Subject: [OAI-implementers] set description
In-Reply-To: <1640.160.36.192.134.1078863944.squirrel@kiva.lib.utk.edu>
References: <404D51C2.60705@cs.uct.ac.za> <1640.160.36.192.134.1078863944.squirrel@kiva.lib.utk.edu>
Message-ID: <6.0.1.1.2.20040309143732.02685040@express.cites.uiuc.edu>

This has been mentioned before, but it might be useful to take a look at what the DC Collection Description Working Group is doing around collection description. I think that this work could have great applicability for set descriptions. See http://dublincore.org/groups/collections/.

Sarah

-----------------------------------------------------------------------------------------------
Sarah L. Shreeves
Visiting Project Coordinator, IMLS Digital Collections and Content
University of Illinois Library at Urbana-Champaign
Phone: 217-244-7809
Fax: 217-244-7764
Email: sshreeve@uiuc.edu
Web: http://imlsdcc.grainger.uiuc.edu

From khage@umich.edu Wed Mar 10 20:16:43 2004
From: khage@umich.edu (Kat Hagedorn)
Date: Wed, 10 Mar 2004 15:16:43 -0500
Subject: [OAI-implementers] static repository status
Message-ID:

Hello,

Could someone tell me the status of the OAI Static Repository? I get a couple requests each week from people who don't have the resources to become full-fledged data providers. I would like to point them to a place where they can drop their XML files, but I gather this doesn't officially exist yet.

Thanks,
- Kat

-------------------
Kat Hagedorn
OAIster/Metadata Harvesting Librarian
DLXS Bibliographic Class Coordinator
DLXS Text Class Co-coordinator
Digital Library Production Service
University of Michigan

http://www.oaister.org/
http://www.dlxs.org/
email: khage@umich.edu
phone: 734-615-7618

From herbertv@lanl.gov Wed Mar 10 20:28:19 2004
From: herbertv@lanl.gov (herbert van de sompel)
Date: Wed, 10 Mar 2004 13:28:19 -0700
Subject: [OAI-implementers] static repository status
In-Reply-To:
References:
Message-ID: <404F7A63.60706@lanl.gov>

Kat Hagedorn wrote:

> Hello,
>
> Could someone tell me the status of the OAI Static Repository? I get a
> couple requests each week from people who don't have the resources to
> become full-fledged data providers. I would like to point them to a
> place where they can drop their XML files, but I gather this doesn't
> officially exist yet.
>

We are finalizing the Static Repository specification at this very moment. Feedback from the NSDL community suggested the inclusion of a mechanism to "unregister" a Static Repository from a Static Repository Gateway. We are looking into accommodating that need.

When it comes to a "place to drop" XML files (we refer to that spot as a Static Repository Gateway):

(1) There is LANL-created software available to create such spots (see http://srepod.sourceforge.net/ )
(2) LANL operates a demo version of such a spot (see http://libtest.lanl.gov/cgi-bin/gateway.cgi ). However, this is for demo purposes only; we have no intention of running a production version.

I invite you to install and operate our Static Repository Gateway software at OAIster.

many greetings

herbert

--
Herbert Van de Sompel
digital library research & prototyping
Los Alamos National Laboratory - Research Library
+ 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/

"met gestreken jeans de dansvloer penetreren"

From thabing@uiuc.edu Wed Mar 10 21:46:30 2004
From: thabing@uiuc.edu (Thomas G. Habing)
Date: Wed, 10 Mar 2004 15:46:30 -0600
Subject: [OAI-implementers] static repository status
In-Reply-To:
References:
Message-ID: <404F8CB6.3090508@uiuc.edu>

Kat Hagedorn wrote:

> Hello,
>
> Could someone tell me the status of the OAI Static Repository? I get a
> couple requests each week from people who don't have the resources to
> become full-fledged data providers. I would like to point them to a
> place where they can drop their XML files, but I gather this doesn't
> officially exist yet.

Hi Kat,

We also have an alpha/beta level implementation of an OAI Static Repository Gateway which is available on SourceForge:

http://uilib-oai.sourceforge.net/
http://sourceforge.net/project/showfiles.php?group_id=47963&package_id=85826

It is implemented as an IIS Active Server Page (ASP) script. Currently it is being used to gateway one repository:

http://imlsdcc.grainger.uiuc.edu/gateway/oai.asp/www.acnatsci.org/library/collections/imls/nlg/AcadNatSciStatic.xml?verb=Identify

However, we would be willing to act as a gateway for other collections. We are not currently responding to the 'initiate' request (Section 3.3 of the spec.), so if you have a collection you would like to be added, you should send me an email with the URL to the static XML file, and I will make sure it validates and add it to the gateway if it does.

This is still an experimental implementation, so the usual caveats apply, but we do intend to keep it running for the foreseeable future, probably with some downtime to introduce changes or fixes as the spec evolves.
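
The gateway address in this message appears to be formed by appending the static file's host and path (without the http:// scheme) to the gateway's own URL. Below is a minimal sketch of that pattern, inferred only from the single example URL in the message. The function name `gateway_request_url` is hypothetical, and the pattern itself is an assumption rather than a statement of what the specification requires.

```python
from urllib.parse import urlencode, urlsplit


def gateway_request_url(gateway_base, static_repo_url, **oai_args):
    """Build an OAI-PMH request URL for a static repository behind a gateway.

    Assumption: the gateway expects the static file's host and path,
    with the scheme dropped, appended to its own base URL.
    """
    parts = urlsplit(static_repo_url)
    suffix = parts.netloc + parts.path  # e.g. "www.example.org/dir/file.xml"
    url = gateway_base.rstrip("/") + "/" + suffix
    if oai_args:
        # OAI-PMH arguments (verb, metadataPrefix, ...) go in the query string.
        url += "?" + urlencode(oai_args)
    return url


print(gateway_request_url(
    "http://imlsdcc.grainger.uiuc.edu/gateway/oai.asp",
    "http://www.acnatsci.org/library/collections/imls/nlg/AcadNatSciStatic.xml",
    verb="Identify",
))
```

Keeping the OAI-PMH arguments as keyword arguments means the same helper covers other verbs, for example `verb="ListRecords", metadataPrefix="oai_dc"`.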

--
Thomas Habing
Research Programmer, Digital Library Projects
University of Illinois at Urbana-Champaign
155 Grainger Engineering Library Information Center, MC-274
thabing@uiuc.edu, (217) 244-4425
http://dli.grainger.uiuc.edu

From chrish@athabascau.ca Wed Mar 10 22:58:38 2004
From: chrish@athabascau.ca (Chris Hubick)
Date: Wed, 10 Mar 2004 15:58:38 -0700
Subject: [OAI-implementers] OAI-PMH + IEEE LTSC LOM
Message-ID: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca>

Hi.

I have a preliminary implementation of OAI-PMH around our native IEEE LTSC LOM repository:
http://adlibx.athabascau.ca/ADLib/OAI/?verb=Identify

I provide both the required Dublin Core XML and IEEE LOM XML:
http://adlibx.athabascau.ca/ADLib/OAI/?verb=ListMetadataFormats

You can retrieve Erik Duval's sample LOM record at:
DC: http://adlibx.athabascau.ca/ADLib/OAI/?verb=GetRecord&metadataPrefix=oai_dc&identifier=urn:x-ims-plirid-v0:urn%3Ax-ims-plirid-v0%3Aadlib.athabascau.ca%3Asys%3Aadlibx%3A1
LOM: http://adlibx.athabascau.ca/ADLib/OAI/?verb=GetRecord&metadataPrefix=lom&identifier=urn:x-ims-plirid-v0:urn%3Ax-ims-plirid-v0%3Aadlib.athabascau.ca%3Asys%3Aadlibx%3A1

All responses validate against the appropriate schemas, and the repository passes all the tests on Virginia Tech's OAI Repository Explorer.

I didn't know of any other OAI+LOM implementations to compare against, so I hope my implementation is sane (?).

--
FYI:

The human oriented interface to our repository is at:
Athabasca Digital Library: http://adlib.athabascau.ca/

The implementation is written in Java, and provided as Free Software under the LGPL. This work is part of a larger project which includes Java interfaces for representing a LOM record, and JAXB Marshallers for serializing to LOM, DC, and OAI XML. There is also a Java interface for a whole Repository, and service implementations (GUI, SOAP, OAI, HTTP, RSS, etc) built around that. It's a work in progress and documentation is currently minimal.
You can find details at: http://adlib.athabascau.ca/~hubick/ Feedback/Comments appreciated. Thanks. -- Chris Hubick mailto:chrish@athabascau.ca mailto:chris@hubick.com phone:1-780-421-2533 (work) phone:1-780-721-9932 (cell) http://www.hubick.com/ __ This communication is intended for the use of the recipient to whom it is addressed, and may contain confidential, personal, and or privileged information. Please contact us immediately if you are not the intended recipient of this communication, and do not copy, distribute, or take action relying on it. Any communications received in error, or subsequent reply, should be deleted or destroyed. --- From hussein@cs.uct.ac.za Thu Mar 11 05:52:44 2004 From: hussein@cs.uct.ac.za (Hussein Suleman) Date: Thu, 11 Mar 2004 07:52:44 +0200 Subject: [OAI-implementers] OAI-PMH + IEEE LTSC LOM In-Reply-To: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca> References: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca> Message-ID: <404FFEAC.7020706@cs.uct.ac.za> hi for not exactly LOM, but the IMS metadata set (LOM + minor modifications) ... check the CSTC archive with baseURL http://www.cstc.org/cgi-bin/OAI/CSTC.pl its the older v1.1 PMH, but the XML encoding of records and IMS standard have not changed since then (i think). this mapping was set up and tested with the iLumina project, that uses the IMS metadata set internally. for an example record, try: http://www.cstc.org/cgi-bin/OAI/CSTC.pl?verb=GetRecord&metadataPrefix=ims1_2_1&identifier=oai:CSTC:60 in general, to find other implementations for a metadata standard, you can also use the UIUC registry (which is apparently not linked into the OAI website yet). if you go to: http://gita.grainger.uiuc.edu/registry/ListSchemas.asp you will be able to find all archives that support a particular metadata format. ttfn, ----hussein Chris Hubick wrote: > Hi. 
> -- ===================================================================== hussein suleman ~ hussein@cs.uct.ac.za ~ http://www.husseinsspace.com ===================================================================== From khage@umich.edu Thu Mar 11 14:49:56 2004 From: khage@umich.edu (Kat Hagedorn) Date: Thu, 11 Mar 2004 09:49:56 -0500 Subject: [OAI-implementers] static repository status In-Reply-To: <404F8CB6.3090508@uiuc.edu> Message-ID: <5CC61CAF-736B-11D8-B59D-0003934CA344@umich.edu> Thank you for the information, Herbert and Tom. The idea of implementing a Static Repository Gateway at UM is a great idea. We'll bring it up in discussions here soon. - Kat On Wednesday, Mar 10, 2004, at 16:46 America/Detroit, Thomas G. Habing wrote: > Kat Hagedorn wrote: > >> Hello, >> Could someone tell me the status of the OAI Static Repository? I get >> a couple requests each week from people who don't have the resources >> to become full-fledged data providers. I would like to point them to >> a place where they can drop their XML files, but I gather this >> doesn't officially exist yet. >> Thanks, >> - Kat >> ------------------- >> Kat Hagedorn >> OAIster/Metadata Harvesting Librarian >> DLXS Bibliographic Class Coordinator >> DLXS Text Class Co-coordinator >> Digital Library Production Service >> University of Michigan >> http://www.oaister.org/ >> http://www.dlxs.org/ >> email: khage@umich.edu >> phone: 734-615-7618 >> _______________________________________________ >> OAI-implementers mailing list >> List information, archives, preferences and to unsubscribe: >> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers > Hi Kat, > > We also have an alpha/beta level implementation of an OAI Static > Repository Gateway which is available on SourceForge: > > http://uilib-oai.sourceforge.net/ > http://sourceforge.net/project/ > showfiles.php?group_id=47963&package_id=85826 > > It is implemented as an IIS Active Server Page (ASP) script. 
> > Currently it is being used to gateway one repository: > > http://imlsdcc.grainger.uiuc.edu/gateway/oai.asp/www.acnatsci.org/ > library/collections/imls/nlg/AcadNatSciStatic.xml?verb=Identify, > > However, we would be willing to act as a gateway for other > collections. We are not currently responding to the 'initiate' > request (Section 3.3 of the spec.), so if you have a collection you > would like to be added, you should send me an email with the URL to > the static XML file, and I will make sure it validates and add it to > the gateway if it does. > > This is still an experimental implementation, so the usual caveats > apply, but we do intend on keeping it running for the foreseeable > future, probably with some downtime to introduce changes or fixes as > the spec evolves. > > -- > Thomas Habing > Research Programmer, Digital Library Projects > University of Illinois at Urbana-Champaign > 155 Grainger Engineering Library Information Center, MC-274 > thabing@uiuc.edu, (217) 244-4425 > http://dli.grainger.uiuc.edu > _______________________________________________ > OAI-implementers mailing list > List information, archives, preferences and to unsubscribe: > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers From philb@icbl.hw.ac.uk Thu Mar 11 17:58:31 2004 From: philb@icbl.hw.ac.uk (Phil Barker) Date: Thu, 11 Mar 2004 17:58:31 +0000 Subject: [OAI-implementers] OAI-PMH + IEEE LTSC LOM In-Reply-To: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca> References: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca> Message-ID: <4050A8C7.2080802@icbl.hw.ac.uk> Chris Hubick wrote: > I didn't know of any other OAI+LOM implementations to compare against, > so I hope my implementation is sane (?). > There's a fair amount of work going on the UK related to this, it goes under the catchy title of the RDN/LTSN Interoperability Project, http://www.ltsn.ac.uk/genericcentre/interop/ . 
I don't think anyone has yet gone public with a harvester, but I know folk are working on them and I've forwarded your message to them. An article about a loosely related earlier implementation can be found at http://www.ariadne.ac.uk/issue34/powell/intro.html , and there'll be another in the next issue of Ariadne. Phil -- Phil Barker Learning Technology Adviser ICBL, School of Mathematical and Computer Sciences Mountbatten Building, Heriot-Watt University, Edinburgh, EH14 4AS Tel: 0131 451 3278 Fax: 0131 451 3327 Web: http://www.icbl.hw.ac.uk/~philb/
From chrish@athabascau.ca Thu Mar 11 19:05:31 2004 From: chrish@athabascau.ca (Chris Hubick) Date: Thu, 11 Mar 2004 12:05:31 -0700 Subject: [OAI-implementers] Identifiers [was: Re: OAI-PMH + IEEE LTSC LOM] In-Reply-To: <404FFEAC.7020706@cs.uct.ac.za> References: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca> <404FFEAC.7020706@cs.uct.ac.za> Message-ID: <1079031930.22737.56.camel@edtech25.edtech.athabascau.ca> On Wed, 2004-03-10 at 22:52, Hussein Suleman wrote: > for not exactly LOM, but the IMS metadata set (LOM + minor > modifications) ... check the CSTC archive with baseURL > http://www.cstc.org/cgi-bin/OAI/CSTC.pl Hi again :) This, and an email Kat Hagedorn sent me off list (hi Kat), remind me of a big question about identifiers... As you may know, LOM identifiers are catalog/entry *pairs*. That is to say, the entry is namespaced by its catalog - the repository could conceivably have two different records with the same identifier entry in different catalogs. However, OAI, Dublin Core, RSS, etc. use a *single* string as an identifier. In a repository that harvests from a number of different systems through a variety of protocols, and has identifiers from many catalog types (not necessarily URIs)... How does one map an arbitrary catalog/entry *pair* to a *single* identifier string? My answer was to use a URN: 'urn:' + catalog + ':' + entry Note: 1) Identifiers used in OAI messages must be URIs.
2) In the OAI Identifier format ('oai:'), the namespace ID must be a domain name. The repository Hussein linked violates this. 3) The LOM/RDF stuff seems to expect all people to use 'URI' as a catalog in their LOM data (?). My runner-up was a Universal Name: '{' + catalog + '}' + entry That notation was invented by James Clark ( http://www.jclark.com/xml/xmlns.htm ), but the requirement that OAI identifiers be URIs killed that idea. Ideally, the LOM to Dublin Core mapping in Appendix B of the IEEE LTSC LOM spec would have established a practice for this, but alas, it does not. Has anyone else tackled this problem? Thanks. -- Chris Hubick mailto:chrish@athabascau.ca mailto:chris@hubick.com phone:1-780-421-2533 (work) phone:1-780-721-9932 (cell) http://www.hubick.com/
From a.powell@ukoln.ac.uk Thu Mar 11 23:58:03 2004 From: a.powell@ukoln.ac.uk (Andy Powell) Date: Thu, 11 Mar 2004 23:58:03 +0000 (GMT Standard Time) Subject: [OAI-implementers] Identifiers [was: Re: OAI-PMH + IEEE LTSC LOM] In-Reply-To: <1079031930.22737.56.camel@edtech25.edtech.athabascau.ca> References: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca> <404FFEAC.7020706@cs.uct.ac.za> <1079031930.22737.56.camel@edtech25.edtech.athabascau.ca> Message-ID: On Thu, 11 Mar 2004, Chris Hubick wrote: > In a repository that harvests from a number of different systems through > a variety of protocols, and has identifiers from many catalog types (not > necessarily URIs)... > > How does one map an arbitrary catalog/entry *pair* to a *single* > identifier string?
> > My answer was to use a URN: > > 'urn:' + catalog + ':' + entry One problem with this approach is that there is presumably very little consistency across services in the way that 'catalog' is assigned - i.e. the 'catalog' is not taken from a controlled vocabulary. So although you end up with a single string identifier (the URN), you don't really have a mechanism for reliably comparing URNs from different sources. It seems to me that the 'catalog'/'entry' pairing in LOM is a bit broken - because it really requires a global registry of 'catalog' names to work properly. (At least, without a global registry I have no way of knowing if your 'catalog' is the same as my 'catalog'). URIs already provide a global space within which new identifier schemes can be created - why not use it, rather than building a LOM-specific registry? In particular, the proposed 'info' URI scheme http://info-uri.info/registry/docs/misc/faq.html provides an open mechanism for assigning URIs to information assets that have identifiers in public namespaces but have no representation within URI space. > Has anyone else tackled this problem? Not really, but you might be interested in Guidelines for encoding identifiers in Dublin Core and IEEE LOM metadata http://www.ukoln.ac.uk/metadata/dcmi-ieee/identifiers/ which basically suggests that URIs should *always* be used.
Andy -- Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK http://www.ukoln.ac.uk/ukoln/staff/a.powell +44 1225 383933 Resource Discovery Network http://www.rdn.ac.uk/
From chrish@athabascau.ca Fri Mar 12 00:40:19 2004 From: chrish@athabascau.ca (Chris Hubick) Date: Thu, 11 Mar 2004 17:40:19 -0700 Subject: [OAI-implementers] Re: Identifiers In-Reply-To: References: <1078959517.21846.52.camel@edtech25.edtech.athabascau.ca> <404FFEAC.7020706@cs.uct.ac.za> <1079031930.22737.56.camel@edtech25.edtech.athabascau.ca> Message-ID: <1079052018.22737.286.camel@edtech25.edtech.athabascau.ca> On Thu, 2004-03-11 at 16:58, Andy Powell wrote: > On Thu, 11 Mar 2004, Chris Hubick wrote: > > > In a repository that harvests from a number of different systems through > > a variety of protocols, and has identifiers from many catalog types (not > > necessarily URIs)... > > > > How does one map an arbitrary catalog/entry *pair* to a *single* > > identifier string? > > > > My answer was to use a URN: > > > > 'urn:' + catalog + ':' + entry > > It seems to me that the 'catalog'/'entry' pairing in LOM is a bit broken > - because it really requires a global registry of 'catalog' names to work > properly. (At least, without a global registry I have no way of > knowing if your 'catalog' is the same as my 'catalog'). URIs already > provide a global space within which new identifier schemes can be created > - why not use it, rather than building a LOM-specific registry? First, can we assume, for the sake of my problem discussion, that all those who create some new identifier format do in fact manage to choose a truly unique catalog name, just as if there were in fact a registry? (That's a tangential discussion :). > In particular, the proposed 'info' URI scheme > > http://info-uri.info/registry/docs/misc/faq.html > > provides an open mechanism for assigning URIs to information assets that > have identifiers in public namespaces but have no representation within > URI space.
Oooh, that's new, thanks for that :) Ok, so, if I use that, then the algorithm would be:
IF (LOM.Identifier.Catalog == 'URI') THEN export LOM.Identifier.Entry unmodified
ELSE export info URI as: 'info:' + LOM.Identifier.Catalog + '/' + LOM.Identifier.Entry
I will read more about these info URIs to see if that's a valid use (?). > > Has anyone else tackled this problem? > > Not really, but you might be interested in > > Guidelines for encoding identifiers in Dublin Core and IEEE LOM metadata > http://www.ukoln.ac.uk/metadata/dcmi-ieee/identifiers/ > > which basically suggests that URIs should *always* be used. Hrm, that's interesting too, thanks. Though it's basically the reverse problem. One thing that does bring to light is using 'URI' as your LOM Catalog whenever your entries are in URI format. I have been using our URN's Namespace Identifier (NID) as our Catalog, when I perhaps should be using 'URI' instead (or the more specific 'URN')? Thanks! -- Chris Hubick mailto:chrish@athabascau.ca mailto:chris@hubick.com phone:1-780-421-2533 (work) phone:1-780-721-9932 (cell) http://www.hubick.com/
From adam.cooper@fdlearning.com" [relates to OAI-implementers digest, Vol 1 #426 - 5 msgs] A number of IMS specs have played with this 2 part/1 part problem. RDCEO (Reusable Definition of Competences and Educational Objectives) talks about a URN scheme, and I think we had assumed that the catalog would map to the NSS and the entry to the NID. We also considered URL#fragment identifier, where the fragment identifier was the entry part.
The RDCEO binding actually used a single identifier string, whereas the information model DID follow LOM practice. Exactly what the significance of LOM catalog might be is probably another question, and one that is intentionally open I think. Is it necessarily more than an _indicator_ of the creator of the identifier? Adam
From a.powell@ukoln.ac.uk Wed Mar 17 13:28:29 2004 From: a.powell@ukoln.ac.uk (Andy Powell) Date: Wed, 17 Mar 2004 13:28:29 +0000 (GMT Standard Time) Subject: [OAI-implementers] Automatically gathering the full-text of eprints Message-ID: The JISC-funded ePrints UK project has a requirement to automatically harvest both metadata and full-text from the eprint archives within UK academia (and potentially elsewhere). This is so that we can pass both metadata and full-text to the various 'enhancement' Web services offered by our partners. http://www.rdn.ac.uk/projects/eprints-uk/ In order for our harvesting robot to be able to do this, it must be able to reliably (and automatically) determine the correct URL(s) for the various full-text manifestation(s) (HTML, PDF, RTF, etc.) of each eprint. Our "Using simple Dublin Core to describe eprints" guidelines are intended to encourage greater consistency in the metadata that is exposed by eprint archives using the 'oai_dc' format within the OAI Protocol for Metadata Harvesting (OAI-PMH). Somewhat perversely, because we stick rigidly to the semantics of the DC element set, our guidelines make determining the URL of each manifestation that is available quite difficult. (This is largely a consequence of the 'simple' nature of 'simple DC'!). In general, the URL in the dc:identifier element of the oai_dc record is the URL of a jump-off page, rather than a direct link to the full-text. We would like to suggest a new proposal for unambiguously embedding the URL for each manifestation of an eprint into the (X)HTML jump-off page for that eprint.
Since the jump-off page is generated automatically by the eprint archive software, doing this shouldn't be too difficult (in fact, we would hope that archive software, such as eprints.org, will be configured to do this out of the box). If this proposal is adopted, it will make it much easier to write OAI service provider software that can reliably gather the full-text of an eprint, given only the oai_dc record for that eprint. The proposal is at http://www.rdn.ac.uk/projects/eprints-uk/docs/encoding-fulltext-links/ Comments are welcome, Andy -- Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933 Resource Discovery Network http://www.rdn.ac.uk/ ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/ From herbertv@lanl.gov Wed Mar 17 19:19:39 2004 From: herbertv@lanl.gov (herbert van de sompel) Date: Wed, 17 Mar 2004 12:19:39 -0700 Subject: [OAI-implementers] Automatically gathering the full-text of eprints In-Reply-To: References: Message-ID: <4058A4CB.6040200@lanl.gov> Dear Andy, The problem of service providers needing access to content in addition to metadata has come up in many discussions, lately, including in the realm of the DARE, DINI, JISC, DSpace, Fedora, etc work. It so happens that my team in Los Alamos has recently done quite some work in this realm, as is illustrated by the most recent papers listed on my personal web site. Here is some initial feedback to the proposal. The proposal relies on: a. The assumption that a harvester knows that something that is in the dc.identifier element of oai_dc points to a - compliant - jump-off page. 
There are two problems with this assumption:
- lots of things can be in the dc.identifier element, both resolvable and unresolvable
- lots of things at the end of the thing identified by the content of dc.identifier (if resolvable) will not be compliant jump-off pages
This means harvesters never really know when they are facing the scenario that you target, and hence will do a lot of meaningless dereferencing and parsing. One could think of addressing this to some extent by a special-purpose Descriptor in the Identify response to indicate that a repository actually is 'compliant', but that would still leave the harvester guessing about which of the dc.identifiers (if there are multiple) is the magic one.
b. The actual existence of a 'jump-off' page. This is something that - in the context of the OAI-PMH (with its disconnection of DP and SP) - we cannot just take for granted or assume.
There are other problems related to obtaining content which are not covered by the solution:
* How does a harvester know when to go after an update to content? The OAI-PMH indicates that the datestamp of a record only changes when the metadata has changed; it doesn't say anything about the content. I suggest it should stay that way. So, in the proposed solution, content in a repo can change without the harvester ever knowing about it.
* The scenario as described in the proposal, in which a single metadata record corresponds with a single "preprint", is only a special case of - future - reality. Increasingly, objects held in and described by repositories will be "compound" or "complex", i.e. consisting of multiple datastreams, not just a single "preprint". I find that it would be desirable that a solution to get to the content would be able to handle such situations. The proposed solution could actually accommodate such 'compound' objects, because the multiple datastreams are linked off the jump-off page. There is, however, a problem.
Let's presume we have a situation in which an object with 2 datastreams, each of which has its own unique identifier (say, a DOI), is deposited in an institutional repository. Thinking of a - future - self-archiving scenario and the trend to accord identifiers at finer levels of granularity, this is not unlikely at all. Now we get 3 things in dc.identifier (2 DOIs and a link to a jump-off page), and 2 things in the jump-off page (links to the 2 datastreams). How do I know which DOI goes with which datastream? Information that - I hope we will all agree - is rather significant. OK. The point I am trying to make is that the described scenario and its more general problem domain (beyond eprints, and into the realm of objects with multiple datastreams) may call for another approach. Our research has shown that such an approach can remain 100% OAI-PMH-based if a complex object format such as METS, MPEG-21 DIDL or SCORM is used. These formats can be "parallel" OAI-PMH "metadata formats" through which harvesters can get to the content without running into issues such as the ones mentioned above. Content can be embedded in the XML wrappers or pointed at by them. Identifiers can be unambiguously connected to content. If content changes, the datestamp of the "complex" record changes. I anticipate concerns re the overhead of introducing a solution based on a complex object format. At this point, I would like to say 2 things in this respect: * It took 2 people on my team about 2 days to create a prototype plug-in that enables OAI-PMH harvesting of content from DSpace repositories. Our plug-in rendered content using the MPEG-21 DIDL XML wrapper format. Most of the time invested in this plug-in was spent figuring out the DSpace API and a sensible way to map the DSpace data model to the DIDL data model. The prototype was demonstrated at the DSpace federation meeting, last week.
Although questions/issues did arise in the course of our work, none seemed unsolvable. But it is my impression that the very fast delivery of a prototype indicates the feasibility of the complex format approach. * I would personally be very willing to spend time with the appropriate representatives of the community - including yourself - to work towards a solution that is future-proof and provides adequate guarantees regarding perceived requirements of a content-harvesting solution. I would actually prefer that over going for a solution which is attractive at first glance because of its obvious simplicity, but which seems to raise some relevant questions upon closer inspection. To end, I would like to thank you for bringing this topic to the list. I have had many private email exchanges over the last few months especially with representatives from DARE and DINI about this and related problem domains. I hope that your mail can be another impulse towards a joint action in this realm. The problem is very real, and I would love our community to jointly create a really good solution to it. many greetings herbert Andy Powell wrote: > The JISC-funded ePrints UK project has a requirement to automatically > harvest both metadata and full-text from the eprint archives within UK > academia (and potentially elsewhere). This is so that we can pass both > metadata and full-text to the various 'enhancement' Web services offered > by our partners. > > http://www.rdn.ac.uk/projects/eprints-uk/ > > In order for our harvesting robot to be able to do this, it must be able > to reliably (and automatically) determine the correct URL(s) for the > various full-text manifestation(s) (HTML, PDF, RTF, etc.) of each eprint. > > Our "Using simple Dublin Core to describe eprints" guidelines are intended > to encourage greater consistency in the metadata that is exposed by eprint > archives using the 'oai_dc' format within the OAI Protocol for Metadata > Harvesting (OAI-PMH).
> > The proposal is at > > http://www.rdn.ac.uk/projects/eprints-uk/docs/encoding-fulltext-links/ > > Comments are welcome, > > Andy > -- > Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK > http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933 > Resource Discovery Network http://www.rdn.ac.uk/ > ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/ > _______________________________________________ > OAI-implementers mailing list > List information, archives, preferences and to unsubscribe: > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers > > -- Herbert Van de Sompel digital library research & prototyping Los Alamos National Laboratory - Research Library + 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/ "met gestreken jeans de dansvloer penetreren" From tdb01r@ecs.soton.ac.uk Wed Mar 17 21:17:54 2004 From: tdb01r@ecs.soton.ac.uk (Tim Brody) Date: Wed, 17 Mar 2004 21:17:54 +0000 Subject: [OAI-implementers] Automatically gathering the full-text of eprints In-Reply-To: References: Message-ID: <4058C082.3000001@ecs.soton.ac.uk> We've done a preliminary implementation of this at Southampton for: eprints.ecs.soton.ac.uk and eprints.soton.ac.uk It took me about an hour to do, I suspect Chris did it in much less time :-) All the best, Tim. Andy Powell wrote: > The JISC-funded ePrints UK project has a requirement to automatically > harvest both metadata and full-text from the eprint archives within UK > academia (and potentially elsewhere). This is so that we can pass both > metadata and full-text to the various 'enhancement' Web services offered > by our partners. > > http://www.rdn.ac.uk/projects/eprints-uk/ > > In order for our harvesting robot to be able to do this, it must be able > to reliably (and automatically) determine the correct URL(s) for the > various full-text manifestation(s) (HTML, PDF, RTF, etc.) of each eprint. 
> > The proposal is at > > http://www.rdn.ac.uk/projects/eprints-uk/docs/encoding-fulltext-links/ > > Comments are welcome, > > Andy > -- > Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK > http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933 > Resource Discovery Network http://www.rdn.ac.uk/ > ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/ > _______________________________________________ > OAI-implementers mailing list > List information, archives, preferences and to unsubscribe: > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers > > From herbertv@lanl.gov Wed Mar 17 22:54:55 2004 From: herbertv@lanl.gov (herbert van de sompel) Date: Wed, 17 Mar 2004 15:54:55 -0700 Subject: [OAI-implementers] Automatically gathering the full-text of eprints In-Reply-To: <4058C082.3000001@ecs.soton.ac.uk> References: <4058C082.3000001@ecs.soton.ac.uk> Message-ID: <4058D73F.9030203@lanl.gov> Tim Brody wrote: > We've done a preliminary implementation of this at Southampton for: > eprints.ecs.soton.ac.uk > and > eprints.soton.ac.uk > > It took me about an hour to do, I suspect Chris did it in much less time > :-) > I trust that the amount of seconds it takes to implement a solution is not the only evaluation criterion. I very much agree it is an important one, and it is one that has always played a significant role in designing the OAI-PMH and related specifications. But it seems to me that there are other criteria such as meeting functional requirements that play. I have, obviously, not seen the list of requirements. I do understand the goals, however. And, as described in my previous mail, I can think of some possible requirements related to those goals that may not be met by the proposed solution. This consideration clearly allows for alternative solutions to the problem than the one based on complex objects, which I described in my previous mail. 
I suggested the complex object path because we have done quite some work in that realm, and because that work has urged us to think in a general way about the content-harvesting problem. cheers herbert > All the best, > Tim. > > Andy Powell wrote: > >> The JISC-funded ePrints UK project has a requirement to automatically >> harvest both metadata and full-text from the eprint archives within UK >> academia (and potentially elsewhere). This is so that we can pass both >> metadata and full-text to the various 'enhancement' Web services offered >> by our partners. >> >> http://www.rdn.ac.uk/projects/eprints-uk/ >> >> In order for our harvesting robot to be able to do this, it must be able >> to reliably (and automatically) determine the correct URL(s) for the >> various full-text manifestation(s) (HTML, PDF, RTF, etc.) of each eprint. >> >> Our "Using simple Dublin Core to describe eprints" guidelines are >> intended >> to encourage greater consistency in the metadata that is exposed by >> eprint >> archives using the 'oai_dc' format within the OAI Protocol for Metadata >> Harvesting (OAI-PMH). Somewhat perversely, because we stick rigidly to >> the semantics of the DC element set, our guidelines make determining the >> URL of each manifestation that is available quite difficult. (This is >> largely a consequence of the 'simple' nature of 'simple DC'!). In >> general, the URL in the element of the oai_dc record is >> the URL of a jump-off page, rather than a direct link to the full-text. >> >> We would like to suggest a new proposal for unambiguously embedding the >> URL for each manifestation of an eprint into the (X)HTML jump-off page >> for >> that eprint. Since the jump-off page is generated automatically by the >> eprint archive software, doing this shouldn't be too difficult (in fact, >> we would hope that archive software, such as eprints.org, will be >> configured to do this out of the box). 
>> >> If this proposal is adopted, it will make it much easier to write OAI >> service provider software that can reliably gather the full-text of an >> eprint, given only the oai_dc record for that eprint. >> >> The proposal is at >> >> http://www.rdn.ac.uk/projects/eprints-uk/docs/encoding-fulltext-links/ >> >> Comments are welcome, >> >> Andy >> -- >> Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK >> http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933 >> Resource Discovery Network http://www.rdn.ac.uk/ >> ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/ >> _______________________________________________ >> OAI-implementers mailing list >> List information, archives, preferences and to unsubscribe: >> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers >> >> > > > _______________________________________________ > OAI-implementers mailing list > List information, archives, preferences and to unsubscribe: > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers > > -- Herbert Van de Sompel digital library research & prototyping Los Alamos National Laboratory - Research Library + 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/ "met gestreken jeans de dansvloer penetreren" From chrish@athabascau.ca Thu Mar 18 21:28:18 2004 From: chrish@athabascau.ca (Chris Hubick) Date: Thu, 18 Mar 2004 14:28:18 -0700 Subject: [OAI-implementers] Identifiers (catalog/entry) In-Reply-To: <01C40B5B.2898AB80.adam.cooper@fdlearning.com> References: <01C40B5B.2898AB80.adam.cooper@fdlearning.com> Message-ID: <1079645297.8376.55.camel@edtech25.edtech.athabascau.ca> On Tue, 2004-03-16 at 06:32, Adam Cooper wrote: > A number of IMS specs have played with this 2 part/1part problem. RDCEO > (Reusable Definition of Competences and Educational Objectives) talks about > URN scheme, and I think we had assumed that the catalog would map to the > NSS and the entry to the NID. 
Hrm, that is what I have been doing up until now, but I am coming to think I was wrong...

> Exactly what the significance of LOM catalog might be is probably another
> question, and one that is intentionally open I think. Is it necessarily
> more than an _indicator_ of the creator of the identifier?

I don't think it is open: the LOM spec says the catalog is "A namespace scheme". This would, at least to me, clearly indicate that the entry is namespaced by its catalog. Any other interpretation would lead to much greater problems, in that many people simply use increasing integer numbers to identify their metadata records, and without those numbers being namespaced by the catalog, we would have *many* collisions.

We basically have a 'three level' system. It is up to all those people sharing any particular LOM catalog to guarantee uniqueness within *that* catalog. For those (most) of us who share the 'URI' catalog, we have the same uniqueness requirement, which we use the NID to satisfy. This pushes it down a level, where all those people sharing any particular NID must also guarantee uniqueness within that NID. A system like 'oai' uses DNS names to do this.

IMHO, three levels is overly complex. Yes, we could all run off and use whatever format entries we like, namespaced by our catalog (which is pretty much what people have done up until recently). In an ideal world, however, we would remove this extra level by all using the same catalog, and partition within that ourselves. The URI system gives us that catalog, partitioned by NID. All those using URIs have done the extra work in agreeing to share a common syntax and associated facilities for partitioning the 'URI' namespace. By not using a 'URI' catalog, you basically negate that effort by making the fact you are actually using URI format entries *opaque* to others from a LOM perspective.
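
The namespacing argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `LomIdentifier`, the helper `split_oai_identifier`, the catalog names, and the `example.org` repositories are all hypothetical and come from neither the LOM spec nor any OAI software. It shows why integer entries collide unless they are keyed together with their catalog, and how a shared 'URI' catalog pushes the partitioning into the entry itself.

```python
from typing import NamedTuple, Tuple


class LomIdentifier(NamedTuple):
    """A LOM-style identifier: the entry is only unique within its catalog."""
    catalog: str
    entry: str


def split_oai_identifier(identifier: str) -> Tuple[str, str]:
    """Split 'oai:<dns-name>:<local-id>' into (dns-name, local-id)."""
    scheme, namespace, local_id = identifier.split(":", 2)
    if scheme != "oai":
        raise ValueError(f"not an oai identifier: {identifier!r}")
    return namespace, local_id


# Two hypothetical repositories that both number their records 1, 2, 3, ...
repo_a = LomIdentifier(catalog="repo-a-catalog", entry="42")
repo_b = LomIdentifier(catalog="repo-b-catalog", entry="42")

# Keyed on (catalog, entry), the two records stay distinct.
by_pair = {repo_a: "record from A", repo_b: "record from B"}

# Keyed on the entry alone, the second record overwrites the first.
by_entry = {repo_a.entry: "record from A", repo_b.entry: "record from B"}

# With a shared 'URI' catalog, uniqueness is pushed into the entry itself;
# here the DNS name inside the oai identifier does the partitioning.
shared_a = LomIdentifier("URI", "oai:repo-a.example.org:42")
shared_b = LomIdentifier("URI", "oai:repo-b.example.org:42")

print(len(by_pair), len(by_entry))
print(split_oai_identifier(shared_a.entry))
```

Keying on the pair is what the "A namespace scheme" reading implies; keying on the entry alone is the interpretation that would produce the collisions described above.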
I think LOM might have been better off to do as OAI has done and just force everyone to encode their ids as URIs, rather than having a separate catalog field, but they didn't, so here we are.

--
Chris Hubick
mailto:chrish@athabascau.ca
mailto:chris@hubick.com
phone:1-780-421-2533 (work)
phone:1-780-721-9932 (cell)
http://www.hubick.com/

__
This communication is intended for the use of the recipient to whom it is addressed, and may contain confidential, personal, and or privileged information. Please contact us immediately if you are not the intended recipient of this communication, and do not copy, distribute, or take action relying on it. Any communications received in error, or subsequent reply, should be deleted or destroyed.
---

From herbertv@lanl.gov Thu Mar 18 21:51:45 2004
From: herbertv@lanl.gov (herbert van de sompel)
Date: Thu, 18 Mar 2004 14:51:45 -0700
Subject: [OAI-implementers] Identifiers (catalog/entry)
In-Reply-To: <1079645297.8376.55.camel@edtech25.edtech.athabascau.ca>
References: <01C40B5B.2898AB80.adam.cooper@fdlearning.com> <1079645297.8376.55.camel@edtech25.edtech.athabascau.ca>
Message-ID: <405A19F1.6090100@lanl.gov>

Chris Hubick wrote:

> I think LOM might have been better off to do as OAI has done and just
> force everyone to encode their ids as URIs, rather than having a
> separate catalog field, but they didn't, so here we are.
>

You may still be able to talk in URI terms about objects with these ids by using the info URI scheme. See http://info-uri.info/ .
herbert -- Herbert Van de Sompel digital library research & prototyping Los Alamos National Laboratory - Research Library + 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/ "met gestreken jeans de dansvloer penetreren" From a.powell@ukoln.ac.uk Fri Mar 19 16:51:02 2004 From: a.powell@ukoln.ac.uk (Andy Powell) Date: Fri, 19 Mar 2004 16:51:02 +0000 (GMT Standard Time) Subject: [OAI-implementers] Automatically gathering the full-text of eprints In-Reply-To: <4058A4CB.6040200@lanl.gov> References: <4058A4CB.6040200@lanl.gov> Message-ID: On Wed, 17 Mar 2004, herbert van de sompel wrote: > a. The assumption that a harvester knows that something that is in the > dc.identifier element of oai_dc points to a - compliant - jump-off page. There > are two problems with this assumption: > - lots of things can be in the dc.identifier element both resolvable and > unresolvable > - lots of things at the end of the thing identified by the content of > dc.identifier (if resolvable) will not be compliant jump-off pages Herbert, thanks for the email. Yes, I completely agree with this analysis. The proposal is a bit of a hack, and we should perhaps have made this clearer in the document! However, I think it is a useful hack :-). Particularly so in the context of our other recommendations for using simple DC to describe eprints. Furthermore, I would see it as good practice to embed XHTML elements into jump-off pages anyway - irrespective of whether the intention is to ease harvesting by robots or not. So I certainly don't see our proposal as causing any harm. The rest of your email raises some quite significant issues - some of which I suspect are not very easy to discuss by email. I don't propose giving a detailed response here, but I would like to note a few issues for consideration... Firstly, your comments about the complexity of the objects being described only goes part-way to describing the problem. 
The OAI-PMH specification, rightly, says very little about the nature of the resources that are described by the records exchanged using the protocol. However, particular applications of the protocol do need to be clear about the nature of the resources being described. Furthermore, the complexity of the problem is not just about whether the resources being described are aggregations of multiple objects. Part of the complexity arises because those resources/objects fit into a model of the real world that spans both 'conceptual' works and specific digital or physical 'manifestations' of those conceptual works.

Does the oai_dc record that I allow you to harvest describe a conceptual work (or expression of a work), an article for example, or does it describe one of the particular manifestations of that work, the PDF copy of the article for example?

You'll note that I am intentionally using terms from the IFLA FRBR (Functional Requirements for Bibliographic Records) model here.

In our guidelines for using simple DC to describe eprints we made the explicit decision to reflect the fact that most implementations of eprint archives (that we looked at) appeared to be configured to expose oai_dc metadata about the 'work' rather than about the particular manifestations of the work (though actually, in many cases (even in our own guidelines to a certain extent) there is a certain amount of fuzziness going on!).

Unfortunately, there is no real way of indicating in a simple DC record that the work (as opposed to the manifestation) is being described - this would be difficult even in qualified DC currently, because the current DCMI Type vocabulary doesn't allow us to make those distinctions. But, in principle, the DC model is rich enough to handle this complexity - if we are prepared to put the effort in to agree how to do it.
But the situation is even more complex than that because it is not clear to me where OAI resources and records sit within the Web architectural model of 'resources' and 'representations'. My suspicion is that the FRBR 'manifestation' is the equivalent of the Web architecture 'representation' of the FRBR 'work' (if you see what I mean!). The oai_dc record (and indeed the jump-off page) is a 'representation' of the 'work' (assuming that is what is being described). But at this point we almost certainly need a diagram or two! :-(

OK, so on then to the question about whether the protocol can and/or should be used to exchange 'resources' as well as 'metadata' about 'resources'.

The protocol spec is very explicit in differentiating 'resources' from 'items' and 'records' and makes it very clear that the protocol is to be used to exchange 'metadata' between services - I'm thinking of section 2.2 in particular. Now, with hindsight, I really wish we'd talked instead about 'resources' and 'representations' rather than resources, items, records and metadata, because that would have given us much more flexibility about what we do with the protocol. But we didn't - and therefore, I think we are constrained in terms of what we can do within the semantics of the protocol spec.

This is not just to do with the words being used in the spec. It has to do with the entities in the model used by the protocol and the identifiers that are assigned to those entities. An oai-identifier, for example, is an identifier of an 'item', not of a 'resource' (in terms of the protocol usage of those words). It seems to me that things are likely to become very fuzzy if the 'item' or 'record' suddenly becomes the 'resource' and vice versa.

So, based on this, it seems to me that the protocol will 'break' if we start using it to carry the 'resource' where the protocol expects to see the 'record about the resource'.
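
To make the item/record distinction concrete, here is a minimal sketch in Python of how a harvester addresses a record. The `verb`, `identifier` and `metadataPrefix` arguments of GetRecord are part of OAI-PMH; the base URL, the identifier value, the helper name `get_record_url` and the `mets` prefix are assumptions chosen purely for illustration (a repository is free to pick its own prefixes beyond `oai_dc`).

```python
from urllib.parse import urlencode


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH GetRecord request URL for one record of one item."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,           # names the item, not the resource
        "metadataPrefix": metadata_prefix,  # selects one record of that item
    })
    return f"{base_url}?{query}"


# Hypothetical repository and item identifier.
BASE_URL = "http://eprints.example.org/oai"
ITEM = "oai:eprints.example.org:123"

dc_url = get_record_url(BASE_URL, ITEM, "oai_dc")
mets_url = get_record_url(BASE_URL, ITEM, "mets")  # assumed prefix

print(dc_url)
print(mets_url)
```

The same oai-identifier appears in both requests; only the metadataPrefix changes. Nothing in either request identifies the underlying resource itself, which is the constraint being described above.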
Now, your complex example of the METS package or the MPEG-21 DIDL is an interesting case - because those things can be used to carry both the metadata and the object. Is a METS package the 'resource' or the 'record' in OAI terms? The answer is that it is somewhere in-between. I certainly accept that the METS package is a 'representation' of a 'resource' - but, as I mentioned above, unfortunately we didn't use the words 'resource' and 'representation' in the protocol spec. Yes, the complex package can be viewed as metadata - but metadata about what - about the 'work' that the objects in the package 'represent', or about the particular manifestations contained in the package??! All in all, I think I'm happy with the case where OAI is used to carry the METS or DIDL package that contain objects - but I would be much less happy with a situation where the OAI-PMH is used to carry individual manifestations (an XHTML document for example). But the fuzziness between the package and the item worries me and I'm not sure that we are going to be able to tell them apart very easily in all cases. Enough for now... I agree with you that much more discussion and thinking about these issues is required. I'm certainly happy (and indeed expecting) to be told I'm wrong about any or all of the above! :-) Regards, Andy. > * The scenario as described in the propsoal, in which a single metadata record > corresponds with a single "preprint" is only a special case of - future - > reality. Increasingly, objects held in and described by repositories will be > "compound" or "complex", i.e. consisting of multiple datastreams, not just a > single "preprint". I find that it would be desirable that a solution to get to > the content would be able to handle such situations. The proposed solution > could actually accomodate such 'compound' objects, because the mutliple > datastreams are linked off the jump-off page. There is, however, a problem. 
> Let's presume we have a situation in which an object is deposited in an > institutional repository that has 2 datastreams, each of which actually has a > unique identifier, say a doi or something. Thinking of a - future - > self-archiving scenario and the trend to accord identifiers at finer levels of > granularity, this is not unlikely at all. Now we get 3 things in dc.identifier > (2 doi's and a link to a jump-off page), and 2 things in the jump-off page > (links to the 2 datastreams). How do I know which doi goes with which > datastream? Information that - I hope we will all agree - is rather significant. > > OK. The point I am trying to make is that the described scenario and its more > general problem domain (beyond eprints, and into the realm of objects with > multiple datastreams) may call for another approach. Our research has shown > that such an approach can remain 100% OAI-PMH-based if a complex object format > such as METS, MPEG-21 DIDL or SCORM is used. These formats can be "parallel" > OAI-PMH "metadata formats" through which harvesters can get to the content > without running into issues such as the ones mentioned above. Content can be > embedded in the XML wrappers or pointed at by them. Identifiers can be > unambiguously connected to content. If content changes, the datstamp of the > "conplex" record changes. > > I anticipate concerns re the overhead of introducing a solution based on a > complex object format. At this point, I would like to say 2 things with this > respect: > > * It took 2 people on my team about 2 days to create a prototype plug-in that > enables OAI-PMH harvesting of content from DSpace repositories. Our plug-in > rendered content using the MPEG-21 DIDL XML wrapper format. Most of the time > invested in this plug-in was spent figuring out the DSpace API and a sensible > way to map the DSpace data model to the DIDL data model. The prototype was > demonstrated at the DSpace federation meeting, last week. 
Although > questions/issues did arise in the course of our work, non seemed unsolvable. > But it is my impression that the very fast delivery of a prototype indicates the > feasibility of the complex format approach. > > * I would personally be very willing to spend time with the apporpiate > representatives of the community - including yourself - to work towards a > solution that is future-proof and provides adequate guarantees regarding > perceived requirements of a content-harvesting solution. I would actually > prefer that over going for a solution which is attractive at first glance > because of its obvious simplicity, but which seems to raise some relevant > questions upon closer inspection. > > To end, I would like to thank you for bringing this topic to the list. I have > had many private email exchanges over the last few months especially with > representatives from DARE and DINI about this and related problem domains. I > hope that your mail can be another impulse towards a joint action in this realm. > The problem is very real, and I would love our community to jointly create a > really good solution to it. > > many greetings > > herbert > > > Andy Powell wrote: > > > The JISC-funded ePrints UK project has a requirement to automatically > > harvest both metadata and full-text from the eprint archives within UK > > academia (and potentially elsewhere). This is so that we can pass both > > metadata and full-text to the various 'enhancement' Web services offered > > by our partners. > > > > http://www.rdn.ac.uk/projects/eprints-uk/ > > > > In order for our harvesting robot to be able to do this, it must be able > > to reliably (and automatically) determine the correct URL(s) for the > > various full-text manifestation(s) (HTML, PDF, RTF, etc.) of each eprint. 
> > > > Our "Using simple Dublin Core to describe eprints" guidelines are intended > > to encourage greater consistency in the metadata that is exposed by eprint > > archives using the 'oai_dc' format within the OAI Protocol for Metadata > > Harvesting (OAI-PMH). Somewhat perversely, because we stick rigidly to > > the semantics of the DC element set, our guidelines make determining the > > URL of each manifestation that is available quite difficult. (This is > > largely a consequence of the 'simple' nature of 'simple DC'!). In > > general, the URL in the element of the oai_dc record is > > the URL of a jump-off page, rather than a direct link to the full-text. > > > > We would like to suggest a new proposal for unambiguously embedding the > > URL for each manifestation of an eprint into the (X)HTML jump-off page for > > that eprint. Since the jump-off page is generated automatically by the > > eprint archive software, doing this shouldn't be too difficult (in fact, > > we would hope that archive software, such as eprints.org, will be > > configured to do this out of the box). > > > > If this proposal is adopted, it will make it much easier to write OAI > > service provider software that can reliably gather the full-text of an > > eprint, given only the oai_dc record for that eprint. 
> > > > The proposal is at > > > > http://www.rdn.ac.uk/projects/eprints-uk/docs/encoding-fulltext-links/ > > > > Comments are welcome, > > > > Andy > > -- > > Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK > > http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933 > > Resource Discovery Network http://www.rdn.ac.uk/ > > ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/ > > _______________________________________________ > > OAI-implementers mailing list > > List information, archives, preferences and to unsubscribe: > > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers > > > > > > -- > Herbert Van de Sompel > digital library research & prototyping > Los Alamos National Laboratory - Research Library > + 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/ > > "met gestreken jeans de dansvloer penetreren" > > > Andy -- Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933 Resource Discovery Network http://www.rdn.ac.uk/ ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/ From herbertv@lanl.gov Fri Mar 19 19:47:19 2004 From: herbertv@lanl.gov (herbert van de sompel) Date: Fri, 19 Mar 2004 12:47:19 -0700 Subject: [OAI-implementers] Automatically gathering the full-text of eprints In-Reply-To: References: <4058A4CB.6040200@lanl.gov> Message-ID: <405B4E47.5090309@lanl.gov> Dear Andy, Thanks a lot for your thoughtful comments. I provide some feedback, here, hoping that we can find some time to discuss all of this with representatives from the community in front of a much needed blackboard (whiteboard?). First, let me say that this mail isn't at all about trying to prove you wrong. Quite to the contrary. This is about conveying my perception of matters, hoping to further our joint insights in this rather complicated domain. 
Second, I feel that your FRBR-related comments, while very legitimate, are on quite the opposite end of the scale of your pragmatic, useful hack. I need to take some time to try and think about how much or how little a discussion related to making content accessible to service providers in the OAI-PMH framework should get involved with this. At this point I am puzzled. For example, I am not sure how much Google cares about the work/manifestation issue. Google wants a FRBR Item.

Third, I must emphasize that I am very pleased to hear that - in principle - you find the notion of shipping modelled representations (DIDL, METS, ...) of resources through the OAI-PMH acceptable. Below, I hope to give some more indications as to why I feel that is indeed acceptable/appropriate in the context of the OAI-PMH.

=> Your W3C TAG resource/representation perspective is very helpful. Building on your insights, and with some stretching, one could distinguish the following levels:

level 1: W3C.resource ~ FRBR.work ~ OAI-PMH.resource
level 2: W3C.representation ~ FRBR.manifestation ~ OAI-PMH.record

=> Comments I want to make at this point:

* The OAI-PMH doesn't really say that an OAI-PMH.resource sits at level 1, and I am not even sure it matters a lot for this discussion because the action we are interested in is at level 2 at which the OAI-PMH.resource clearly does not sit.

* Putting OAI-PMH.record at the level of W3C.representation makes a lot of sense, especially since v.2.0 of the protocol in which OAI-PMH.records have gained autonomy by getting their own datestamp.

* I dare to suggest that OAI-PMH.record ~ OAI-PMH.metadata ~ structured data pertaining to an OAI-PMH.resource. I dare to make this statement because the OAI-PMH has this built-in notion that equates metadata with XML.
So I feel it is actually constructive for our reasoning to get rid of the term 'metadata' with its numerous and in many cases vague/loaded interpretations, and consider an OAI-PMH.record to be structured data pertaining to an OAI-PMH.resource. That is a quite unambiguous definition.

* I think OAI-PMH.item doesn't matter in this discussion as it merely is a gateway to OAI-PMH.records. I feel we can lose that term too for our discussion.

=> I really don't think that OAI-PMH.identifier matters in any of this. While the OAI-PMH.identifier is a crucial key for harvesting, it doesn't need to have anything to do with any 'real' identifiers, nor with the real world data. Agreed that in some implementations - for practical reasons - it does, but there is no reason for it to. So we should not be distracted by it. If we really need to accord meaning to OAI-PMH.identifier, we could consider it to be the identifier shared by all W3C.representations of a W3C.resource (~OAI-PMH.resource), as it acts as the gateway to all OAI-PMH.records pertaining to an OAI-PMH.resource. Even in this interpretation, the OAI-PMH.identifier doesn't come close to becoming the identifier of the OAI-PMH.resource, irrespective of what the exact nature of the OAI-PMH.record is.

- Cf. the W3C TAG distinction between URI for resource and URI for representation.
- Cf. URI for resource == doi / URI for representation is OAI-PMH request using unrelated OAI-PMH.identifier

=> So, I think we lost quite some overhead in the above. We are down to OAI-PMH.resource (rather undefined, how nice) and OAI-PMH.record (well defined, as being structured data pertaining to OAI-PMH.resource) to play with. I think we can all go along with an interpretation that an oai_dc record is a W3C.representation of an OAI-PMH.resource.
I trust we would also agree this is the case for a special-purpose QDC record that 'models' the OAI-PMH.resource, and in doing so includes some links to datastreams of which that OAI-PMH.resource consists. The step to a complex object solution (METS, DIDL, ...) is really small from here as those indeed provide such by-reference techniques to include datastreams, as well as by-value techniques to do so. In addition, some complex object approaches actually have a data model so that the required 'modelling' boils down to mapping a specific world view to the existing data model.

=> I very much share your opinion that directly shipping a datastream/representation of the resource in an 'unmodelled' manner smells really fishy, as it makes us lose the 'structured data pertaining to the resource' life buoy.

Jeez, this took me ages to write. And now it is my turn to be proven wrong ;-)

cheers

herbert

Andy Powell wrote:

> On Wed, 17 Mar 2004, herbert van de sompel wrote:
>
>
>>a. The assumption that a harvester knows that something that is in the
>>dc.identifier element of oai_dc points to a - compliant - jump-off page. There
>>are two problems with this assumption:
>>- lots of things can be in the dc.identifier element both resolvable and
>>unresolvable
>>- lots of things at the end of the thing identified by the content of
>>dc.identifier (if resolvable) will not be compliant jump-off pages
>
>
> Herbert,
> thanks for the email. Yes, I completely agree with this analysis. The
> proposal is a bit of a hack, and we should perhaps have made this clearer
> in the document!
>
> However, I think it is a useful hack :-). Particularly so in the context
> of our other recommendations for using simple DC to describe eprints.
> Furthermore, I would see it as good practice to embed XHTML
> elements into jump-off pages anyway - irrespective of whether the
> intention is to ease harvesting by robots or not. So I certainly don't
> see our proposal as causing any harm.
> > The rest of your email raises some quite significant issues - some of > which I suspect are not very easy to discuss by email. I don't propose > giving a detailed response here, but I would like to note a few issues for > consideration... > > Firstly, your comments about the complexity of the objects being described > only goes part-way to describing the problem. The OAI-PMH specification, > rightly, says very little about the nature of the resources that are > described by the records exchanged using the protocol. However, > particular applications of the protocol do need to be clear about the > nature of the resources being described. Furthermore, the complexity of > the problem is not just about whether the resources being described are > aggregations of multiple objects. Part of the complexity arises because > those those resources/objects fit into a model of the real world that > spans both 'conceptual' works and specific digital or physical > 'manifestations' of those conceptual works. > > Does the oai_dc record that I allow you to harvest describe a conceptual > work (or expression of a work), an article for example, or does it > describe one of the particular manifestations of that work, the PDF copy > of the article for example? > > You'll note that I am intentionally using terms from the IFLA FRBR > (Functional Requirements for Bibliographic Records) model here. > > In our guidelines for using simple DC to describe eprints we made the > explicit decision to reflect the fact that most implementations of eprint > archives (that we looked at) appeared to be configured to expose oai_dc > metadata about the 'work' rather than about the particular manifestations > of the work (though actually, in many cases (even in our own guidelines > to a certain extent) there is a certain amount of fuzziness going on!). 
> > Unfortunately, there is no real way of indicating in a simple DC record > that the work (as opposed to the manifestation) is being described - this > would be difficult even in qualified DC currently, because the current > DCMI Type vocabulary doesn't allow us to make those distinctions. But, in > principle, the DC model is rich enough to handle this complexity - if > we are prepared to put the effort in to agree how to do it. > > But the situation is even more complex than that because it is not clear > to me where OAI resources and records sit within the Web architectural > model of 'resources' and 'representations'. My suspicion is that the FRBR > 'manifestation' is the equivalent of the Web architecture > 'represresentation' of the FRBR 'work' (if you see what I mean!). The > oai_dc record (and indeed the jump-off page) is a 'representation' of > the 'work' (assuming that is what is being described). But at this point > we almost certainly need a diagram or two! :-( > > OK, so on then to the question about whether the protocol can and/or > should be used to exchange 'resources' as well as 'metadata' about > 'resources'. > > The protocol spec is very explicit in differentiating 'resources' from > 'items' and 'records' and makes it very clear that the protocol be used to > exchange 'metadata' between services - I'm thinking of section 2.2 in > particular. Now, with hindsight, I really wish we'd talked instead about > 'resources' and 'representations' rather than resources, items, records > and metadata, because that would have given us much more flexibility about > what we do with the protocol. But we didn't - and therefore, I think we > are constrained in terms of what we can do within the semantics of the > protocol spec. > > This is not just to do with the words being used in the spec. It has to > do with the entities in the model used by the protocol and the identifiers > that are assigned to those entities. 
An oai-identifier, for exanmple, is > an identifier of an 'item', not of a 'resource' (in terms of the protocol > usage of those words). It seems to me that things are likely to become > very fuzzy if the 'item' or 'record' suddenly becomes the 'resource' and > vice versa. > > So, based on this, it seems to me that the protocol will 'break' if we > start using it to carry the 'resource' where the protocol expects to see > the 'record about the resource'. > > Now, your complex example of the METS package or the MPEG-21 DIDL is an > interesting case - because those things can be used to carry both the > metadata and the object. Is a METS package the 'resource' or the 'record' > in OAI terms? The answer is that it is somewhere in-between. I certainly > accept that the METS package is a 'representation' of a 'resource' - but, > as I mentioned above, unfortunately we didn't use the words 'resource' and > 'representation' in the protocol spec. Yes, the complex package can be > viewed as metadata - but metadata about what - about the 'work' that the > objects in the package 'represent', or about the particular manifestations > contained in the package??! > > All in all, I think I'm happy with the case where OAI is used to carry the > METS or DIDL package that contain objects - but I would be much less happy > with a situation where the OAI-PMH is used to carry individual > manifestations (an XHTML document for example). But the fuzziness between > the package and the item worries me and I'm not sure that we are going to > be able to tell them apart very easily in all cases. > > Enough for now... I agree with you that much more discussion and thinking > about these issues is required. I'm certainly happy (and indeed > expecting) to be told I'm wrong about any or all of the above! :-) > > Regards, > > Andy. 
> > >>* The scenario as described in the propsoal, in which a single metadata record >>corresponds with a single "preprint" is only a special case of - future - >>reality. Increasingly, objects held in and described by repositories will be >>"compound" or "complex", i.e. consisting of multiple datastreams, not just a >>single "preprint". I find that it would be desirable that a solution to get to >>the content would be able to handle such situations. The proposed solution >>could actually accomodate such 'compound' objects, because the mutliple >>datastreams are linked off the jump-off page. There is, however, a problem. >>Let's presume we have a situation in which an object is deposited in an >>institutional repository that has 2 datastreams, each of which actually has a >>unique identifier, say a doi or something. Thinking of a - future - >>self-archiving scenario and the trend to accord identifiers at finer levels of >>granularity, this is not unlikely at all. Now we get 3 things in dc.identifier >>(2 doi's and a link to a jump-off page), and 2 things in the jump-off page >>(links to the 2 datastreams). How do I know which doi goes with which >>datastream? Information that - I hope we will all agree - is rather significant. >> >>OK. The point I am trying to make is that the described scenario and its more >>general problem domain (beyond eprints, and into the realm of objects with >>multiple datastreams) may call for another approach. Our research has shown >>that such an approach can remain 100% OAI-PMH-based if a complex object format >>such as METS, MPEG-21 DIDL or SCORM is used. These formats can be "parallel" >>OAI-PMH "metadata formats" through which harvesters can get to the content >>without running into issues such as the ones mentioned above. Content can be >>embedded in the XML wrappers or pointed at by them. Identifiers can be >>unambiguously connected to content. If content changes, the datstamp of the >>"conplex" record changes. 
>>
>>I anticipate concerns re the overhead of introducing a solution based on a
>>complex object format. At this point, I would like to say 2 things in this
>>respect:
>>
>>* It took 2 people on my team about 2 days to create a prototype plug-in that
>>enables OAI-PMH harvesting of content from DSpace repositories. Our plug-in
>>rendered content using the MPEG-21 DIDL XML wrapper format. Most of the time
>>invested in this plug-in was spent figuring out the DSpace API and a sensible
>>way to map the DSpace data model to the DIDL data model. The prototype was
>>demonstrated at the DSpace federation meeting, last week. Although
>>questions/issues did arise in the course of our work, none seemed unsolvable.
>>But it is my impression that the very fast delivery of a prototype indicates the
>>feasibility of the complex format approach.
>>
>>* I would personally be very willing to spend time with the appropriate
>>representatives of the community - including yourself - to work towards a
>>solution that is future-proof and provides adequate guarantees regarding
>>perceived requirements of a content-harvesting solution. I would actually
>>prefer that over going for a solution which is attractive at first glance
>>because of its obvious simplicity, but which seems to raise some relevant
>>questions upon closer inspection.
>>
>>To end, I would like to thank you for bringing this topic to the list. I have
>>had many private email exchanges over the last few months especially with
>>representatives from DARE and DINI about this and related problem domains. I
>>hope that your mail can be another impulse towards a joint action in this realm.
>>The problem is very real, and I would love our community to jointly create a
>>really good solution to it.
>>
>>many greetings
>>
>>herbert
>>
>>
>>Andy Powell wrote:
>>
>>
>>>The JISC-funded ePrints UK project has a requirement to automatically
>>>harvest both metadata and full-text from the eprint archives within UK
>>>academia (and potentially elsewhere). This is so that we can pass both
>>>metadata and full-text to the various 'enhancement' Web services offered
>>>by our partners.
>>>
>>>http://www.rdn.ac.uk/projects/eprints-uk/
>>>
>>>In order for our harvesting robot to be able to do this, it must be able
>>>to reliably (and automatically) determine the correct URL(s) for the
>>>various full-text manifestation(s) (HTML, PDF, RTF, etc.) of each eprint.
>>>
>>>Our "Using simple Dublin Core to describe eprints" guidelines are intended
>>>to encourage greater consistency in the metadata that is exposed by eprint
>>>archives using the 'oai_dc' format within the OAI Protocol for Metadata
>>>Harvesting (OAI-PMH). Somewhat perversely, because we stick rigidly to
>>>the semantics of the DC element set, our guidelines make determining the
>>>URL of each manifestation that is available quite difficult. (This is
>>>largely a consequence of the 'simple' nature of 'simple DC'!). In
>>>general, the URL in the dc:identifier element of the oai_dc record is
>>>the URL of a jump-off page, rather than a direct link to the full-text.
>>>
>>>We would like to suggest a new proposal for unambiguously embedding the
>>>URL for each manifestation of an eprint into the (X)HTML jump-off page for
>>>that eprint. Since the jump-off page is generated automatically by the
>>>eprint archive software, doing this shouldn't be too difficult (in fact,
>>>we would hope that archive software, such as eprints.org, will be
>>>configured to do this out of the box).
>>>
>>>If this proposal is adopted, it will make it much easier to write OAI
>>>service provider software that can reliably gather the full-text of an
>>>eprint, given only the oai_dc record for that eprint.
>>>
>>>The proposal is at
>>>
>>>http://www.rdn.ac.uk/projects/eprints-uk/docs/encoding-fulltext-links/
>>>
>>>Comments are welcome,
>>>
>>>Andy
>>>--
>>>Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
>>>http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933
>>>Resource Discovery Network http://www.rdn.ac.uk/
>>>ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/
>>>_______________________________________________
>>>OAI-implementers mailing list
>>>List information, archives, preferences and to unsubscribe:
>>>http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
>>>
>>>
>>
>>--
>>Herbert Van de Sompel
>>digital library research & prototyping
>>Los Alamos National Laboratory - Research Library
>>+ 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/
>>
>>"met gestreken jeans de dansvloer penetreren"
>>
>>
>>
>
>
> Andy
> --
> Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
> http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933
> Resource Discovery Network http://www.rdn.ac.uk/
> ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/
>

--
Herbert Van de Sompel
digital library research & prototyping
Los Alamos National Laboratory - Research Library
+ 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/

"met gestreken jeans de dansvloer penetreren"

From evelyn@ime.usp.br Fri Mar 26 16:06:39 2004
From: evelyn@ime.usp.br (Evelyn Cristina Pinto)
Date: Fri, 26 Mar 2004 13:06:39 -0300 (EST)
Subject: [OAI-implementers] Publication date and Datestamps
Message-ID:

I noticed that the datestamp field is usually used for harvesting date,
but some repositories use it for submission date. Is the dc:date field
used for publication date and submission date too?
How could I differentiate them? Could anyone give me more information about it?

Thanks,
Evelyn.

=================================================================
master student in Computer Science at USP-Brazil

From simeon@cs.cornell.edu Fri Mar 26 16:49:27 2004
From: simeon@cs.cornell.edu (Simeon Warner)
Date: Fri, 26 Mar 2004 11:49:27 -0500 (EST)
Subject: [OAI-implementers] Publication date and Datestamps
In-Reply-To:
References:
Message-ID:

Within OAI the record datestamp MUST be the date of last update of
the metadata record. Otherwise incremental harvesting will not work.

Within simple dc metadata there is no (accepted) way to differentiate
types of date in the dc:date fields. For e-prints, a good recommendation
is that in the RDN guidelines given by Andy Powell, Michael Day and Peter
Cliff:
http://www.rdn.ac.uk/projects/eprints-uk/docs/simpledc-guidelines/#date

dc:date (*) Eprint-specific Recommendation:

The 'last-modified' date of the eprint and/or the date of its accession
into the archive.

The date should be formatted according to the W3C encoding rules for
dates and times [9] (a profile based on ISO 8601 known as W3C-DTF), for
example:

2000-12-25
1999
2003-01

If necessary, repeat this element to provide both the last-modified date
and the date of accession. The last-modified date will be assumed to be
the more recent of the two dates. If only one date is provided, it will
be assumed that the last-modified date and the date of accession are the
same.

--
Simeon

On Fri, 26 Mar 2004, Evelyn Cristina Pinto wrote:
> I noticed that the datestamp field is usually used for harvesting date,
> but some repositories use it for submission date. Is the dc:date field
> used for publication date and submission date too? How could I
> differentiate them? Could anyone give me more information about it?
>
> Thanks,
> Evelyn.
>
> =================================================================
> master student in Computer Science at USP-Brazil
>

From evelyn@ime.usp.br Sat Mar 27 13:35:34 2004
From: evelyn@ime.usp.br (Evelyn Cristina Pinto)
Date: Sat, 27 Mar 2004 10:35:34 -0300 (EST)
Subject: [OAI-implementers] Publication date and Datestamps
In-Reply-To:
Message-ID:

First of all, thanks a lot for the information.

And I have a suggestion. I know that OAI-PMH is a general protocol, not
only for eprints. However, I think that the publication date is a *very*
important piece of information about an eprint's life cycle. So I think
that it should be shown explicitly through the protocol.

Regards,
Evelyn.

On Fri, 26 Mar 2004, Simeon Warner wrote:
>
> Within OAI the record datestamp MUST be the date of last update of
> the metadata record. Otherwise incremental harvesting will not work.
>
> Within simple dc metadata there is no (accepted) way to differentiate
> types of date in the dc:date fields. For e-prints, a good recommendation
> is that in the RDN guidelines given by Andy Powell, Michael Day and Peter
> Cliff:
> http://www.rdn.ac.uk/projects/eprints-uk/docs/simpledc-guidelines/#date
>
> dc:date (*) Eprint-specific Recommendation:
>
> The 'last-modified' date of the eprint and/or the date of its accession
> into the archive.
>
> The date should be formatted according to the W3C encoding rules for
> dates and times [9] (a profile based on ISO 8601 known as W3C-DTF), for
> example:
>
> 2000-12-25
> 1999
> 2003-01
>
> If necessary, repeat this element to provide both the last-modified date
> and the date of accession. The last-modified date will be assumed to be
> the more recent of the two dates.
> If only one date is provided, it will
> be assumed that the last-modified date and the date of accession are the
> same.
>
> --
> Simeon
>
> On Fri, 26 Mar 2004, Evelyn Cristina Pinto wrote:
> > I noticed that the datestamp field is usually used for harvesting date,
> > but some repositories use it for submission date. Is the dc:date field
> > used for publication date and submission date too? How could I
> > differentiate them? Could anyone give me more information about it?
> >
> > Thanks,
> > Evelyn.
> >
> > =================================================================
> > master student in Computer Science at USP-Brazil
> >
>

From herbertv@lanl.gov Sun Mar 28 16:25:16 2004
From: herbertv@lanl.gov (herbert van de sompel)
Date: Sun, 28 Mar 2004 09:25:16 -0700
Subject: [OAI-implementers] Carl Lagoze receives 2004 Kilgour Award
Message-ID: <4066FC6C.7020109@lanl.gov>

Please join me in congratulating Carl Lagoze on receiving the 2004 Kilgour
Award for Research in Library and Information Technology. More information
at http://www.lita.org/ala/lita/litaresources/litascholarships/04fred.htm .

herbert van de sompel

--
Herbert Van de Sompel
digital library research & prototyping
Los Alamos National Laboratory - Research Library
+ 1 (505) 667 1267 / http://lib-www.lanl.gov/~herbertv/