[OAI-general] Re: The archival status of archived papers

Hussein Suleman hussein@vt.edu
Mon, 09 Dec 2002 11:47:26 -0500

hi Chris

i have tackled some of these problems while designing a peer review 
model/protocol/component as part of the ODL (Open Digital Libraries) 

specifically, i worked on a system with the following features:
- metadata records are linked into a hierarchical management system
- files are stored independently and linked into the metadata using URLs
- draft versions are linked linearly so that a chain can be established 
- all drafts are retained as OAI-accessible metadata records
- an "authoritative" OAI identifier is maintained for the most current version
- the review protocol provides a summary of the status of a record, its 
metadata, its version chain, etc.
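
as a rough illustration of the features listed above (this is not the actual ODL code; the class names, method names and example identifiers are all invented for the sketch), a linear version chain behind a single "authoritative" identifier might look something like this:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DraftRecord:
    """Metadata record for one draft (hypothetical structure)."""
    oai_id: str                      # identifier of this specific draft
    file_urls: List[str]             # files are stored elsewhere, linked by URL
    previous: Optional[str] = None   # oai_id of the preceding draft, if any


@dataclass
class VersionChain:
    """Linear chain of drafts behind one "authoritative" identifier."""
    authoritative_id: str
    drafts: Dict[str, DraftRecord] = field(default_factory=dict)
    latest: Optional[str] = None     # oai_id of the most current draft

    def add_draft(self, oai_id: str, file_urls: List[str]) -> DraftRecord:
        """Append a new draft; earlier drafts are kept, never overwritten."""
        if oai_id in self.drafts or oai_id == self.authoritative_id:
            raise ValueError("identifier already in use: " + oai_id)
        record = DraftRecord(oai_id, list(file_urls), previous=self.latest)
        self.drafts[oai_id] = record
        self.latest = oai_id
        return record

    def resolve(self, identifier: str) -> DraftRecord:
        """Authoritative id gives the most current draft; a draft id gives that draft."""
        if identifier == self.authoritative_id:
            if self.latest is None:
                raise KeyError("no drafts deposited yet")
            return self.drafts[self.latest]
        return self.drafts[identifier]

    def history(self) -> List[str]:
        """Draft identifiers from oldest to newest, found by walking the links."""
        chain = []
        current = self.latest
        while current is not None:
            chain.append(current)
            current = self.drafts[current].previous
        chain.reverse()
        return chain

    def status(self) -> dict:
        """Small summary of the record and its version chain."""
        return {
            "authoritative_id": self.authoritative_id,
            "latest": self.latest,
            "versions": self.history(),
        }


# usage with made-up identifiers and URLs
chain = VersionChain("oai:example.org:paper1")
chain.add_draft("oai:example.org:paper1.v1", ["http://example.org/files/paper1-v1.pdf"])
chain.add_draft("oai:example.org:paper1.v2", ["http://example.org/files/paper1-v2.pdf"])
print(chain.status())
```

the point of the sketch is only that a citation can use either identifier: the authoritative one always follows the most current draft, while each draft identifier keeps pointing at exactly the draft that was cited.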

for our purposes (aimed at the CSTC and JERIC systems), after an item is 
approved, its metadata is stored in a different archive. thus, the draft 
versions are no longer used and are not "addressable". however, this 
should be possible with a few changes to the model/protocol/software. in 
any event, having a quick look at the stuff i worked on might give you 
some ideas in terms of linking and accessing of resources.

if you are interested in looking over some of the work, you can read the 
appropriate sections of my dissertation:
(see pages 64-77, especially p75)
also, check the ODL website at http://oai.dlib.vt.edu/odl for more 
information, papers, talks, demos, and some software you can play with 
if you really want to :)

hope this is useful ...


Christopher Gutteridge wrote:

> I'm considering adding version control to the files. This is going to be
> needed for the oft discussed j-prints (eprints with peer review)
> I think for archives like ECS letting the author un-deposit then resubmit is
> probably OK, but bad for cogprints. I'm considering adding it as an *option*
> which the archive admin can enable/disable.
> This is definitely a quality vs. quantity issue. ECS EPrints is already such
> a state I don't think it can hurt. Cogprints is another matter.
> Suggestions of a perfect(better) solution are more than welcome.
> On Mon, Dec 02, 2002 at 07:40:34 +0000, Stevan Harnad wrote:
>>On Mon, 2 Dec 2002, J Adrian Pickering wrote:
>>>It isn't just a technical issue.
>>>If you follow Mark's solution you end up with the risk of people citing 
>>>papers that don't contain the information they cite anymore. 
>>Mark suggested that an archived article should be a persisting object,
>>with a persisting identifier. That seems reasonable. Now if there have
>>been several successive versions of a paper (which the author wants to
>>consider as successive versions rather than new papers), then it also
>>seems reasonable that the archive should link all the successive
>>versions and point to the latest one by default.
>>All the prior versions are preserved, and accessible (unless the archive
>>has a policy allowing withdrawal -- a policy that should not be
>>It is the user's or citer's responsibility to specify which version they
>>have used/cited, if there are more than one. That will become part of
>>good scholarship, just as spelling the author's name correctly is.
>>So we need both a unique identifier for a generic paper and a unique
>>identifier for a specific draft of that paper.
>>>This is particularly likely when the matter being discussed is 
>>>controversial. A citation strictly refers to a manifestation/version 
>>>not the generic paper. 
>>Correct. It refers to a specific draft, usually with a calendar date and
>>some other identifying features.
>>>If the person making the citation wishes to change the citation to a later 
>>>version then that is *their* right. The link is *their* link, not the 
>>>target's. If you have 'published' something then it is in the public domain 
>>>and you must expect people to cite it (and that version).
>>I mostly agree. But this seems to be covered by providing unique version
>>identifiers; it does not prevent the Archive from defaulting to the most
>>recent version -- while offering the earlier versions too.
>>It might be making a subtle difference in the view people are taking on
>>this whether they are thinking of the Archive as a centralized one
>>(rather like a journal) or a distributed institutional one (rather like
>>author-provided reprints). It is conceivable that different drafts of a
>>paper will be in different archives. Those distributed versions too,
>>need to be trackable and integrated. My technical inexpertise leaves me
>>unable to propose how to do this, but it is the hardest-case scenario,
>>and the one we should aim to cover, eventually. Assuming it will all
>>be in one central archive is probably unrealistic (and unnecessary, in the
>>spirit of distributed OAI archiving and interoperability).
>>To my layman's ear it sounds as if every version of a paper will need a
>>unique version identifier, and in addition, there will need to be some
>>interoperable ways of integrating different versions as being different
>>versions of the same paper. The new scholarship will be, at the gross
>>level, concerned only with citing the generic paper (without worrying
>>about version fine-tuning), but the careful scholars need to have the
>>option of specifying the version too, uniquely, for those cases where
>>it matters.
>>>I agree that archive items should persist and, therefore, the references to 
>>>them. The relationship between the versions should be easy to click 
>>>through too.
>>>Regards the 'user' query, they need to be told not to submit so many 
>>>versions i.e. *think* carefully before submission! This is a matter of 
>>>policy and governs the degree of 'resistance' there is to making 
>>>submissions. There needs to be some otherwise the quality level drops.
>>It cuts both ways. Yes, authors should not start archiving willy-nilly
>>every raw draft and every afterthought. But they should not feel
>>constrained in doing corrections and updates whenever they are needed
>>too. Authors should know, though, that from the moment they place a draft
>>into a public open-access archive, it may be read, cited, and pointed to
>>-- that specific draft -- in perpetuum. That is part of what it means
>>to have archived something publicly.
>>I'm sure scholars will easily get a sense for this, as they have for
>>everything else. In the beginning some will fumble and treat the
>>archive as labile first drafts or lapidary touch-me-nots, but experience
>>and feedback will calibrate everyone's practice and reflexes. The
>>Archives just have to make sure they do not pre-judge or short-circuit
>>any important options a priori.
>>>>Stevan Harnad
>>>>On Mon, 2 Dec 2002, Mark Doyle wrote:
>>>>>On Tuesday, November 26, 2002, at 08:27  PM, Stevan Harnad wrote:
>>>>>>Now it is conceivable that the eprints architecture can be slightly
>>>>>>modified, so that the old, suppressed URL for the deleted paper
>>>>>>automatically redirects to the new draft if someone tries to access
>>>>>>the old one. That I have to let Chris reply about. Here I have merely
>>>>>>explained the rationale for not having designed the archive so a paper
>>>>>>could be deposited, and then modified willy-nilly under the same URL.
>>>>>>For that would not have been an archive at all, and user complaints,
>>>>>>about trying to use and cite a moving target, would have far
>>>>>>depositor complaints about what to do with after-thoughts and
>>>>>Well, that is one way to look at it. On the other hand, arXiv.org uses
>>>>>version numbers and the persistent name/id and URL (say hep-th/0210311
>>>>>and http://arXiv.org/abs/hep-th/0210311) always points to the latest
>>>>>with links to the earlier versions.
>>>>>I believe you are advocating a poor design choice here. One cannot
>>>>>the importance of human-friendly persistent names that are easily
>>>>>to URL's for linking and quick location. Patching the system to
>>>>>redirect to the
>>>>>latest linked version is a hack. Is one actually able to download
>>>>>the earlier version (which is what was cited)? Generally, a better
>>>>>is to give a good persistent name to a "work" and not a single
>>>>>of that work (whether it be a particular format or a particular
>>>>>version) and
>>>>>then give a reader a single point of entry into the system that can be
>>>>>or cited reliably which gives a choice of what to download. Cutting off
>>>>>to an earlier, citeable version is a mistake. Archives should not
>>>>>delete items
>>>>>or make them hard to access - rather they should show items in context
>>>>>and give easy access to an item's history and versioning with a single
>>>>>identifier for the work taken as a whole.
>>>>>Mark Doyle
>>>>>Manager, Product Development
>>>>>The American Physical Society

hussein suleman - hussein@vt.edu - vtcs - http://www.husseinsspace.com