[OAI-general] Re: [BOAI] Re: Cliff Lynch on Institutional Archives

Hussein Suleman hussein@cs.uct.ac.za
Thu, 27 Mar 2003 09:17:52 +0200


hi

this may be stating the obvious, but why not use sets for the separate 
disciplines, aimed at particular service providers? i say it that way 
because some disciplines are not well-defined (namely, computer science) 
so such archives may want to play ball with multiple service providers 
and hence may need different sets.
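
to make the set idea concrete, here is a minimal python sketch (not taken from any existing harvester) of how a service provider might ask an archive for just one discipline's records with an OAI-PMH ListRecords request. the base URL and the set name "physics" are made up for illustration; verb, metadataPrefix, set and from are the standard request arguments, and oai_dc is the format every data provider has to support.

```python
from urllib.parse import urlencode


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc",
                     from_date=None):
    """Build an OAI-PMH ListRecords request URL, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # selective harvesting by set
    if from_date:
        params["from"] = from_date    # only records changed since this date
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # hypothetical institutional archive exposing a "physics" set
    print(list_records_url("http://archive.example.org/oai", set_spec="physics"))
```

an archive that wants to play ball with several service providers would expose several such sets, and each provider would harvest only its own.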

in any event, for something like physics, a simple set might do the 
trick at the source. then, somewhat in keeping with the Kepler model (as 
published in D-Lib Magazine a while back), the service provider can provide an 
interface for potential data providers to self-register. i know this 
sounds dodgy, but think of it as an alternative mechanism for 
contribution. either individual users submit individual papers or groups 
submit baseURLs - both go through some kind of review and while one 
leads to once-off storage, the other leads to periodic harvesting.
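
as a rough sketch of that second path (self-registered baseURLs, a review step, then periodic harvesting), something like the following python would do. the class and method names, the one-day interval and the example URLs are all assumptions for illustration; this is not how Kepler or any particular service provider implements it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Registration:
    base_url: str
    set_spec: Optional[str] = None
    approved: bool = False                  # flipped by a human reviewer
    last_harvest: Optional[datetime] = None


class Registry:
    """Hypothetical self-registration list kept by a service provider."""

    def __init__(self, interval=timedelta(days=1)):
        self.interval = interval
        self.entries = {}

    def register(self, base_url, set_spec=None):
        # anyone may register; nothing is harvested until it is approved
        self.entries[base_url] = Registration(base_url, set_spec)
        return self.entries[base_url]

    def approve(self, base_url):
        self.entries[base_url].approved = True

    def due(self, now):
        # approved archives never harvested, or harvested too long ago
        return [r for r in self.entries.values()
                if r.approved and (r.last_harvest is None
                                   or now - r.last_harvest >= self.interval)]

    def mark_harvested(self, base_url, when):
        self.entries[base_url].last_harvest = when
```

a scheduler would call due() every so often, harvest each archive it returns, and then call mark_harvested(); unapproved registrations just sit in the list until someone reviews them.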

what remains a difficult problem, however, is how to recreate the 
metadata used by the service provider as its native format. so, for a 
typical example, if arXiv classifies items using a specific set 
structure, this is certainly not going to be the default for an 
institutional archive. does the service provider automatically or 
manually reclassify? or does it not allow browsing by categories? in 
either event, the quality of the metadata from the perspective of the 
service provider may be an impetus for potential users to want to 
replicate their effort rather than rely on the automated submission from 
their own institutions ... this needs more thought ...
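
for the automatic option, the simplest thing i can imagine is a hand-built lookup from the institution's setSpecs to the service provider's categories, with anything unmapped pushed to a manual queue. the python below is only a sketch under that assumption; the setSpecs and category names are invented and are not arXiv's real structure.

```python
# hypothetical mapping from an institutional archive's setSpecs to the
# service provider's own categories; a real one would be curated by hand
SET_TO_CATEGORY = {
    "dept:physics:hep": "high-energy-physics",
    "dept:physics:astro": "astrophysics",
}


def reclassify(record_sets, mapping=SET_TO_CATEGORY):
    """Return (categories, needs_review) for one harvested record.

    record_sets is the list of setSpecs the record carried at the source.
    needs_review is True when no setSpec could be mapped, so a person has
    to classify the record (or it stays out of the category browse).
    """
    categories = sorted({mapping[s] for s in record_sets if s in mapping})
    return categories, not categories


if __name__ == "__main__":
    print(reclassify(["dept:physics:hep", "dept:maths"]))
    print(reclassify(["dept:maths"]))
```

this only works as well as the mapping does, which is exactly the metadata quality problem above.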

ttfn,
----hussein


Christopher Gutteridge wrote:
> Disciplinary/subject archives vs. Institutional/Organisation/Region based
> archives. This is going to be a key challenge now open archives begin
> to gain momentum. 
> 
> For example, we are planning a University-wide eprints archive. I am 
> concerned that some physicists will want to place their items in both
> the university eprints service AND the arXiv physics archive. They may 
> be required to use the university service, but want to use arXiv as it
> is the primary source for their discipline. This is a duplication of 
> effort and a potential irritation.
> 
> Ultimately, of course, I'd hope that disciplinary archives will be replaced
> with subject-specific OAI service providers harvesting from the institutional
> archives. But there is going to be a very long transition period in which
> the solution evolves from our experience.
> 
> What I'm asking is: has anyone given consideration to ways of smoothing
> over this duplication of effort? Possibly some negotiated automated process
> for institutional archives uploading to the subject archive, or at least
> assisting the author in the process.
> 
> This isn't the biggest issue, but it'd be good to address it before it
> becomes more of a problem.
> 
>   Christopher Gutteridge
>   GNU EPrints Head Developer
>   http://software.eprints.org/
> 
> On Sun, Mar 16, 2003 at 02:15:56 +0000, Stevan Harnad wrote:
> 
>>On Sat, 15 Mar 2003, Thomas Krichel wrote:
>>
>>
>>>  Stevan Harnad writes:
>>>
>>>sh> There is no need -- in the age of OAI-interoperability -- for
>>>sh> institutional archives to "feed" central disciplinary archives:
>>>
>>>  I do not share what I see as a blind faith in interoperability
>>>  through a technical protocol. 
>>
>>I am quite happy to defer to the technical OAI experts on this one, but let
>>us put the question precisely: 
>>
>>Thomas Krichel suggests that institutional (OAI) data-archives
>>(full-texts) should "feed" disciplinary (OAI) data-archives,
>>because OAI-interoperability is somehow not enough. I suggest that
>>OAI-interoperability (if I understand it correctly) should be enough. No
>>harm in redundant archiving, of course, for backup and security, but not
>>necessary for the usage and functionality itself. In fact, if I understand
>>correctly the intent of the OAI distinction between OAI data-providers -- 
>>http://www.openarchives.org/Register/BrowseSites.pl 
>>-- and OAI service-providers --
>>http://www.openarchives.org/service/listproviders.html 
>>-- it is not the full-texts of data-archives that need to be "fed" to
>>(i.e., harvested by) the OAI service providers, but only their metadata.
>>
>>Hence my conclusion that distributed, interoperable OAI institutional
>>archives are enough (and the fastest route to open-access). No need
>>to harvest their contents into central OAI discipline-based archives
>>(except perhaps for redundancy, as backup). Their OAI interoperability
>>should be enough so that the OAI service-providers can (among other things)
>>do the "virtual aggregation" by discipline (or any other computable
>>criterion) by harvesting the metadata alone, without the need to harvest
>>full-text data-contents too.
>>
>>It should be noted, though, that Thomas Krichel's excellent RePEc
>>archive and service in Economics -- http://repec.org/ -- goes
>>well beyond the confines of OAI-harvesting! RePEc harvests non-OAI
>>content too, along lines similar to the way ResearchIndex/citeseer --
>>http://citeseer.nj.nec.com/cs -- harvests non-OAI content in computer
>>science. What I said about there being no need to "feed" institutional OAI
>>archive content into disciplinary OAI archives certainly does not apply
>>to *non-OAI* content, which would otherwise be scattered willy-nilly
>>all over the net and not integrated in any way. Here RePEc's and
>>ResearchIndex's harvesting is invaluable, especially as RePEc already
>>does (and ResearchIndex has announced that it plans to) make all its
>>harvested content OAI-compliant!
>>
>>To summarize: The goal is to get all research papers, pre- and
>>post-peer-review, openly accessible (and OAI-interoperable) as soon as
>>possible. (These are BOAI Strategies 1 [self-archiving] and 2
>>[open-access journals]: http://www.soros.org/openaccess/read.shtml
>>). In principle this can be done by (1) self-archiving them in central
>>OAI disciplinary archives like the Physics arXiv (the biggest and
>>first of its kind) -- http://arxiv.org/show_monthly_submissions
>>-- by (2) self-archiving them in distributed institutional OAI
>>Archives -- http://www.ecs.soton.ac.uk/~harnad/Temp/tim.ppt -- by (3)
>>self-archiving them on arbitrary Web and FTP sites (and hoping they
>>will be found or harvested by services like RePEc or ResearchIndex)
>>or by (4) publishing them in open-access journals (BOAI Strategy 2:
>>http://www.soros.org/openaccess/journals.shtml ).
>>
>>My point was only that because researchers and their institutions
>>(*not* their disciplines) have shared interests vested in maximizing
>>their joint research impact and its rewards, institution-based
>>self-archiving (2) is a more promising way to go -- in the age of
>>OAI-interoperability -- than discipline-based self-archiving (1), even
>>though the latter began earlier. It is also obvious that both (1) and
>>(2) are preferable to arbitrary Web and FTP self-archiving (3), which
>>began even earlier (although harvesting arbitrary Website and FTP contents
>>into OAI-compliant Archives is still a welcome makeshift strategy
>>until the practice of OAI self-archiving is up to speed). Creating new
>>open-access journals and converting the established (20,000) toll-access
>>journals to open-access is desirable too, but it is obviously a much
>>slower and more complicated path to open access than self-archiving,
>>so should be pursued in parallel.
>>
>>My conclusion in favor of institutional self-archiving is based on the
>>evidence and on logic, and it represents a change of thinking,
>>for I had originally advocated (3) Web/FTP self-archiving --
>>http://www.arl.org/scomm/subversive/toc.html -- then switched allegiance
>>to central self-archiving (1), even creating a discipline-based archive:
>>http://cogprints.ecs.soton.ac.uk/ But with the advent of OAI in 1999,
>>plus a little reflection, it became apparent that
>>institutional self-archiving (2) was the fastest, most direct, and most
>>natural road to open access: http://www.eprints.org/ 
>>And since then its accumulating momentum seems to be confirming that this
>>is indeed so: http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2212.html
>>http://www.ecs.soton.ac.uk/~harnad/Temp/tim.ppt
>>
>>
>>>  The primary sense of belonging
>>>  of a scholar in her research activities is with the disciplinary
>>>  community of which she thinks herself a part... It certainly
>>>  is not with the institution. 
>>
>>That may or may not be the case, but in any case it is irrelevant to
>>the question of which is the more promising route to open-access. Our
>>primary sense of belonging may be with our family, our community,
>>our creed, our tribe, or even our species. But our rewards (research
>>grant funding and overheads, salaries, postdocs and students attracted
>>to our research, prizes and honors) are intertwined and shared with our
>>institutions (our employers) and not our disciplines (which are often
>>in fact the locus of competition for those same rewards!)
>>
>>
>>>  Therefore, if you want to fill
>>>  institutional archives---which I agree is the best long-run way
>>>  to enhance access and preservation to scholarly research--- [the]
>>>  institutional archive has to be accompanied by a discipline-based
>>>  aggregation process. 
>>
>>But the question is whether this "aggregation" needs to be the "feeding"
>>of institutional OAI archive contents into disciplinary OAI archives, or
>>merely the "feeding" of OAI metadata into OAI services.
>>
>>
>>>   The RePEc project has produced such an aggregator
>>>  for economics for a while now. I am sure that other, similar
>>>  projects will follow the same aims, but, with the benefit of
>>>  hindsight, offer superior service. The lack of such services
>>>  in many disciplines, or the lack of interoperability between
>>>  disciplinary and institutional archives, are major obstacles to
>>>  filling the institutional archives. There are no
>>>  inherent contradictions between institution-based archives
>>>  and disciplinary aggregators,
>>
>>There is no contradiction. In fact, I suspect this will prove to be a
>>non-issue, once we confirm that (a) we agree on the need for
>>OAI-compliance and (b) "aggregation" amounts to metadata-harvesting and
>>OAI service-provision when the full-texts in the institutional
>>archive are OAI-compliant (and calls for full-text harvesting only
>>if/when they are not). Content "aggregation," in other words, is a
>>paper-based notion. In the online era, it merely means digital sorting
>>of the pointers to the content.
>>
>>
>>>  In the paper that Stevan refers to, Cliff Lynch writes,
>>>  at http://www.arl.org/newsltr/226/ir.html
>>>
>>>cl> But consider the plight of a faculty member seeking only broader
>>>cl> dissemination and availability of his or her traditional journal
>>>cl> articles, book chapters, or perhaps even monographs through use of
>>>cl> the network, working in parallel with the traditional scholarly
>>>cl> publishing system.
>>>
>>>  I am afraid there are more and more such faculty members. Many
>>>  of the research papers found over the Internet are deposited
>>>  in this way. This trend is growing, not declining.
>>
>>You mean self-archiving in arbitrary non-OAI author websites? There is
>>another reason why institutional OAI archives and official institutional
>>self-archiving policies (and assistance) are so important. In reality,
>>it is far easier to deposit and maintain one's papers in institutional
>>OAI archives like Eprints than to set up and maintain one's own website.
>>All that is needed is a clear official institutional policy, plus
>>some startup help in launching it. (No such thing is possible at a
>>"discipline" level.)
>>http://www.ecs.soton.ac.uk/~lac/archpol.html 
>>http://www.eprints.org/self-faq/#institution-facilitate-filling 
>>http://www.ecs.soton.ac.uk/~harnad/Temp/Ariadne-RAE.htm
>>http://paracite.eprints.org/cgi-bin/rae_front.cgi
>>
>>
>>>cl> Such a faculty member faces several time-consuming problems. He or
>>>cl> she must exercise stewardship over the actual content and its
>>>cl> metadata: migrating the content to new formats as they evolve over
>>>cl> time, creating metadata describing the content, and ensuring the
>>>cl> metadata is available in the appropriate schemas and formats and
>>>cl> through appropriate protocol interfaces such as open archives
>>>cl> metadata harvesting.
>>>
>>>  Sure, but academics do not like their work-, and certainly
>>>  not their publishing-habits, [to] be interfered with by external
>>>  forces. Organizing academics is like herding cats!
>>
>>I am sure academics didn't like to be herded into publishing with the
>>threat of perishing either. Nor did they like switching from paper to
>>word-processors. Their early counterparts probably clung to the oral
>>tradition, resisting writing too; and monks did not like to be herded from
>>their peaceful manuscript-illumination chambers to the clamour of
>>printing presses. But where there is a causal contingency -- as there is
>>between (a) the research impact and its rewards, which academics like as
>>much as anyone else, and (b) the accessibility of their research -- academics
>>are surely no less responsive than Prof. Skinner's pigeons and rats to
>>those causal contingencies, and which buttons they will have to press 
>>in order to maximize their rewards!
>>http://www.ecs.soton.ac.uk/~harnad/Temp/self-archiving.htm
>>
>>Besides, it is not *publishing* habits that need to be changed, but
>>*archiving* habits, which are an online supplement, not a substitute,
>>for existing (and unchanged) publishing habits.
>>
>>
>>>cl> Faculty are typically best at creating new
>>>cl> knowledge, not maintaining the record of this process of
>>>cl> creation. Worse still, this faculty member must not only manage
>>>cl> content but must manage a dissemination system such as a personal Web
>>>cl> site, playing the role of system administrator (or the manager of
>>>cl> someone serving as a system administrator).
>>>
>>>  There are a lot of ways in which to maintain a web site or to get
>>>  access to a maintained one. It is a customary activity these days and
>>>  no longer requires much technical expertise. A primitive integration
>>>  of the contents can be done by Google; it requires no metadata.
>>>  Academics don't care about long-run preservation, so that problem
>>>  remains unsolved. In the meantime, the academic who uploads papers to a web
>>>  site takes steps to resolve the most pressing problem, access.
>>
>>Agreed. And uploading it into a departmental OAI Eprints Archive is 
>>by far the simplest and most effective way to do all of that. All it
>>needs is a policy to mandate it:
>>http://www.ecs.soton.ac.uk/~lac/archpol.html
>>
>>
>>>cl> Over the past few years, this has ceased to be a reasonable activity
>>>cl> for most amateurs; software complexity, security risks, backup
>>>cl> requirements, and other problems have generally relegated effective
>>>cl> operation of Web sites to professionals who can exploit economies of
>>>cl> scale, and who can begin each day with a review of recently issued
>>>cl> security patches.
>>>
>>>  These are technical concerns. When you operate a linux box
>>>  on the web you simply fire up a script that will download
>>>  the latest version. That is easy enough. Most departments
>>>  have separate web operations. Arguing for one institutional
>>>  archive for digital contents is akin to calling for a single web
>>>  site for an institution. The diseconomies of scale of central
>>>  administration impose other types of costs than the ones that it was to
>>>  reduce. The secret is to find a middle way.
>>
>>I couldn't quite follow all of this. The bottom line is this: The free
>>Eprints.org software (for example) can be installed within a few days. It
>>can then be replicated to handle all the departmental or research group
>>archives a university wants, with minimal maintenance time or costs. The
>>rest is just down to self-archiving, which takes a few minutes for the
>>first paper, and even less time for subsequent papers (as the repeating
>>metadata -- author, institution, etc., can be "cloned" into each new
>>deposit template). An institution may wish to impose an institutional
>>"look" on all of its separate eprints archives; but apart from that,
>>they can be as autonomous and as distributed and as many as desired:
>>OAI-interoperability works locally just as well as it does globally.
>>
>>
>>>cl> Today, our faculty time is being wasted, and expended ineffectively,
>>>cl> on system administration activities and content curation. And,
>>>cl> because system administration is ineffective, it places our
>>>cl> institutions at risk: because faculty are generally not capable of
>>>cl> responding to the endless series of security exposures and patches,
>>>cl> our university networks are riddled with vulnerable faculty machines
>>>cl> intended to serve as points of distribution for scholarly works.
>>>
>>>  This is the fight many faculty face every day, where they
>>>  want to innovate scholarly communication, but someone
>>>  in the IT department does not give the necessary permission
>>>  for network access...
>>
>>I don't think I need to get into this. It's not specific to
>>self-archiving, and a tempest in a teapot as far as that is concerned. An
>>efficient system can and will be worked out once there is an effective
>>institutional self-archiving policy. There are already plenty of excellent
>>examples, such as CalTech: 
>>http://library.caltech.edu/digital/ 
>>See also:
>>http://software.eprints.org/#ep2
>>
>>Stevan Harnad
> 
> 


-- 
=====================================================================
hussein suleman ~ hussein@cs.uct.ac.za ~ http://www.husseinsspace.com
=====================================================================