[UPS] RE: uni- and multi-disciplinary settings on ePrints

Carl Lagoze lagoze@cs.cornell.edu
Tue, 27 Jun 2000 07:22:15 -0400


All,

I have to jump in and say that IMHO there is a mistaken focus on
partitions as the proper means of segmenting up the information space
(of eprint archives, of OAI, of anything).  

First, a little history which I meant to share in San Antonio but never
got around to.  The idea of partitions was a direct result of the
beginning of our (NCSTRL) collaboration with LANL a few years ago.  At
that point LANL had (and still has) this, in my opinion, somewhat
misguided and non-scalable legacy notion of fixed partitions in its
archive.  They wanted to make these partitions visible at the protocol
level and thus was born the notion of repository partitions in Dienst.
(Paul and Simeon, please understand that I'm not meaning to disparage
your work or arXiv.  As noted below the partition concept is just fine
for your application!)

The Intention! -- These were and still are purely intended as a
repository-local administrative convenience: basically a very simple
way of dividing up an individual repository, and certainly not meant as a
means for partitioning up some larger information space.

I have never felt very comfortable with this whole idea, esp. the way
that it is implemented at LANL - e.g., authors decide which partition a
paper should be placed in and users search within partitions.  This works
(maybe) just great in a closed and highly expert community such as those
who use the LANL archive but breaks down badly in other communities,
esp. at the user end where searching within a partition makes little
sense.

This notion really starts to break down when it is extended across
repositories/archives.  We have seen this confusion in the OAI
discussions.  All of a sudden we're trying to figure out what is the
right way to "universalize" partitions.  What is the way of registering
partitions?  What do these partitions mean anyway?

The Reality! -- There is no "right" way to partition information spaces
(just as there is no "right" metadata).  There are many ways to
partition information spaces that are customized to different user
groups.  Furthermore, partitioning of information spaces is completely
independent of archive location (e.g., the set of information in a
partition may include some content from repository A, all from repository B,
some from repository zz, etc.).  So, mapping individual repository
partitions to a global or even intranet cross-repository partitioning
system breaks down due to 1) projecting local decisions to global
decisions and 2) ignoring the fact that any one document in a repository
should be able to exist in more than one global partition.
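
To make point 2 concrete, here is a minimal Python sketch. Everything in
it is an assumption made up for illustration (the repository names, the
partition names, the topic labels); it is not Dienst or OAI code. It
contrasts a local partition, where each document sits in exactly one
slot of its own repository, with global groupings, where the same
document can appear under several headings that cut across repositories.

```python
from collections import defaultdict

# Hypothetical records: (repository, local_partition, doc_id, topics).
RECORDS = [
    ("repoA", "cs", "A/1", {"digital-libraries", "networking"}),
    ("repoA", "physics", "A/2", {"quantum-computing"}),
    ("repoB", "tech-reports", "B/1", {"quantum-computing", "algorithms"}),
]


def local_partitions(records):
    """Group documents by (repository, partition): one slot per document."""
    out = defaultdict(list)
    for repo, part, doc_id, _topics in records:
        out[(repo, part)].append(doc_id)
    return dict(out)


def global_groupings(records):
    """Group documents by topic: a document may appear under many topics,
    and a topic may draw documents from many repositories."""
    out = defaultdict(list)
    for _repo, _part, doc_id, topics in records:
        for topic in sorted(topics):
            out[topic].append(doc_id)
    return dict(out)
```

In this toy data, "B/1" belongs to a single local partition but to two
global groupings, and the "quantum-computing" grouping draws on both
repositories, which a one-to-one mapping from local partitions cannot
express.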

Solution? -- Back in '98 I wrote
http://www.dlib.org/dlib/november98/lagoze/11lagoze.html, in which I
talked about a collection abstraction in distributed information spaces.
At this point we implement such an idea in Dienst as the means of
creating the NCSTRL collection that spans multiple repositories.  Our
implementation is imperfect but, I maintain, is on the right track.  Over
the next year we have funding and people to push this to the next and
hopefully correct implementation that will allow organizations and
institutions to create flexible collections that do (hopefully) scale
across multiple repositories and make it possible to aggregate documents
for multiple communities.
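
As a rough illustration of the collection idea, the Python sketch below
describes a collection by a membership rule that is evaluated against
documents from any number of repositories. This is a sketch under stated
assumptions, not the Dienst implementation: the Document and Collection
classes, their fields, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Document:
    repository: str
    doc_id: str
    subjects: frozenset


@dataclass(frozen=True)
class Collection:
    """A named membership rule, independent of where documents are stored."""
    name: str
    accepts: Callable[[Document], bool]

    def members(self, repositories: Dict[str, List[Document]]) -> List[str]:
        # Apply the rule to every document in every repository.
        return [doc.doc_id
                for docs in repositories.values()
                for doc in docs
                if self.accepts(doc)]


# Hypothetical sample data.
REPOSITORIES = {
    "repoA": [Document("repoA", "A/1", frozenset({"networking"})),
              Document("repoA", "A/2", frozenset({"quantum-computing"}))],
    "repoB": [Document("repoB", "B/1",
                       frozenset({"quantum-computing", "algorithms"}))],
}

quantum = Collection("quantum", lambda d: "quantum-computing" in d.subjects)
theory = Collection("theory", lambda d: "algorithms" in d.subjects)
```

Here the "quantum" collection spans both repositories, and "B/1" is a
member of both collections, without either repository having to change
how it organizes its own holdings.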

For now, please let's not try to push the partition thing beyond its
original goal of a merely repository-local administrative convenience.

Finally, Rob and Stevan, please understand that I'm not trying to
criticize the work you've done on your eprint software.  I'm really
looking forward to seeing it in action and working with you on the idea
of overlaying more features of the Dienst protocol on it as we try to
scale from individual archives to federated information spaces.

Carl

> -----Original Message-----
> From: Stevan Harnad [mailto:harnad@coglit.ecs.soton.ac.uk]
> Sent: Tuesday, June 27, 2000 6:56 AM
> To: Robert Tansley
> Cc: Eric F. Van de Velde; 'Stevan Harnad'; john.ober@UCOP.EDU;
> ken.weiss@UCOP.EDU; Carl Lagoze; Ed Sponsler
> Subject: uni- and multi-disciplinary settings on ePrints
> 
> 
> Dear Eric, Ken et al:
> 
> The question of whether it will prove optimal for University Open
> Archives to be pluridisciplinary or unidisciplinary can be settled by
> actual practise. 
> 
> The ePrints archiving software is designed to be useable either way:
> Part of the local institution's parameter-setting and customization of
> the generic ePrints software can amount to turning other disciplines
> off if it is being used for just one department (or lab, or
> researcher).
> 
> Also, there should be a generic spectrum of discipline partitions that
> ePrints provides as a default (we are still looking for the optimal
> default one to use, and recommendations are welcome!), and then these
> can be added to. To preserve overall interoperability, it would be best
> if such site-specific additions to an expanding open-partition space
> could be percolated to all open-archives in some systematic way (but
> this is a technical issue that exceeds my own technical grasp!: Carl?)
> 
> What is certain is that, again, the philosophy of "minimalism plus"
> should prevail: We must not hold back, waiting for a final, ultimate,
> optimal solution, requiring more complicated compliance by individuals.
> 
> Find an approximation that will "satisfice" to launch, fill, and bring
> up-to-speed a large number of universities' open archives right now.
> THEN the collective commitment that comes with having all those
> institutions' intellectual goods already minimum-plus-functional in
> the interoperable open-archives will ensure that the functionality
> grows, and that the growth comes in an already-shared collective
> convention.
> 
> So: "Satisficing" approximate partitions for now, optimizing for later,
> once the open-archiving is irreversibly in motion.
> 
> Cheers, Stevan
> 
> On Tue, 27 Jun 2000, Robert Tansley wrote:
> 
> > "Eric F. Van de Velde" wrote:
> > > 
> > > Stevan, Rob,
> > > The tech guru for our preprint service (currently consisting of
> > > NCSTRL) is Ed Sponsler. He is in today and tomorrow, but then takes
> > > a (well-deserved) vacation. So, it may take a bit before we get into
> > > this.
> > > 
> > > However, I believe we may have similar issues. Until now, the
> > > primary usage of Dienst has been within the NCSTRL context. We are
> > > struggling with decisions on how to implement a Caltech-wide
> > > cross-disciplinary archive.
> > > 
> > > Do we really have only one Caltech-wide archive with partitions for
> > > individual options (departments)? However, can these partitions
> > > easily participate in disciplinary federations?
> > 
> > This is an issue that hasn't fully been resolved by the open archives
> > initiative, and is in fact the main issue I raised at the OA workshop
> > in San Antonio. I will certainly be pushing to get this resolved.
> > 
> > > Another option is to create a repository for each department and
> > > combine them through federation into a Caltech repository.
> > 
> > This does sound like a better option to me, as it would ease some of
> > the difficulties involved in disciplinary federation (actually
> > harvesting in the OA world.) Additionally, if individual archives are
> > smaller, this does tend to improve their individual performance.
> > 
> > As well as the departmental archives, you could quite easily have a
> > Caltech "gateway" search engine, that could create an index covering
> > all of the departmental archives, and search them all in a very
> > efficient way. This separation of services (such as searching) and
> > data provision brings many benefits.
> > 
> > > Occasionally, even the option of creating a repository for every
> > > faculty member is mulled over, because there are quite a number of
> > > "independence-minded" faculty in this place.
> > > Question though is whether the federations remain manageable under
> > > such a scenario...
> > 
> > You could allow each department a degree of freedom. For example,
> > using the EPrints software, each department's archive could be given the
> > departmental "look and feel", if they have one. Additionally the
> > software allows each department to hold their own extra information
> > about documents (for example, "funding body"). Provided each archive
> > supports the open archives protocol, and provides the same central
> > metadata, the distributed searches performed by the Caltech search
> > gateway are not affected.
> > 
> > R
> > 
> > > --Eric.
> > > 
> > > -----Original Message-----
> > > From: Stevan Harnad [mailto:harnad@coglit.ecs.soton.ac.uk]
> > > Sent: Thursday, June 22, 2000 9:02 AM
> > > To: Eric F. Van de Velde
> > > Cc: Rob Tansley
> > > Subject: RE: EPrints Software Beta (fwd)
> > > 
> > > Hi Eric,
> > > 
> > > The link will come to you shortly, from Rob Tansley. Meanwhile see:
> > > http://www.eprints.org/software.html
> > > 
> > > Chrs, Stevan
> > > 
> > > On Thu, 22 Jun 2000, Eric F. Van de Velde wrote:
> > > 
> > > > Stevan,
> > > > I would definitely be interested to take a look at this. Did you
> > > > mean to include a link in your e-mail? I did not find a link to the
> > > > Beta on the cogprints site.
> > > > --Eric.
> > > >
> > 
> > -- 
> >  Robert Tansley                    Tel: +44 (0) 23 80594492
> >  IAM Research Group                Fax: +44 (0) 23 80592865
> >  Electronics & Computer Science    http://www.ecs.soton.ac.uk/~rht96r/
> >  University of Southampton
> >  Southampton SO17 1BJ, UK
> > 
>