[UPS] RE: uni- and multi-disciplinary settings on ePrints - comment re collections, partitions

Tue, 27 Jun 2000 09:26:59 -0400

Hi!

Just a short reply.  Thanks to Stevan and Carl for their msgs.
I believe we should leave "partitions" in our model
and let them be used as desired.

First, from a math/computer science perspective, collections
implement the structuring abstraction commonly called "set".
Sets are defined by some predicate or enumeration of content.
Those can be based on various principles, algorithms, theories,
human habits, etc.
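
To make the predicate/enumeration distinction concrete, here is a
minimal sketch in Python. The records and the field names (id, year,
subject) are made up for illustration; nothing here comes from an
OAi specification.

```python
# Hypothetical records; the field names are illustrative assumptions.
records = [
    {"id": "a1", "year": 1999, "subject": "physics"},
    {"id": "a2", "year": 2000, "subject": "cs"},
    {"id": "a3", "year": 2000, "subject": "physics"},
]

# A collection defined by enumeration: the member ids are listed explicitly.
enumerated = {"a1", "a3"}

# A collection defined by predicate: any rule that accepts or rejects a record.
def is_physics(record):
    return record["subject"] == "physics"

by_predicate = {r["id"] for r in records if is_physics(r)}
```

Both definitions can describe the same set; the predicate form keeps
working as new records arrive, while the enumeration must be maintained
by hand.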

As a primitive math concept, sets can be used to build much
more complex concepts.  They can lead to trees, as in taxonomies,
for example. There are the standard sets most are familiar with, and
other formulations, like fuzzy sets, which have been shown to
fit a bit better in many information retrieval situations.
I don't see why partitions as in OAi cannot be even this general.

The collection abstraction has been around for a long time.
The Hyper-G/HyperWave system has "collection" as a primitive,
as part of their attempt to "fix" basic problems in the WWW.

The fact that Cornell efforts use collections is an accommodation
of the habit of librarians and many other groups of scholars to
collect items and organize them using this mechanism.

Second, regarding partitions in OAi, there are many uses.
One can have well-defined partitions, or one can have subjective
or fuzzy ones.  On the objective side one can use various
facts, like dates, to partition a collection by year, month, ...
Many information collections are partitioned according to this
scheme - see Dialog and other such services.  Or, consider NDLTD.
We will partition the overall NDLTD-wide collection based first
on institution - where did a student complete their
work on a thesis or dissertation. Of course even this simple
partitioning allows variation, as can be seen in Internet domain
names. So, one can partition in a 2 level scheme where there is
the "root" and then all the universities are at level 2. Or one
can partition along the way by groupings - NDLTD members are
not only individual universities but also regions (Catalunya in
Spain, Ohio in USA), countries (e.g., Portugal, Australia), etc.
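
The objective partitionings described above (by date, by institution,
or with an intermediate grouping level) can be sketched as one small
helper applied with different keys. This is only an illustration: the
thesis records, the institution names, and the choice of top-level
domain as the grouping level are all assumptions, not NDLTD practice.

```python
from collections import defaultdict

def partition(items, key):
    """Split items into disjoint groups, one group per value of key(item)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(item)
    return dict(groups)

# Hypothetical thesis records; institutions are named in domain-name style.
theses = [
    {"id": "t1", "year": 1999, "institution": "univ-a.edu"},
    {"id": "t2", "year": 2000, "institution": "univ-a.edu"},
    {"id": "t3", "year": 2000, "institution": "univ-b.es"},
]

# Two-level scheme: root, then one partition per year or per institution.
by_year = partition(theses, key=lambda t: t["year"])
by_institution = partition(theses, key=lambda t: t["institution"])

# Three-level scheme: root, then a grouping (here the top-level domain,
# an assumed stand-in for region or country), then institution.
def top_level_domain(thesis):
    return thesis["institution"].rsplit(".", 1)[-1]

by_group = {
    group: partition(members, key=lambda t: t["institution"])
    for group, members in partition(theses, key=top_level_domain).items()
}
```

The same records support several partitionings at once, which is the
point: the model need not pick one.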

Third, as Carl and Stevan indicate, there is the partitioning that
comes from the expected growth of OAi services.  As people harvest
from other collections, their new collection may be partitioned
based on the collections from which they harvest. And that harvesting
can be hierarchical in turn, with archives built from groups of
archives, often in layers.
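
A rough sketch of that layering, assuming a toy Archive class (a
hypothetical name, not an OAi interface): each archive that harvests
from others ends up with one partition per direct source.

```python
class Archive:
    """Toy archive: holds its own records and may harvest from other archives."""

    def __init__(self, name, records=(), sources=()):
        self.name = name
        self.records = list(records)   # items deposited directly here
        self.sources = list(sources)   # archives this one harvests from

    def harvest(self):
        """Return {partition_name: records}, one partition per direct source."""
        partitions = {}
        if self.records:
            partitions[self.name] = list(self.records)
        for source in self.sources:
            # Flatten whatever the source itself holds or has harvested.
            merged = []
            for recs in source.harvest().values():
                merged.extend(recs)
            partitions[source.name] = merged
        return partitions

# Hypothetical three-layer arrangement.
physics = Archive("physics", records=["p1", "p2"])
cs = Archive("cs", records=["c1"])
sciences = Archive("sciences", sources=[physics, cs])
top = Archive("top", records=["x1"], sources=[sciences])
```

Here sciences.harvest() yields partitions named "physics" and "cs",
while top.harvest() yields "top" and "sciences"; each layer sees only
the partitions of the layer directly beneath it.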

Finally, in keeping with our general habit of supporting unknown
future use, we should not be prescriptive here. I want to stimulate
adoption of the idea of harvesting and making archives available,
and of those who self-archive learning about key concepts of library
and information science, as well as supporting scholarly communication.
Whole industries (consider Yahoo) have been built around people
wanting to organize collections of information to help others' activities.
Let us allow this.

Regards, Ed

-----Original Message-----
From: Carl Lagoze [mailto:lagoze@cs.cornell.edu]
Sent: Tuesday, June 27, 2000 7:22 AM
To: 'Stevan Harnad'; Robert Tansley
Cc: Eric F. Van de Velde; john.ober@UCOP.EDU; ken.weiss@UCOP.EDU; Ed
Sponsler; 'ups@vole.lanl.gov'
Subject: [UPS] RE: uni- and multi-disciplinary settings on ePrints

All,

I have to jump in and say that IMHO there is a mistaken focus on
partitions as the proper means of segmenting up the information space
(of eprint archives, of OAI, of anything).  
...