[UPS] research proposal for NSF ITR? deadline for letter of intent is Nov. 15

Thu, 11 Nov 1999 11:42:42 +0000 (GMT)

On Wed, 10 Nov 1999 fox@vt.edu wrote:

>     http://www.nsf.gov/cgi-bin/getpub?nsf99167

> 1. Do you think we have a chance of getting NSF support through 99-167?

It is an extremely important and timely domain, and Open Archives is
particularly well placed to make an important contribution, consisting
as it does, of most of the principal players worldwide.

What it needs is a coherent project. I like what you wrote, but there is
one potentially fatal fault line in our group, and "covering the grey"
(as well as the reluctance to accept <journal> as a core meta-tag) is
symptomatic of that fault line [as was our old name, "UPS"].

From my point of view, this is not exclusively or even primarily about
the "grey" (= non-journal, non-peer-reviewed) literature. Nor is it
about inventing new forms of "vetting" to replace classical peer review.

Initiative after initiative keeps foundering on this point, and I think
it's because it is simply the wrong starting point. Perhaps conventional
journals will survive, perhaps they will not; perhaps classical peer
review will continue to be the norm, perhaps it will be superseded by
something else.

My point is that THAT IS NONE OF OUR BUSINESS: We are not experts or
researchers on peer review; we are not journal editors or publishers.
That is not our domain of expertise, and that is not where the substance
of our initiative lies.

I will again invoke the Los Alamos Lemma: LANL was not committed to any
a priori stance regarding the "grey literature" -- even though, as it
happens, Paul has some views of his own about journals,
refereeing, and their roles: It doesn't matter; those views played no
causal role in the implementation or the success of LANL. LANL was made
into a self-archiving repository for physicists, and they readily made
use of it, for BOTH their "black" unrefereed preprints and their
"white" refereed reprints -- and any of the grey shades in between. And
this has been true from the very beginning.

Consequently, if an open archiving initiative is to succeed it MUST be
equally inclusive from the very beginning, and must make NO a priori
commitments to hypothetical changes in the journal/refereeing system.
There might eventually occur such changes, but it is none of our
business, and certainly not our business to design the open archives in
any way that biasses them towards any hypothetical outcome of this
sort.

There is certainly one way that we are going against the journal status
quo: We are making it possible for authors to give away their own
papers publicly for free. That is indeed a commitment to a change. But
it's the ONLY commitment we need to make (indeed, it is close to our
raison d'etre, since the digitization and online availability of the
literature is taking place anyway, irrespective of our efforts; so our
mission is the FREEING of the literature online, not simply the
facilitation of its going online simpliciter).

So I propose that our explicit mandate be: freeing the research
literature (black, white and grey) on-line, for one and all, by
facilitating the creation and interoperability of Open Archives. No
prejudices whatsoever about the fate of journals or of classical peer
review. In and of itself, open archiving will create a magnificent
empirical arena for innovation and change, but that must NOT be our
direct doing. We just provide for the open archiving, and then the
users of the open archives can do with it and make of it what they
will.

With that homily in mind, I proceed to Ed's further items:

> 2. Do you think the issues raised in part A below define key
>    research questions?

Yes, but only after all the (substantial) bias toward the nonrefereed
(or nonclassically refereed) corpus is completely redressed, so
everything we say and propose applies equally to both the classical
refereed literature and its complement.

> 3. Does the outline in part B define a sensible follow-on for research
>    according to our group vision?

With the above proviso, it certainly does.

> 4. Would you like to be involved in this? If so, what would your role
>    be?  (answer with as many as apply)
>    a- Work on own archive, but integrate it with UPS.
>    b- Focus on UPS and its architecture, development, evaluation.
>    c- Research, mostly on the architecture and software side.
>    d- Research, mostly on the sociological and usage side.

Yes, in all four capacities a-d, but with a much stronger commitment to
Open Archives [formerly: UPS] than to my own Archives [CogPrints &
Psycoloquy]. Indeed, we are now genericizing CogPrints to make it into
OpenArchive software, useable by any university for self-archiving the
research papers in all of its disciplines, interoperable, and fully
Dienst/Santa-Fe compliant (and simple to install, maintain and use).

> 5. What other comments and suggestions do you have?

>  - - - - - - - A. thoughts about research follow- - - -- - - - 
> If we go after funding for the UPS initiative, what are the most important
> research questions, whose resolution will have the greatest impact, and
> will help advance our understanding the most?  Below are some candidates -
> suggestions on others are welcome:

I agree with Ed that the interest will be in (1) providing the means of
freeing the literature interoperably via Open Archiving, and then (2)
investigating how that changes how people use the literature and do
research.

> 1. Research into the nature of scholarship and its change as a result of
> UPS:
>     - Will research build upon newer work than was possible before,
>        since the delays in learning about scholarly efforts are reduced?
>        Will that happen universally, or only in some situations?

Yes, a free online literature's utilization can be monitored, and
although it cannot strictly be compared with its non-online predecessor,
there are many ways we can approximate and make educated guesses about
how the usage patterns compare and change. (And with huge archives we
can also ask users in various ways to consciously compare their current
practices with their prior ones; this can be cross-checked against more
objective data from citation patterns, hits, cite-navigation, etc.).

>        For example, will that happen only in less well known places
>        that would not have heard through "invisible colleges", thus
>        enfranchising smaller groups?

Clearly open access to more of the literature, earlier, can only help.
Let's let this come out in the analysis, without being too concerned
about formulating hypotheses.

>     - Will scholars look at more works than before since it is easier?
>        If so, how much of those works will be examined? Which parts?

Good questions. Ways of monitoring usage that are more sensitive than
just hits or down-loads certainly need to be developed and used.

>     - Will looking at such works not previously used (e.g., theses)
>        provide real benefit?  Which types or genres are most beneficial?
>        Or what combinations?

Again, this is biassed toward a specific (and rather simplistic)
hypothesis. Of course more access to more stuff, earlier, can only help,
but that is almost banal. We will be asking much more specific
questions. (And note that if you remove the "grey" bias and ask, for
example, "will being able to freely access the refereed journal literature
provide real benefit?" the answer is so obvious that the banality of the
question is revealed.)

>     - Will scholarly habits shift in a significant way to use UPS?

LANL already gives us data on the likely outcome, but we will be in a
position to quantify and track the changes on a much larger and more
diverse sample.

>         Instead of works that cost (lots) more? Instead of works that are
>         not as readily available (e.g., journals not available
>         electronically)?

Answer obvious: free and online is obviously better than toll-gated;
this only sounds substantive under the bias that it will be pitting one
literature VERSUS another, rather than simply freeing it all.

>         For what types of scholarly activities / tasks?  For what learning
>         activities?
>         Will there be more cross-disciplinary research?

Certainly worthwhile to see how the LANL experience generalizes across
disciplines, and whether there prove to be any SUBSTANTIVE differences
in usage patterns and implementation. However, we will be up against
NONsubstantive differences first, namely, how soon a particular
discipline "gets it" (i.e., realizes what open archiving is, what it's
for, and why it's a good idea); this will vary trivially from discipline
to discipline as an INITIAL CONDITION, but with no predictive value as
to what the pattern will be in the STEADY STATE, once they've actually
tried it, reached critical mass, and seen what it amounts to.

In other words, initial discipline differences are much more likely to
be misleading than informative.

> 2. Digital library architecture
>     - What is the "right" component-wise decomposition for digital libraries
>         to support interoperability most easily and effectively?  Can we
>         build it?

Is this part of the open-archive mandate? I know a lot of librarians
think in terms of collections and collection management, but I have a
suspicion that the phenomenon might be obsolescent.

Tagging, interoperability, taxonomy, indexing -- all these are real
functions.

Maybe, not being a librarian, I don't really understand what this
architectural question is.

>     - Can we demonstrate its practicality? Scalability? Efficiency? The ease
>         with which new collections are made available? New virtual
>         collections? New services? New combinations of services?

A bit scattershot (esp. the "collections" standpoint). But certainly
services such as archiving itself, search, and linking (esp. citations)
are services that are central to our mandate.

>     - Can we demonstrate its usability? Effectiveness?  What are the effects
>          of the decomposition on the complexity and performance of services?
>          What are the effects on users of this decomposition / synthesis of
>          services - do they become hard to understand? Hard to manage?
>          How long does learning take?

Not sure what all this means. Sounds again like "collections" thinking:
I would invoke the Los Alamos Lemma again: How do these considerations
apply (or not) to LANL?

>     - Will it be easily adopted by many repository managers?  Which ones?
>         Why? Why not - in case some don't support it?

Not sure what is envisioned here in the interaction between "collections
managers" at the library/institution level, and the various open-archive
managers.

>     - How do we deal with lack of metadata provided, to synthesize it?  What
>         are the effects of lack of metadata? Of very detailed metadata? How
>         much of it is used? How often? How does this compare to only having
>         full-text searching and linking?

Terrific questions. Should be addressed by Open Archive Initiative.

>     - How does this compare with the current situation with many separate 
>         collections and services?

Vague.

> 3. Studies possible
>     - Study for various user communities, activities, tasks, periods
>         of time:
>     - Measure efficiency and effectiveness.
>     - Determine relative effectiveness across user communities.

Rather vague. We need to specify potential measures and analyses in
advance.

>     - What combinations of collections and services into virtual collections
>        are most popular? most beneficial?

Involves assumptions; may not all be relevant.

>     - Which services are most popular, beneficial?
>     - What combinations of services are most popular, beneficial? What usage
>         scenarios evolve (e.g., visualize collection, browse, search for
>         similar items to ones identified, and then search with
>         restrictions)?
>     - How do patterns of use of services and their combinations vary across
>         the communities, activities, tasks, etc.?

Mostly very worthwhile questions; we need to explicitly think through
a convincing sample of the measures and analyses we propose to perform.
The database (even if it were only LANL) is so rich that plenty of
sensible systematic analyses that may prove very informative
can be (and are being!) performed.

--------------------------------------------------------------------
Stevan Harnad                     harnad@cogsci.soton.ac.uk
Professor of Cognitive Science    harnad@princeton.edu
Department of Electronics and     phone: +44 23-80 592-582
Computer Science                  fax:   +44 23-80 592-865
University of Southampton         http://www.cogsci.soton.ac.uk/~harnad/
Highfield, Southampton            http://www.princeton.edu/~harnad/
SO17 1BJ UNITED KINGDOM