[UPS] research proposal for NSF ITR? deadline for letter of intent is N
Wed, 10 Nov 1999 08:59:03 -0500
I am writing to suggest a serious collaborative research proposal
effort. Please read what is below and send me a brief reply
no later than Friday 11/12.
After our meeting, a number of participants went on to the Dublin Core 7
meeting in Frankfurt and continued some of the discussion. There has also
been some email exchange involving Herbert and Carl, as well as
several people at the University of Virginia, whom Carl recommended become
involved because of their enhancements to and application of the Fedora
work from Cornell.
I've tried to summarize my view of all these discussions, in two
parts, below. The first part, A, addresses the problems we are
tackling, couched in broad terms related to measuring impact on
society. The second part, B, elaborates the vision that I think
a number of us were trying to develop in terms of open digital libraries
using a component-based distributed architecture.
These are both rough, so while I encourage detailed comments, please
note that this was sent without the weeks of polishing needed; consider
the general ideas and see if they resonate.
Now, my question is: should our group write a serious proposal to NSF,
perhaps for the ITR program, that is, NSF 99-167?
I believe that we might have a chance to get funding to do some or
all of the research that is discussed below. Note that a letter of
intent is due Nov. 15, so we need to decide this soon.
I am willing to take the lead on this, but only if no one else wants
to. Carl is too busy, and I believe it would be hard for either
Herbert or the LANL group to get NSF support. If others think that
they are in a better position, I'd happily pass on this opportunity :-)
I realize that many in the group want their archives to grow and
prosper and, while they support UPS, do not view it as their central
interest. Others may view the integrated UPS as an important area
of research in its own right. I think we need both groups, working
Anyhow, I now call for brief replies from each interested party to
answer the following questions. I believe a reply could fit in a
short paragraph if you are pressed for time. I hope everyone will
reply no later than Friday 11/12.
1. Do you think we have a chance of getting NSF support through 99-167?
2. Do you think the issues raised in part A below define key
3. Does the outline in part B define a sensible follow-on for research
according to our group vision?
4. Would you like to be involved in this? If so, what would your role
be? (answer with as many as apply)
a- Work on own archive, but integrate it with UPS.
b- Focus on UPS and its architecture, development, evaluation.
c- Research, mostly on the architecture and software side.
d- Research, mostly on the sociological and usage side.
5. What other comments and suggestions do you have?
I look forward to your replies. Many thanks, Ed
- - - - - - - A. Thoughts about research follow - - - - - - -
If we go after funding for the UPS initiative, what are the most important
research questions, whose resolution will have the greatest impact, and
will help advance our understanding the most? Below are some candidates -
suggestions on others are welcome:
1. Research into the nature of scholarship and its change as a result of
- Will research build upon newer work than was possible before,
since the delays in learning about scholarly efforts are reduced?
Will that happen universally, or only in some situations?
For example, will that happen only in less well known places
that would not have heard through "invisible colleges", thus
enfranchising smaller groups?
- Will scholars look at more works than before since it is easier?
If so, how much of those works will be examined? Which parts?
- Will looking at such works not previously used (e.g., theses)
provide real benefit? Which types or genres are most beneficial?
Or what combinations?
- Will scholarly habits shift in a significant way to use UPS?
Instead of works that cost (lots) more? Instead of works that are
not as readily available (e.g., journals not available
For what types of scholarly activities / tasks? For what learning
Will there be more cross-disciplinary research?
2. Digital library architecture
- What is the "right" component-wise decomposition for digital libraries
to support interoperability most easily and effectively? Can we
- Can we demonstrate its practicality? Scalability? Efficiency? The ease
with which new collections are made available? New virtual
collections? New services? New combinations of services?
- Can we demonstrate its usability? Effectiveness? What are the effects
of the decomposition on the complexity and performance of services?
What are the effects on users of this decomposition / synthesis of
services - do they become hard to understand? Hard to manage?
How long does learning take?
- Will it be easily adopted by many repository managers? Which ones?
Why? Why not - in case some don't support it?
- How do we deal with missing metadata, e.g., by synthesizing it? What
are the effects of lack of metadata? Of very detailed metadata? How
much of it is used? How often? How does this compare to only having
full-text searching and linking?
- How does this compare with the current situation with many separate
collections and services?
3. Studies possible
- Study for various user communities, activities, tasks, periods
- Measure efficiency and effectiveness.
- Determine relative effectiveness across user communities.
- What combinations of collections and services into virtual collections
are most popular? most beneficial?
- Which services are most popular, beneficial?
- What combinations of services are most popular, beneficial? What usage
scenarios evolve (e.g., visualize collection, browse, search for
similar items to ones identified, and then search with
- How do patterns of use of services and their combinations vary across
the communities, activities, tasks, etc.?
- - - - - - - B. One possible approach to funding - - - - - - -
Title: Covering the Grey - from Santa Fe:
Universal, Heterogeneous, Distributed, Collaborative Self-Archiving
Overview: This project aims to support the objectives articulated at the
Santa Fe meeting of October 21-22, 1999 to build a worldwide infrastructure for
integrated access to the gray and related literature (theses,
e-prints, ...) wherein authors are involved in self-archiving processes.
It focuses on research related to architectures, collections, services, and
tools. It emphasizes information management, with middleware that
is part of a scalable information infrastructure, supported with effective
human-computer interfaces. It strongly involves HCI experts, from design
through large-scale analysis of real usage. It extends the scholarly
community's ability to access the latest research results, both according to
traditional disciplinary boundaries and through new cross-disciplinary
knowledge representations and organizations.
This project will support a variety of architectural instantiations, based
- Content and services will make use of the distributed capabilities of
- "Collections" refer to containers and other structures related to
* (Virtual) Collections
- We can have a directed acyclic graph of arbitrary complexity, building
virtual collections from other collections, along with query-based
restrictions/views of collections built upon.
- See sections B and C below for more details.
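
As a rough illustration of the directed acyclic graph idea, the sketch below builds virtual collections as unions of other collections, each with an optional query-based filter. The class names, the record layout, and the sample entries are all hypothetical; this is one possible shape, not existing UPS or Dienst code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional

Record = Dict[str, str]  # hypothetical flat metadata record with an "id" key


@dataclass
class PhysicalCollection:
    """A leaf node of the graph: it holds its own records."""
    name: str
    records: List[Record] = field(default_factory=list)

    def items(self) -> Iterator[Record]:
        return iter(self.records)


@dataclass
class VirtualCollection:
    """An interior node: a union of other collections, optionally filtered."""
    name: str
    sources: list  # PhysicalCollection or VirtualCollection nodes
    predicate: Optional[Callable[[Record], bool]] = None

    def items(self) -> Iterator[Record]:
        seen = set()  # a record reachable by two paths is yielded only once
        for source in self.sources:
            for record in source.items():
                if record["id"] in seen:
                    continue
                if self.predicate is None or self.predicate(record):
                    seen.add(record["id"])
                    yield record


# Hypothetical sample data.
etds = PhysicalCollection("etds", [
    {"id": "etd-1", "subject": "physics"},
    {"id": "etd-2", "subject": "computing"},
])
reports = PhysicalCollection("reports", [
    {"id": "rep-1", "subject": "computing"},
])
everything = VirtualCollection("everything", [etds, reports])
# "reports" is reachable twice here, directly and through "everything".
computing = VirtualCollection(
    "computing", [everything, reports],
    predicate=lambda r: r["subject"] == "computing",
)
ids = [r["id"] for r in computing.items()]
```

Because each node only asks its sources for items, a view can be layered on another view to any depth, as long as the graph stays acyclic.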
- A variety of DL-related content elements, software tools, and
services exist or will be developed.
- These should be composable easily as components of larger, more
services and systems.
- Key services will be supported, with high performance, scalable to
large numbers of users.
- Services range from those surrounding a "raw" collection, to those
a virtual collection, to those transforming or delivering
to those supporting individuals or communities of users, etc.
See section C below for more details.
- Components of a given type will be registered so they can be located
Note: Approaches based on agent technology, federation, centralization with
replication, information buses, etc. can all be supported. So too
can buckets, although this effort is neutral on the use of buckets as in
the SODA effort, etc.
* Collections all support a repository access protocol
- Given an ID, return one or both of:
- native form of its content, which hopefully will include
either implicitly or explicitly
- standard form (according to collection type) of its content
- All collections are self-describing, and can describe their
- Identifiers used are persistent.
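
To make the access protocol above more concrete, here is a minimal in-memory sketch. The `Repository` and `Dissemination` names, the method signatures, and the `hyp:0001` identifier are invented for illustration. Persistence is modelled in the simplest way I could think of, by refusing to reassign an identifier once it has been issued.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Dissemination:
    """What a repository returns for one identifier."""
    native: Optional[bytes] = None             # content as the archive stores it
    standard: Optional[Dict[str, str]] = None  # standard form for the collection type


class Repository:
    """Hypothetical in-memory repository; a real one would answer over a network."""

    def __init__(self, name: str, collection_type: str) -> None:
        self.name = name
        self.collection_type = collection_type
        self._objects: Dict[str, Dissemination] = {}

    def describe(self) -> Dict[str, str]:
        """Self-description: lets a client learn what this collection is."""
        return {
            "name": self.name,
            "type": self.collection_type,
            "size": str(len(self._objects)),
        }

    def deposit(self, identifier: str, native: bytes, standard: Dict[str, str]) -> None:
        # Assumption: an identifier, once issued, is never rebound.
        if identifier in self._objects:
            raise ValueError(f"identifier already assigned: {identifier}")
        self._objects[identifier] = Dissemination(native, standard)

    def get(self, identifier: str, native: bool = True, standard: bool = True) -> Dissemination:
        """Given an ID, return the native form, the standard form, or both."""
        stored = self._objects[identifier]  # raises KeyError for unknown IDs
        return Dissemination(
            native=stored.native if native else None,
            standard=stored.standard if standard else None,
        )


# Hypothetical usage.
repo = Repository("example-archive", "metadata")
repo.deposit("hyp:0001", b"%PDF-...", {"title": "An example e-print"})
both = repo.get("hyp:0001")
only_standard = repo.get("hyp:0001", native=False)
```

A client that only needs to build an index could ask for the standard form alone and never fetch the full native object.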
* Collections conform to a multiple inheritance taxonomy, with various
- One facet specifies organizations, e.g., university, college, dept.,
Note: Case study work by Neill Kipp at Virginia Tech, looking for
patterns in the digital library field, has identified the following
types and examples of digital libraries. Some of the facets below
to this list too.
Community (NDLTD, CSTC, CDDC)
Publisher (ACM-DL, D-Lib, Lexis)
Warehouse (NZDL Music Library, Amazon.com, Marian)
Museum (VTSF, Blake Archive)
Library (NCSU MyLibrary, LA Courts Information Support)
- One facet specifies subjects/disciplines, which can be structured
hierarchically, e.g., science then computing,
digital libraries then metadata then DC then DC.title
- One facet distinguishes physical (e.g., LoC, Virginia Tech) or
virtual (NDLTD, NCSTRL)
Note: virtual collections can be constructed from physical or virtual
collections, by identifying them, possibly with a filter that
identifies a subset of interest.
- One facet distinguishes types, with special methods as appropriate
for the type
- metadata - handle Dublin Core qualifiers and RDF in intelligent
- document - returning all or part(s)
- multimedia - with methods for returning objects (compressed,
uncompressed), or for streaming
- authority control - with de-duping
- terms and conditions - for simple types that can easily be managed,
like worldwide, educational use, for campus community, for
- thesaurus/cluster - with methods for returning an object, or a
neighborhood of objects
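
One simple way to picture the facets above is as independent fields on a collection description, with selection done by matching any subset of them. Note that this sketch uses plain fields rather than a true multiple-inheritance class hierarchy. The field names, facet values, and catalog entries are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Facets:
    name: str
    organization: str
    subject: Tuple[str, ...]  # hierarchical path, broadest term first
    virtual: bool
    kind: str                 # e.g. "metadata", "document", "multimedia"


def select(
    collections: Iterable[Facets],
    subject_prefix: Tuple[str, ...] = (),
    virtual: Optional[bool] = None,
    kind: Optional[str] = None,
) -> List[str]:
    """Return the names of collections matching every facet constraint given."""
    matches = []
    for c in collections:
        # A collection is "under" a subject if its path starts with that prefix.
        if c.subject[: len(subject_prefix)] != subject_prefix:
            continue
        if virtual is not None and c.virtual != virtual:
            continue
        if kind is not None and c.kind != kind:
            continue
        matches.append(c.name)
    return matches


# Hypothetical catalog entries.
catalog = [
    Facets("dept-reports", "university", ("science", "computing"), False, "document"),
    Facets("dl-metadata", "consortium",
           ("science", "computing", "digital libraries"), True, "metadata"),
    Facets("physics-eprints", "laboratory", ("science", "physics"), True, "document"),
]
computing = select(catalog, subject_prefix=("science", "computing"))
virtual_docs = select(catalog, virtual=True, kind="document")
```

Type-specific methods (streaming for multimedia, de-duping for authority control) could then be dispatched on the `kind` facet.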
* Collections included in the development and testing in this project
- NCSTRL - distributed across organizations, single discipline (CS)
- NDLTD - distributed across organizations, genre (ETDs) across all
- LTRS - NASA reports collection, from a single organization (multiple
sites), on a (broad) discipline
- xxx - centralized repository, serving multiple disciplines (Physics,
- Economics preprints - RePEc collection harvested from multiple
organizations, single discipline
- CogPrints - Cog. Sci. collection harvested from multiple organizations,
- SLAC/SPIRES - physics collection from multiple organizations, single
- International physics departments - Harvest collection from multiple
organizations, single discipline
- other groups from Santa Fe meeting, plus additional volunteers
* Support programmatic access
* Existing foundations for services include the following software
- Dienst extended for the Santa Fe initiative
- SFX (see recent D-Lib Magazine article and 2 earlier this year)
- MARIAN, NDLTD, and other efforts at Virginia Tech
* Types of services
- Maybe Library of Congress will assist with a pilot - ref. Caroline
- Maybe OCLC will assist with its authority information - ref. T.
- In connection with NDLTD and work in Germany, their server about
teachers may be supporting this.
- Statistical - analyzing properties and reporting, to help with other
services such as visualization.
- Clustering - perhaps using software from H. Chen in Arizona.
- Summarization - perhaps using Stanford or Xerox tools
- Index - MARIAN and other systems
- Search - MARIAN and other systems
- Including sophisticated use of content+context+links,
- Disseminate - providing various forms and versions
- Transform - supporting dissemination and archiving/preservation
- manage conversions among MARC, Dublin Core, ReDIF, RFC-1807, ...
- Thesaurus/concept space - manage MeSH, ERIC, ACM categories, ...
- Browse - support navigation through thesaurus/concept space,
document space, ...
- Visualize - provide special support to manage collections, results
sets, concepts, ...
- Certification - authorization, authentication, and resolution, so we know
the best terms and conditions for a given user regarding any restricted
digital object (e.g., for SFX)
- Link (e.g., using SFX) - from citations directly to digital objects
in best form(s), using certification services
- Mirroring, replication - for robustness, for political and
- Archiving and preservation
- maybe connect harvesting activities at time of collection/cleanup
to one or more 3rd party services to ensure preservation of
- use transform services to shift archive contents to newer
representations as needed
- by author, those authorized by author, for public view
- by anyone, for note-taking, for personal collection
- Editorial processing and review, certification of quality
- Other workflow management services
- For educational resources, like CSTC (www.cstc.org)
- For theses and dissertations (see software provided to
administrators, from link near bottom right of page
- Submission by author
- Of digital objects and metadata
- Including accurate identification using thesaurus/concept
- Including careful identification of every reference/citation
so each can be easily resolved (e.g., with SFX)
- With educational/training support about the process and principles
* The underlying mechanisms to make all this work involve principles and
- Aggregation (at varying levels of static -> dynamic, with harvesting
- Automation (improving workflow, shifting to dynamic capabilities
* We build upon various projects related to DC, RDF, XML
* We build upon relevant work at Cornell, Stanford, OCLC, ...
* HCI involved from beginning of design of the services and tools
* Remote evaluation of users working with the testbed as part of normal
* Evaluation of users in NSF-funded HCI labs at Virginia Tech
* Of the research to be undertaken
- Promote a DL industry
- Promote scholars making their research available sooner, in more
in ways wherein discovery and reuse are supported better
* Of the services to be developed (testbed)
- Promote sharing of research results
- Reduce costs
- Increase speed and convenience
- Increase amount of electronic publishing
- Promote cross-disciplinary work
- Promote feasible efforts to develop new archives and services
- Promote building of infrastructure at research universities
- Promote knowledge of epublishing, digital libraries, IPR, ... of

Edward A. Fox
Computer Science, Virginia Tech