[UPS] Re: Dienst protocol: Partitions and partitionspecs

Robert Tansley rht96r@ecs.soton.ac.uk
Thu, 25 May 2000 14:23:36 +0100


Thanks for the replies. I've been talking about this with some colleagues
here at Southampton. One thing that seems unclear is what exactly the
purpose of the partitions in the spec is. Is there any meaning attached
to partitions? e.g. might a searcher specify a partition or partitions as
one of the criteria in a search? In which case, I think some controlled
language classification needs to be introduced, as otherwise specifying a
partition in a search is likely to retrieve results from a single
partition in a single archive, which isn't really taking full advantage
of archive interoperability.

(More points below)

Carl Lagoze wrote:
> 
> Hi Rob,
> 
> Really sorry for the slow responses.  I was away in Europe and only able to
> do a minimum of email.  I really appreciate the probing questions and please
> let me know if my answers are satisfactory.
> 
> Carl
> 
> ----------------------------------------------------------------------------
> Carl Lagoze, Digital Library Scientist
> Department of Computer Science, Cornell University
> Ithaca, NY 14853 USA
> Phone: +1-607-255-6046
> FAX: +1-607-255-4428
> E-Mail: lagoze@cs.cornell.edu
> WWW: http://www.cs.cornell.edu/lagoze/lagoze.html
> 
> > Referring to
> > http://www.cs.cornell.edu/cdlrg/dienst/protocols/OpenArchivesDienst.htm:
> >
> > I'm unclear about partitionspecs, particularly in relation to the List
> > Contents verb. From the grammar given in section 2.2.1, it appears that a
> > partitionspec consists of 1 or more partition names. The example given,
> > "Florida;Frenetics" is rather ambiguous. Does this refer to two
> > partitions, "Valley View University of Florida" and "Department of
> > Frenetics"? i.e. If this is sent as part of a List Contents request, do I
> > return documents in "Valley View University of Florida" and documents in
> > "Department of Frenetics" partitions?
> >
> > Or, does that partitionspec refer to the single partition "Department of
> > Frenetics", with the "Florida;" part just specifying the ancestry in the
> > hierarchy? In which case, since the example seems to pertain to the
> > hierarchy given in section 2.2, surely it should include the Institutions
> > node in the hierarchy, to become something like
> > "Institutions;Florida;Frenetics"?
> 
> Your confusion is understood.  A partitionspec refers to a single partition;
> essentially a path expression in a partition tree.  I agree that there is
> some confusion between the examples. I've cleaned up the spec, hopefully
> clearing it up.
> >
> > The example in the description of the List Contents verb,
> > "partitionspec=physics;hep" seems to suggest the latter, which would mean
> > that a partitionspec can only be used to specify a single partition. Is
> > this correct?
> 
> yes, that is correct
> >
> > If so, this does give rise to another ambiguity. If, using the arXiv
> > example, I were to send a List Contents request with
> > "partitionspec=physics", does that implicitly include "physics;hep",
> > "physics;ex", "physics;lat" etc.? Or, is there an assumption that actual
> > documents will only be stored in the "leaves" of the partition hierarchy,
> > so you should only ever specify leaves in a partitionspec?
> 
> The specification does not specify this.  The partitions are meant for
> administration purposes and don't reflect any sense of "containment".  As
> noted, a document may be "contained" in more than one partition.  The
> intention, in retrospect, was that documents are only contained in the
> leaves.

What exactly do you mean by "administration purposes"?

I can see how allowing documents in interior nodes does introduce issues
for services querying a data provider. A service using List-Contents to
retrieve records would have to iterate over all of the partitions in the
hierarchy (including interior nodes) in order to find out what's in all
of the partitions, rather than just the leaves. However, not allowing
documents in interior nodes may also be problematic. For example, the IAM
group at ECS has a number of sub-groups, but not everyone is in a
subgroup. Thus some papers would go in a sub-group partition
Southampton;ECS;IAM;subgroup, but others should really go in
Southampton;ECS;IAM. A solution would be to have other papers go in
Southampton;ECS;IAM;misc or some such; however, this doesn't seem ideal to
me. Additionally, if another group, e.g. Southampton;ECS;ISIS, starts
forming subgroups, you'd need to create a new partition
"Southampton;ECS;ISIS;other" and migrate to it all of the documents
previously held at Southampton;ECS;ISIS.

One might argue that the partitions should remain static, but I would
argue that this is never going to be the case, either with organisational
structures like institution/department/etc., or with subject hierarchies.
So, I think we need to define what partitions are, what they are for, and
how documents may be held within them.
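
To illustrate the harvesting cost mentioned above, here is a minimal Python
sketch. The tree contents and the function names (all_partitionspecs,
leaf_partitionspecs) are made up for illustration and are not part of the
Dienst spec; it only assumes that a partitionspec is a ";"-separated path
through the partition tree.

```python
# Hypothetical partition tree: each key is a partition name,
# each value is a dict of its child partitions (empty dict = leaf).
TREE = {
    "Southampton": {
        "ECS": {
            "IAM": {"subgroup": {}},
            "ISIS": {},
        },
    },
}


def all_partitionspecs(tree, prefix=()):
    """Yield every partitionspec in the tree, interior nodes included."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield ";".join(path)
        yield from all_partitionspecs(children, path)


def leaf_partitionspecs(tree, prefix=()):
    """Yield only the partitionspecs that have no child partitions."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_partitionspecs(children, path)
        else:
            yield ";".join(path)


if __name__ == "__main__":
    # A service would issue one List-Contents request per spec printed here.
    print(list(all_partitionspecs(TREE)))
    print(list(leaf_partitionspecs(TREE)))
```

If documents may sit in interior nodes, a service has to query every spec
from the first function; if they may only sit in leaves, the second, shorter
list is enough.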

> > Although the CogPrints incarnation of our software has a two-level
> > hierarchy, with all documents held at the leaves, the core software
> > doesn't enforce that. How the hierarchy works is up to the individual
> > site configuration. Thus you could easily end up with an archive in which
> > some documents are held at, say, "Southampton;ECS", and some at
> > "Southampton;ECS;IAM". In which case, does the partitionspec
> > "Southampton;ECS" include documents in "Southampton;ECS;IAM" as well?
> > (i.e. Would a List Contents request with "partitionspec=Southampton;ECS"
> > respond with the documents from the IAM partition too?)
> 
> This indicates that you could have some problems with what I've specified
> above.  However, if we say that you can give an interior node in
> List-Contents, then we need to specify partitions in the response, yes?

Yes, if you assert that the partitionspec Southampton;ECS does include
Southampton;ECS;IAM. The alternative is to assert that a List-Contents
request specifying Southampton;ECS won't retrieve documents in
Southampton;ECS;IAM. The List-Partitions response could also be extended
to include a flag indicating whether a partition is used to hold
documents or not. Either way, I think the specification needs to be clear
about where documents should be held and what a partitionspec specifies.
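
To make the two readings concrete, here is a rough Python sketch. The
document ids, the DOCS table and the function names are hypothetical, and
for brevity it assumes each document is filed under exactly one partition
(the spec allows more than one). It is only meant to show how the two
interpretations of a List-Contents request differ, plus the kind of
"holds documents" flag a List-Partitions response might carry.

```python
# Hypothetical archive: document id -> the partitionspec it is filed under.
# Simplification: one partition per document.
DOCS = {
    "doc1": "Southampton;ECS",
    "doc2": "Southampton;ECS;IAM",
    "doc3": "Southampton;ECS;ISIS",
    "doc4": "Southampton;Physics",
}


def list_contents_exact(docs, partitionspec):
    """Reading 1: return only documents filed at exactly this partition."""
    return sorted(d for d, spec in docs.items() if spec == partitionspec)


def list_contents_subtree(docs, partitionspec):
    """Reading 2: return documents at this partition or any descendant."""
    prefix = partitionspec + ";"  # trailing ";" avoids matching e.g. "ECSX"
    return sorted(
        d for d, spec in docs.items()
        if spec == partitionspec or spec.startswith(prefix)
    )


def holds_documents(docs, partitionspec):
    """Possible List-Partitions flag: is anything filed directly here?"""
    return any(spec == partitionspec for spec in docs.values())


if __name__ == "__main__":
    print(list_contents_exact(DOCS, "Southampton;ECS"))
    print(list_contents_subtree(DOCS, "Southampton;ECS"))
    print(holds_documents(DOCS, "Southampton"))
```

Under the second reading the response would also need to say which
partition each returned document came from, as Carl points out.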

R 

-- 
 Robert Tansley                    Tel: +44 (0) 23 80594492
 Multimedia Research Group         Fax: +44 (0) 23 80592865
 Electronics & Computer Science    http://www.ecs.soton.ac.uk/~rht96r/
 University of Southampton
 Southampton SO17 1BJ, UK