[OAI-general] Re: Interoperability - subject classification/terminology

Hussein Suleman hussein@cs.uct.ac.za
Thu, 27 Mar 2003 13:00:59 +0200


hi

well, sure, i agree in principle ... if arXiv and similar projects agree 
to bunch all of physics into a single category and use google for 
searching, with no browsing capabilities, it wouldn't be a problem at all.

similarly, if we grouped together computer science, electrical 
engineering and information systems, that would be ok for gross-level 
interoperability ... once again, assuming searching is the only service 
required. frankly, i think this is a little simplistic and assumes 
digital libraries are no more than submission+search systems.

[aside: why does eprints support browsing by categories ?]

besides, who decides what constitutes a discipline anyway ? has anyone 
ever been able to decide if computer science is engineering or science ?

i think we have more questions than answers here and it isn't as simple 
as you point out or we wouldn't even be discussing this :)

ttfn,
----hussein


Stevan Harnad wrote:
> On Thu, 27 Mar 2003, Hussein Suleman wrote:
> 
> 
>>...why not use sets for the separate 
>>disciplines, aimed at particular service providers?...
>>some disciplines are not well-defined (namely, computer science) 
>>so such archives may want to play ball with multiple service providers 
>>and hence may need different sets.
> 
> 
> The question of taxonomic classification sets and version-control for
> Open Archives is a technical one, so I will not presume to comment on it
> except from the point of view of the potential *users* of one particular
> kind of Archive Content, namely, unrefereed preprints and refereed
> postprints of research papers from one or many or all disciplines: This
> -- in the google-age of boolean inverted full-text searchability --
> does not require a detailed a-priori taxonomy, as book metadata or the
> metadata for other kinds of material might. A fairly general sorting by
> discipline should suffice.
> http://www.eprints.org/self-faq/#26.Classification
> http://www.ecs.soton.ac.uk/~harnad/Hypermail/Amsci/2385.html
> 
> 
>>...the service provider can provide an 
>>interface for potential data providers to self-register.
> 
> 
> I hope that once the number and contents of Open-Access Eprint Archives
> for research preprints and postprints have scaled up toward something
> closer to universality, the simple metadata descriptors "pre-refereeing
> preprint" and "refereed journal article" plus perhaps "discipline name"
> will be enough to guide relevant service-providers in automatically
> harvesting their relevant metadata. Multiple self-registration seems a
> tedious and unnecessary constraint. (Possibly a master-registry of valid
> institutions and disciplinary archives will also help, but may not be
> necessary unless commercial spamming invades this sector too.)
> 
> 
>>what remains a difficult problem, however, is how to recreate the 
>>metadata used by the service provider as its native format. so, for a 
>>typical example, if arXiv classifies items using a specific set 
>>structure, this is certainly not going to be the default for an 
>>institutional archive. does the service provider automatically or 
>>manually reclassify? or does it not allow browsing by categories? 
> 
> 
> Worrying about "recreating the categories" in this boolean full-text age
> is, I believe, a waste of time (for research preprints/postprints). Just
> harness google's harvested full-text to your engine's search capability,
> if it is incapable of contending with boolean full-text search on its
> own. (Manual reclassification! Heaven forfend! Don't bother classifying
> this material in the first place, beyond the simplest of first-cuts,
> such as discipline. Any further classification should be algorithmic and
> text-data-driven, not manual.)
> 
> 
>>in either event, the quality of the metadata from the perspective of the 
>>service provider may be an impetus for potential users to want to 
>>replicate their effort rather than rely on the automated submission from 
>>their own institutions ... this needs more thought ...
> 
> 
> Again, I speak only for research preprints/postprints, but please let's
> not inject any further credibility into the notion that self-archiving
> author/institutions will also have to self-advertise by multiple
> self-archiving of the same paper. Surely that is one headache that
> OAI-interoperability should eradicate from the planet! Self-archiving
> itself is self-advertising (and effort) enough. Please let us not
> now -- when the momentum is still not big enough -- saddle would-be
> self-archivers with needless extra worries, and tasks!
> http://www.ecs.soton.ac.uk/~harnad/Temp/tim-arch.htm
> 
> Stevan Harnad


-- 
=====================================================================
hussein suleman ~ hussein@cs.uct.ac.za ~ http://www.husseinsspace.com
=====================================================================