[OAI-general] Re: Interoperability - subject classification/terminology
Thu, 27 Mar 2003 13:00:59 +0200
well, sure, i agree in principle ... if arXiv and similar projects agreed
to bunch all of physics into a single category and use google for
searching, with no browsing capabilities, it wouldn't be a problem at all.
similarly, if we grouped together computer science, electrical
engineering and information systems, that would be ok for gross-level
interoperability ... once again, assuming searching is the only service
required. frankly, i think this is a little simplistic and assumes
digital libraries are no more than submission+search systems.
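to make the "gross-level" grouping concrete, here is a rough python sketch of the many-to-one crosswalk a service provider would need. the category codes and discipline names are made up for illustration; they are not arXiv's or anyone else's real scheme. it shows two things: fine categories collapse into one coarse bucket, so browsing below the discipline level is lost, and a single category can land in more than one discipline.

```python
# hypothetical crosswalk from fine-grained archive categories to coarse
# disciplines; the codes and names are made up for illustration.
CROSSWALK = {
    "phys:hep": {"physics"},
    "phys:cond-mat": {"physics"},
    "cs:theory": {"computer science"},
    "cs:arch": {"computer science", "electrical engineering"},
    "is:dl": {"information systems", "computer science"},
}


def coarse_disciplines(categories):
    """map an item's fine-grained categories to its coarse disciplines."""
    result = set()
    for cat in categories:
        # unknown categories fall into a catch-all bucket rather than vanishing
        result |= CROSSWALK.get(cat, {"unclassified"})
    return result


def browse_index(items):
    """group item ids under each coarse discipline, as a browse view would."""
    index = {}
    for item_id, cats in items.items():
        for disc in coarse_disciplines(cats):
            index.setdefault(disc, []).append(item_id)
    return {disc: sorted(ids) for disc, ids in index.items()}
```

with items {"a1": ["phys:hep"], "a2": ["cs:arch"], "a3": ["cs:theory"]}, every physics subfield ends up under a single "physics" heading, and a2 appears under both "computer science" and "electrical engineering".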
[aside: why does eprints support browsing by categories?]
besides, who decides what constitutes a discipline anyway? has anyone
ever been able to decide if computer science is engineering or science ?
i think we have more questions than answers here, and it isn't as simple
as you make out, or we wouldn't even be discussing this :)
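for the set-based alternative quoted below (separate sets per discipline, aimed at particular service providers), here is a minimal python sketch of selective harvesting from the harvester's side. it assumes OAI-PMH 2.0's ListRecords verb with the optional `set` argument, and, as i read the spec, that a colon in a setSpec marks hierarchy, so harvesting "physics" also covers "physics:hep". the base URL and set names are hypothetical.

```python
from urllib.parse import urlencode


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc", from_date=None):
    """build an OAI-PMH ListRecords request, optionally restricted to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # selective harvesting by set
    if from_date is not None:
        params["from"] = from_date  # incremental harvesting by datestamp
    return base_url + "?" + urlencode(params)


def in_set(record_set_specs, requested):
    """true if any of a record's setSpecs is the requested set or a subset of it.

    a colon marks hierarchy, so 'physics:hep' is treated as part of 'physics'.
    """
    return any(
        spec == requested or spec.startswith(requested + ":")
        for spec in record_set_specs
    )
```

an archive that exposes a computer science paper in both "discipline:cs" and "discipline:engineering" (hypothetical setSpecs) lets a harvester of either set pick it up, without the author archiving it twice.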
Stevan Harnad wrote:
> On Thu, 27 Mar 2003, Hussein Suleman wrote:
>>...why not use sets for the separate
>>disciplines, aimed at particular service providers?...
>>some disciplines are not well-defined (namely, computer science)
>>so such archives may want to play ball with multiple service providers
>>and hence may need different sets.
> The question of taxonomic classification sets and version-control for
> Open Archives is a technical one, so I will not presume to comment on it
> except from the point of view of the potential *users* of one particular
> kind of Archive Content, namely, unrefereed preprints and refereed
> postprints of research papers from one or many or all disciplines: This
> -- in the google-age of boolean inverted full-text searchability --
> does not require a detailed a-priori taxonomy, as book metadata or the
> metadata for other kinds of material might. A fairly general sorting by
> discipline should suffice.
>>...the service provider can provide an
>>interface for potential data providers to self-register.
> I hope that once the number and contents of Open-Access Eprint Archives
> for research preprints and postprints have scaled up toward something
> closer to universality, the simple metadata descriptors "pre-refereeing
> preprint" and "refereed journal article" plus perhaps "discipline name"
> will be enough to guide relevant service-providers in automatically
> harvesting their relevant metadata. Multiple self-registration seems a
> tedious and unnecessary constraint. (Possibly a master-registry of valid
> institutions and disciplinary archives will also help, but may not be
> necessary unless commercial spamming invades this sector too.)
>>what remains a difficult problem, however, is how to recreate the
>>metadata used by the service provider as its native format. so, for a
>>typical example, if arXiv classifies items using a specific set
>>structure, this is certainly not going to be the default for an
>>institutional archive. does the service provider automatically or
>>manually reclassify? or does it not allow browsing by categories?
> Worrying about "recreating the categories" in this boolean full-text age
> is, I believe, a waste of time (for research preprints/postprints). Just
> harness google's harvested full-text to your engine's search capability,
> if it is incapable of contending with boolean full-text search on its
> own. (Manual reclassification! Heaven forfend! Don't bother classifying
> this material in the first place, beyond the simplest of first-cuts,
> such as discipline. Any further classification should be algorithmic and
> text-data-driven, not manual.)
>>in either event, the quality of the metadata from the perspective of the
>>service provider may be an impetus for potential users to want to
>>replicate their effort rather than rely on the automated submission from
>>their own institutions ... this needs more thought ...
> Again, I speak only for research preprints/postprints, but please let's
> not inject any further credibility into the notion that self-archiving
> authors/institutions will also have to self-advertise by multiple
> self-archiving of the same paper. Surely that is one headache that
> OAI-interoperability should eradicate from the planet! Self-archiving
> itself is self-advertising (and effort) enough. Please let us not
> now -- when the momentum is still not big enough -- saddle would-be
> self-archivers with needless extra worries, and tasks!
> Stevan Harnad
> OAI-general mailing list
hussein suleman ~ firstname.lastname@example.org ~ http://www.husseinsspace.com