"KNOWLEDGE ACCESS MANAGEMENT:
TOOLS
AND CONCEPTS FOR NEXT-GENERATION CATALOGERS"
Report from the OCLC Seminar
On Nov. 17-19, 1997, I attended the second of OCLC's
seminars entitled "Knowledge
Access Management: Tools and
Concepts for Next-Generation Catalogers" (http://www.oclc.org/institute/seminar2.htm).
Although the title sounds general, the seminar focused on the
intersection of cataloging, metadata, and the Internet. There
were 22 of us, many from academic libraries, both large and small,
but also some from a network, a large public library, a consulting
firm, a special library, and a library science faculty.
The seminar had three broad themes: libraries
and the Internet, issues in the cataloging of Internet resources,
and the role of different kinds of "metadata" in the
online environment. I have tried to highlight below some of the
major points made for each theme.
LIBRARIES AND THE INTERNET
This session argued that the phenomenal
growth of the World Wide Web in the last year or so has brought
with it many of the scholarly resources that libraries traditionally
collect, and that libraries will therefore have to "collect" and
catalog the Web-based versions of these resources in the
future.
- The WWW grew from 1993 to 1996 to include 30,000
sites; from 1996 to the present we have gone to 1 million sites.
Growth in telecommunications capacity remains a major driving
force in today's world, leading to Scott McNealy's remark that
Internet years are just like dog years, seven to every one.
- Why all this WWW growth? The multimedia capabilities
provide a better educational experience.
- The downside of all this WWW growth: the Web is
too disorganized to search effectively (just when our major need
is for organizing information, library schools are bypassing cataloging
in the curriculum!). The library field has created some wonderful
organizing tools, such as the MARC record, classification schemes,
and authority control.
- Good cataloging is hard to do: differentiating
each publication from all the others is not easy and will only
get harder. Nevertheless, we must begin to mainstream Internet
resources in our cataloging workflow.
This theme ended with a prediction that any OPAC
vendor who does not have a Web catalog interface by the end of
this year will be out of business.
ISSUES IN THE CATALOGING OF INTERNET RESOURCES
These sessions were conducted by Ann Sandberg-Fox,
who has been involved in both national and international discussions
of bibliographic description of electronic resources.
We went through every cataloging rule and MARC field
used in cataloging Internet resources, discussing difficulties
and possible solutions along the way. Basic bibliographic
concepts are hard to apply to Internet resources (an example she
used that illustrates many of the issues discussed below is http://www.oc.ca.gov/):
- Monograph vs. Serial:
We have hung a great deal on our traditional division of material
into monograph and serial, a distinction that doesn't translate
well to Internet resources. As things now stand, Web sites are
cataloged as "monographs".
- Title: Even the title
of a website isn't always clear from the layout and the problem
is compounded by the fact that pages may be displayed differently
by different Web browsers. Some reliance is already being
placed on the title as recorded in the <title> element of the
HTML document source code, even though it doesn't always correspond
exactly to anything as viewed on the Web page!
- Edition: Is "new
and improved" on a home page to be taken seriously as an
edition statement, the way it would be if printed in a book? Probably
not. One wonders if edition statements as such will disappear
for Internet resources, with a date being the only fixed point
in identifying "versions" of websites.
- Date: Speaking
of dates, which dates? Often "last update" is the only
date that appears. Some copyright dates are appearing and these
would probably take precedence for identifying "editions."
In describing Websites we may need to use the "serial"
note: "Description based on [last update date]."
- General principles:
Some of the general principles that are emerging in discussions
both of Internet resources and the future of the cataloging rules
are:
- All Internet resources are considered "published."
- The international community favors the term "electronic
resource" over our current "computer file" format
term.
- More emphasis on "identifying" resources
over exact transcription of their text
- A move away from format-based cataloging: users
like information gathered together on one record.
- A new definition of "seriality"
- Bibliographic record of the future:
These may be records that are permanently "unfinished."
Some identifying features would be permanent, but other parts that
we might want in the latest form (e.g., contents) would carry a
pointer to the website, so that the up-to-the-minute information
could be searched, retrieved, and displayed on the fly.
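The Title problem noted above, where the <title> element in the HTML source need not match anything displayed on the page, can be illustrated with a short Python sketch. It uses only the standard library; the sample page and the function name describe_titles are invented for illustration, and comparing <title> against the first <h1> heading is just one assumed way of approximating "what the page displays."

```python
from html.parser import HTMLParser


class TitleAndHeadingParser(HTMLParser):
    """Collect the text of <title> and of the first <h1> in an HTML page."""

    def __init__(self):
        super().__init__()
        self._current = None       # tag whose text is currently being collected
        self.title = ""
        self.first_heading = ""

    def handle_starttag(self, tag, attrs):
        # Only the first <h1> is collected; later headings are ignored.
        if tag == "title" or (tag == "h1" and not self.first_heading):
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.first_heading += data


def describe_titles(html_source):
    """Return (title in source, first displayed heading, whether they match)."""
    parser = TitleAndHeadingParser()
    parser.feed(html_source)
    title = " ".join(parser.title.split())            # normalize whitespace
    heading = " ".join(parser.first_heading.split())
    return title, heading, title == heading


# Hypothetical page: the source title and the displayed heading disagree.
SAMPLE_PAGE = """<html><head><title>County Home Page</title></head>
<body><h1>Welcome to the County Online!</h1></body></html>"""

source_title, displayed, same = describe_titles(SAMPLE_PAGE)
print("Title in source:", source_title)
print("Heading on page:", displayed)
print("Match:", same)
```

A cataloger would still have to decide which of the two to transcribe; the sketch only makes the mismatch visible.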
- Library of Congress Internet projects
- Table of contents (TOC):
this project follows well on the "record of the future"
idea described above and is connected to LC's Electronic CIP (E-CIP)
program. A method was developed to take TOC information from the
E-CIP manuscript, wrap minimal HTML coding around it, save it
to the WWW server, and add the 856 field to the catalog record.
The catalog record has as its "contents note" a URL
to the contents on the Web. In another project, LC is automatically
adding the first subject heading to the TOC site in a keyword
field. This means that contents words can be found by Web search
engines and link back to the catalog record which could then serve
as a source for finding related items. David Williamson from LC,
who was an attendee at the seminar, reported on several TOC projects
involving interaction between catalog records and Web sites (sample
LC record with Web URL: OCLC#32822869).
- BEOnline (http://lcweb.loc.gov/rr/business/beonline/beoabout.html):
this is a pilot project to provide access to business and economic
resources from the Internet. The plan is to have it serve as both
a model and a catalyst for developing approaches to identifying,
selecting, and providing bibliographic access to remote electronic
sources on the Web. This should begin to yield LC copy for Internet
resources.
- Intercat database (http://purl.org/net/intercat):
This is a searchable database of more than 20,000 catalog records
for Internet resources. It was begun by OCLC as part of a grant
project sponsored by the Dept. of Education and continues to
be maintained. There is also an Intercat listserv (http://www.oclc.org/oclc/forms/listserv.htm)
for continuing discussion.
- PURLs (http://purl.oclc.org/):
PURLs are "persistent URLs." OCLC is offering to maintain
an "authority file" of URLs with the PURL serving as
a cross-reference or alias for the real URL, which is, hopefully,
maintained by the person who registered for the PURL. PURLs can
be assigned automatically at the Website.
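Two of the mechanisms described above can be sketched together in Python: the TOC workflow (wrap minimal HTML around the contents, then add an 856 field to the catalog record) and the PURL idea (a stable alias that cross-references the current URL). This is an illustration under stated assumptions, not LC's or OCLC's actual software. The function and class names, the purl.example.org host, and the sample URLs are all hypothetical, and the 856 field is shown in a simple display form with first indicator 4 (HTTP), a blank second indicator written as "_", $3 for materials specified, and $u for the URL.

```python
from html import escape


def toc_to_html(title, toc_lines):
    """Wrap plain table-of-contents lines in minimal HTML."""
    items = "\n".join(f"<li>{escape(line)}</li>" for line in toc_lines)
    return (
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body><h1>{escape(title)}</h1>\n<ul>\n{items}\n</ul>\n</body></html>"
    )


def field_856(url, materials="Table of contents"):
    """Format an 856 field for display (display convention is an assumption)."""
    return f"856 4_ $3 {materials} $u {url}"


class PurlTable:
    """Toy stand-in for a PURL resolver: a table of aliases to current URLs."""

    def __init__(self, base="http://purl.example.org"):   # hypothetical host
        self.base = base
        self._targets = {}

    def register(self, path, target_url):
        """Record the real URL and hand back the persistent alias."""
        self._targets[path] = target_url
        return self.base + path

    def update(self, path, new_target_url):
        """The registrant moves the resource; the alias itself never changes."""
        if path not in self._targets:
            raise KeyError(path)
        self._targets[path] = new_target_url

    def resolve(self, purl):
        """Look up the current real URL for a PURL issued by this table."""
        if not purl.startswith(self.base):
            raise ValueError(f"not a PURL from this table: {purl}")
        return self._targets[purl[len(self.base):]]


page = toc_to_html("Sample Book", ["1. Introduction", "2. Methods & Results"])
purls = PurlTable()
purl = purls.register("/toc/sample", "http://www.example.org/old/toc.html")
print(field_856(purl))                 # this line goes into the catalog record

# The TOC page moves; only the table is updated, not the catalog record.
purls.update("/toc/sample", "http://www.example.org/new/toc.html")
print(purls.resolve(purl))
```

The design point is that the catalog record stores only the alias, so a moved resource requires one change in the table rather than an edit to every record that cites it. A real resolver would answer over HTTP; the sketch models only the lookup.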
METADATA
Metadata is a term being used to refer to any data
about other data. Catalog records are currently libraries' most
familiar form of metadata, but this is an area of great interest,
excitement, and experimentation, and other forms of metadata are
emerging to serve different needs.
- Why all this interest in metadata?
First, the impetus is not coming from libraries,
who have always recognized the importance of metadata by creating
catalog records. The impetus is coming from the WWW community:
the Web browsers, the search engines, and the Internet service
providers.
We all know what searches on the Web are like! A
recent and very good metadata survey article by Warwick Cathro
of the National Library of Australia (http://www.nla.gov.au/nla/staffpaper/cathro3.html)
described a Web search on the acronym IETF (Internet Engineering
Task Force). Admittedly, this is an Internet body and will have
a lot of hits, but, still, it's not as general as searching for
a common subject word like "environment." Nevertheless,
the "IETF" search retrieved 896,354 matches! Cathro
went on to explain that metadata would allow search engines to
target a search on words or phrases whose role has been identified,
e.g., "green" as a personal name vs. "green"
as a subject. This sounds elementary and it is, but it is still
more than many search engines can do.
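The "green" example can be made concrete with a small Python sketch contrasting a plain keyword search with a search restricted to one labeled field. The three records and the field names (title, author, subject) are invented for illustration and are not taken from Cathro's article.

```python
import re

# Hypothetical records whose parts are labeled with their role.
RECORDS = [
    {"title": "Collected essays",
     "author": ["Green, Mary"], "subject": ["Literature"]},
    {"title": "Green politics in practice",
     "author": ["Hall, Ann"], "subject": ["Green movement"]},
    {"title": "Color theory",
     "author": ["Brown, Tom"], "subject": ["Green", "Color"]},
]


def words(text):
    """Lowercased set of words in a string, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_search(records, word):
    """Match the word anywhere in the record, as an unfielded search would."""
    hits = []
    for record in records:
        everything = " ".join(
            [record["title"], *record["author"], *record["subject"]]
        )
        if word.lower() in words(everything):
            hits.append(record["title"])
    return hits


def field_search(records, field, word):
    """Match the word only in values labeled with the given role."""
    return [
        record["title"]
        for record in records
        if any(word.lower() in words(value) for value in record[field])
    ]


print(keyword_search(RECORDS, "green"))          # every record matches
print(field_search(RECORDS, "author", "green"))  # Green as a personal name
print(field_search(RECORDS, "subject", "green")) # green as a subject
```

With role labels available, the same word retrieves one record as an author and two as a subject, instead of all three indiscriminately.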
In March 1995, the first Metadata Workshop was convened
at OCLC. It brought together researchers and professionals from librarianship,
computer science, text encoding, and related areas to come up
with a set of core metadata elements to describe networked resources.
The result was called the Dublin Core, "Dublin" because
of OCLC's location in Dublin, Ohio, and "Core" because
this was meant to have the minimal number of elements required
to describe an electronic resource adequately.
Since then, there have been four more Dublin Core
conferences with these results:
- The DC consists of 15 elements, all of which
try to have a commonly understood meaning (for more details:
http://purl.oclc.org/metadata/dublin_core_elements):
- title
- author or creator
- subject and keywords
- description
- publisher
- other contributor
- date
- resource type
- format
- resource identifier
- source
- language
- relation
- coverage
- rights management
- It has been determined that the DC will be adequate
to describe both documents and images.
- Many packages of metadata should be able to exist
side-by-side. This is referred to as the "Warwick framework"
which is "a conceptual model for a container architecture
for metadata packages of various types. The organizing principle
that emerged from this workshop was that there will be many packages
of metadata, independently developed and maintained for different
purposes by various communities, and that this modularization
will allow for coherent evolution of different components of the
metadata landscape under a single metadata architecture."
(from 4th DC Metadata Workshop report: http://www.dlib.org/dlib/june97/metadata/06weibel.html)
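To give a feel for what a DC record might look like in practice, here is a minimal Python sketch that writes the 15 elements listed above as HTML <meta> tags, using the "DC." name prefix commonly used when embedding Dublin Core in Web pages. The short labels (creator, type, identifier, rights, and so on) stand in for the longer element names in the list, the function name is hypothetical, and the sample values are illustrative only.

```python
from html import escape

# Short labels for the 15 elements, in the order listed above.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def dc_meta_tags(record):
    """Render a dict of DC elements as HTML <meta> tags.

    Every element is optional and may hold one value or a list of values.
    """
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = []
    for element in DC_ELEMENTS:
        values = record.get(element, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            content = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Illustrative values only.
sample = {
    "title": "Knowledge Access Management: Report from the OCLC Seminar",
    "creator": "Noel, Celine",
    "subject": ["Cataloging", "Metadata", "Internet resources"],
    "date": "1998-01-29",
    "language": "en",
}
print(dc_meta_tags(sample))
```

Because every element is optional and repeatable in this sketch, a Website creator could supply as little as a title and a date, which fits the "minimal number of elements" spirit of the Core.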
Now that the DC has become standardized, various
"crosswalks" are being developed between DC and other
types of metadata, especially the MARC record (http://lcweb.loc.gov/marc/dccross.html).
However, there are still many questions whose answers are speculative:
- Where will DC records come from?
In the long run, from people with a vested interest
in better access to Web items, e.g., Website creators, database
creators, catalogers, etc. There is also software already available
for creating DC records for existing Websites (for fun with this
try DC Dot: http://www.ukoln.ac.uk/metadata/dcdot/).
- How soon will we see DC-compliant browsers?
6 months, as a conservative estimate. For a recent
report: http://www.zdnet.com/pcweek/news/1103/03rdf.html
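The idea of a "crosswalk" between DC and MARC, mentioned above, amounts to a lookup table from one scheme's elements to the other's fields. The Python sketch below uses a small, illustrative mapping chosen from well-known MARC fields (245 title, 260 publication, 520 summary, 540 terms of use, 653 uncontrolled index term, 856 electronic location). It is an assumption for demonstration and should not be read as the content of the LC crosswalk document, which should be consulted for the real mapping.

```python
# Illustrative mapping only: DC element -> (MARC tag, subfield code).
DC_TO_MARC = {
    "title": ("245", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "identifier": ("856", "u"),
    "rights": ("540", "a"),
}


def crosswalk(dc_pairs):
    """Convert (element, value) pairs to MARC-style display lines.

    Returns the converted lines, sorted by tag, plus the pairs that the
    table does not cover, so nothing is silently dropped. For simplicity
    each value becomes its own line; a real record would combine
    publisher and date in a single 260 field.
    """
    fields, unmapped = [], []
    for element, value in dc_pairs:
        target = DC_TO_MARC.get(element)
        if target is None:
            unmapped.append((element, value))
        else:
            tag, code = target
            fields.append(f"{tag} ${code} {value}")
    return sorted(fields), unmapped


# Hypothetical DC record.
fields, leftover = crosswalk([
    ("title", "Sample Web Guide"),
    ("creator", "Doe, Jane"),
    ("subject", "Cataloging"),
    ("identifier", "http://www.example.org/guide/"),
    ("date", "1997"),
])
for line in fields:
    print(line)
print("Needs a cataloger's decision:", leftover)
```

Returning the unmapped pairs separately reflects a practical point: a crosswalk rarely covers everything, and the leftovers are where a cataloger's judgment comes in.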
- Other forms of metadata in use in libraries
- TEI header: The Text
Encoding Initiative (TEI) is an international project that guides
the SGML-encoding of electronic texts for scholarly research.
At the top of the file for each encoded text is a TEI header that
describes the text and its original source, and documents the
encoding history (http://www.uic.edu/orgs/tei/).
Our own project, Documenting the American South, includes
a TEI header for each text, and this is then used as the basis
for the MARC record when each text is cataloged: a good example
of coexisting forms of metadata.
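As a rough picture of how a TEI header can seed a MARC record, the Python sketch below pulls author, title, publisher, and date out of a header and drafts the corresponding 100, 245, and 260 lines. It does not describe our actual workflow. The sample header is simplified and invented, it is written as XML so that the standard library parser can read it (the TEI files themselves are SGML), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, invented header; a real one carries much more detail.
SAMPLE_HEADER = """<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Sample Narrative: Electronic Edition</title>
      <author>Doe, Jane</author>
    </titleStmt>
    <publicationStmt>
      <publisher>Example University Library</publisher>
      <date>1997</date>
    </publicationStmt>
  </fileDesc>
</teiHeader>"""


def marc_draft_from_tei(header_xml):
    """Draft MARC-style display lines from a TEI header (a starting point only)."""
    root = ET.fromstring(header_xml)

    def text_at(path):
        node = root.find(path)
        if node is None or not node.text:
            return None
        return " ".join(node.text.split())   # normalize whitespace

    author = text_at("fileDesc/titleStmt/author")
    title = text_at("fileDesc/titleStmt/title")
    publisher = text_at("fileDesc/publicationStmt/publisher")
    date = text_at("fileDesc/publicationStmt/date")

    draft = []
    if author:
        draft.append(f"100 $a {author}")      # main entry, personal name
    if title:
        draft.append(f"245 $a {title}")       # title statement
    imprint = []
    if publisher:
        imprint.append(f"$b {publisher}")
    if date:
        imprint.append(f"$c {date}")
    if imprint:
        draft.append("260 " + " ".join(imprint))   # publication information
    return draft


for line in marc_draft_from_tei(SAMPLE_HEADER):
    print(line)
```

The draft still needs authority work, subject headings, and punctuation from a cataloger, which is the sense in which the header serves as a basis rather than a finished record.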
CONCLUSION
The MARC record, as old and imperfect as it is to
us, is viewed by others as "very rich metadata," with
all of its coding and authority control. No one expects the DC
to replace the MARC record although it is likely that catalog
records and other types of metadata will merge eventually. Right
now, there is room for all kinds of metadata. We may find ourselves
using several kinds in one catalog: the MARC record for the most
important and scholarly resources, with, perhaps, the DC becoming
a new minimal level format.
The seminar opened with some memorable quotes from
library history, this one from Cutter in 1904: "I cannot
help thinking that the golden age of cataloging is over, and the
difficulties and discussions which have furnished an innocent
pleasure to so many will interest them no more." If Cutter
were here to see us at this new stage of catalog development, I
think he would be pleased.
Celine Noel
Catalog Dept.
Davis Library
1-29-98