Metatag Depositories:
Dublin Core Metadata, Harvest and Resource Discovery in Law

1999 CALI Conference
Steven C. Perkins
and John Doyle

This session discusses one way to alleviate the problem of inefficient resource discovery of legal materials on the Internet. The proposed solution combines Dublin Core metadata tags embedded in electronic documents with a search engine, such as Harvest, that is capable of recognizing those tags.
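
To make the idea of metadata tags in a document concrete, the following is a minimal sketch in Python of how an indexing program might pull Dublin Core elements out of an HTML page. It assumes the common convention of writing each element as a meta tag whose name begins with "DC." (for example, DC.Title). The sample page, the court name, and the function and class names are hypothetical and are not taken from Harvest or any other tool.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collect <meta name="DC.*" content="..."> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}  # element name (lowercased) -> list of values

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").strip()
        content = attr.get("content")
        # Keep only tags that follow the assumed "DC.<element>" convention.
        if name.lower().startswith("dc.") and content is not None:
            element = name[3:].lower()
            self.fields.setdefault(element, []).append(content)


def extract_dublin_core(html_text):
    """Return a dict mapping Dublin Core element names to lists of values."""
    parser = DublinCoreExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.fields


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """<html><head>
<title>Sample Opinion</title>
<meta name="DC.Title" content="Sample Opinion of a Hypothetical Court">
<meta name="DC.Creator" content="Hypothetical Court of Appeals">
<meta name="DC.Subject" content="contracts">
<meta name="DC.Subject" content="damages">
<meta name="DC.Date" content="1999-06-01">
<meta name="keywords" content="not Dublin Core, so ignored">
</head><body><p>Full text of the opinion.</p></body></html>"""

if __name__ == "__main__":
    for element, values in extract_dublin_core(SAMPLE_PAGE).items():
        print(element, "=", "; ".join(values))
```

Repeated elements, such as the two DC.Subject tags, are kept as a list so that no value is lost. The ordinary keywords tag is skipped because its name does not start with "DC.".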

What is Metadata?

Metadata is data about data. Common examples are catalogs of various types, such as a mail order catalog or a library catalog. Each catalog contains descriptions of items but not the items themselves, so each catalog is a metadata depository. By searching the catalog metadata you are able to locate the physical item. Because the metadata does not need to be stored with the item it describes, the metadata depository can be kept separate from the physical items.

Why Metadata?

There are many metadata schemes:

Why use Dublin Core?

Examples of Dublin Core Metadata Depositories

Tools to create Dublin Core and other types of Metadata

The User Guide Working Draft

The User Guide Working Group has developed a User Guide Working Draft that explains how to use Dublin Core version 1.

What is the Future of Metadata?

The future of metadata is the Resource Description Framework (RDF), an XML application for describing metadata and for expressing relationships between metadata. See R. Iannella, "Application of RDF for extensible Dublin Core metadata".

What is RDF?

The article "What is RDF?" explains RDF in a readable manner. See "RDF Tools" for an example of the Dublin Core in RDF.
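
As a rough illustration of what Dublin Core looks like when expressed in RDF, the sketch below uses Python's standard xml.etree.ElementTree module to write a single RDF description of one resource. The RDF namespace is the W3C one; the Dublin Core namespace shown is the later 1.1 element set URI, which is an assumption here and may differ from the version 1 namespace in use when this session was given. The function name and the example URL are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"  # assumed element-set namespace

# Ask ElementTree to use the conventional rdf: and dc: prefixes on output.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dublin_core_to_rdf(resource_url, fields):
    """Return an RDF/XML string describing one resource.

    fields maps a Dublin Core element name (e.g. "title") to a list of values.
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root,
        f"{{{RDF_NS}}}Description",
        {f"{{{RDF_NS}}}about": resource_url},
    )
    for element, values in fields.items():
        for value in values:
            # One child element per value, so repeated elements are preserved.
            child = ET.SubElement(description, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(
        dublin_core_to_rdf(
            "http://example.org/opinion.html",  # hypothetical resource
            {"title": ["Sample Opinion"], "subject": ["contracts", "damages"]},
        )
    )
```

The same element names used in HTML meta tags carry over directly; what RDF adds is an explicit statement of which resource the description is about, which is what makes it possible to relate one description to another.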


Which Search Engines Support Metadata?

Using Harvest to Retrieve and Index Metadata

Harvest is an integrated set of tools to gather, extract, index and search Internet information. Harvest can use various indexing software, but ships with the Glimpse indexer by default. Both Harvest and Glimpse can handle structured data organized into fields, so documents whose fields carry names and attributes (such as metadata) can be stored and then retrieved with field-limited searches.
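
The value of a field-limited search is easiest to see next to an unrestricted one. The following Python sketch is only an illustration of the idea: it does not use Glimpse or reproduce its query syntax, and the records, field names, and function names are hypothetical.

```python
def field_search(records, field, term):
    """Return records whose named field contains term (case-insensitive)."""
    needle = term.lower()
    return [
        record
        for record in records
        if any(needle in value.lower() for value in record.get(field, []))
    ]


def any_field_search(records, term):
    """Return records in which any field contains term (case-insensitive)."""
    needle = term.lower()
    return [
        record
        for record in records
        if any(
            needle in value.lower()
            for values in record.values()
            for value in values
        )
    ]


# Hypothetical field-organized records, one per document.
RECORDS = [
    {
        "url": ["http://example.org/a.html"],
        "title": ["Smith v. Jones"],
        "creator": ["Hypothetical Court of Appeals"],
    },
    {
        "url": ["http://example.org/b.html"],
        "title": ["A Note on Contract Damages"],
        "creator": ["A. Smith"],
    },
]

if __name__ == "__main__":
    # Unrestricted search: "smith" matches a case name and an author.
    print([r["url"][0] for r in any_field_search(RECORDS, "smith")])
    # Field-limited search: only documents written by someone named Smith.
    print([r["url"][0] for r in field_search(RECORDS, "creator", "smith")])
```

Without fields, a search for "smith" cannot tell a party to a case from the author of an article. Limiting the search to the creator field removes that ambiguity, which is the benefit metadata is meant to provide.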

Harvest contains a "gatherer" and a "broker" component. The gatherer retrieves documents from the Internet using a list of URLs, and will recursively descend a site if its configuration requires it to. Each retrieved document is passed through a summarizing program that extracts the required elements and stores the summary in a field-organized record. Once the gatherer has completed its retrieval and summarizing tasks, it is ready to receive a request from a broker. The broker periodically queries the gatherer(s) listed in its configuration file, retrieving the records added since the previous contact. The broker then indexes the data using Glimpse. Through a user query form and a query engine, it accepts input from users, passes each query on to the external Glimpse search engine, and sends the results back to the user as a web page.
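
The division of labor between gatherer and broker can be sketched in a few lines of Python. This is a toy, in-memory model written for this discussion, not Harvest code: the class names, the sequence-number scheme for finding new records, the summarizer, and the word index standing in for Glimpse are all assumptions, and no network access takes place.

```python
class Gatherer:
    """Summarizes documents into field-organized records (toy model)."""

    def __init__(self, summarize):
        self._summarize = summarize  # function: (url, text) -> dict of fields
        self._records = []           # list of (sequence number, record)
        self._sequence = 0

    def gather(self, documents):
        """documents: (url, text) pairs standing in for retrieved pages."""
        for url, text in documents:
            self._sequence += 1
            record = self._summarize(url, text)
            record["url"] = url
            self._records.append((self._sequence, record))

    def records_since(self, last_seen):
        """Return (latest sequence number, records newer than last_seen)."""
        fresh = [rec for seq, rec in self._records if seq > last_seen]
        return self._sequence, fresh


class Broker:
    """Collects new records from gatherers and answers field queries."""

    def __init__(self, gatherers):
        self._gatherers = list(gatherers)
        self._last_seen = [0] * len(self._gatherers)
        self._index = {}  # (field, word) -> set of URLs

    def collect(self):
        """Fetch only records added since the previous contact."""
        added = 0
        for i, gatherer in enumerate(self._gatherers):
            latest, fresh = gatherer.records_since(self._last_seen[i])
            self._last_seen[i] = latest
            for record in fresh:
                self._add_to_index(record)
                added += 1
        return added

    def _add_to_index(self, record):
        for field, value in record.items():
            if field == "url":
                continue
            for word in value.lower().split():
                self._index.setdefault((field, word), set()).add(record["url"])

    def query(self, field, word):
        """Return the sorted URLs whose field contains the given word."""
        return sorted(self._index.get((field, word.lower()), set()))


def summarize(url, text):
    """Toy summarizer: the first line is the title, the rest is the body."""
    title, _, body = text.partition("\n")
    return {"title": title.strip(), "body": body.strip()}


if __name__ == "__main__":
    gatherer = Gatherer(summarize)
    broker = Broker([gatherer])
    gatherer.gather([("http://example.org/a.html",
                      "Smith v. Jones\nOpinion on contract damages.")])
    print(broker.collect())                # number of new records indexed
    print(broker.query("title", "smith"))  # URLs matching in the title field
```

Keeping a per-gatherer marker of the last record seen is what lets the broker ask only for what is new, so a second call to collect() with nothing added returns zero and does no indexing work.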

For information on configuring Harvest, see http://www.wlu.edu/Harvest/docs/.