Creating a virtual library with HPSearch and Mops

Gerd Hoff and Martin Mundhenk
Universität Trier, FB IV - Informatik
D-54286 Trier (Germany)
hoffg@uni-trier.de       mundhenk@ti.uni-trier.de

Keywords: virtual library, scientific literature on the web, intelligent search techniques, focused navigation

Abstract

The fast dissemination of new research results on the World Wide Web poses a new challenge for search engines. In many research areas, scientists make their newest results available electronically on their web sites long before the results appear in conference proceedings or journals. Whereas a decade ago the state of the art in a research area could be determined by reading conference proceedings and journals in the local library, nowadays it is additionally necessary to find the newest related electronic publications on the web - in other words, to maintain a virtual library of not-yet-printed literature. Traditional search engines are of little help for this task. For example, they do not index PostScript documents, the electronic format of many preprints appearing on the web. The few existing searchable indices for PostScript documents either cover fields too large to be really helpful - all of computer science, for example - or depend on a submission procedure that delays the appearance of the documents on the web.

We present a new approach for constructing a virtual library of scientific papers that is specialized in a relatively small research area and makes it possible to find the latest documents.

In this project, we developed and implemented HPSearch and Mops. We tested our approach by creating two example indices: one for complexity theory, and one for BDDs (binary decision diagrams, a data structure for VLSI design and verification). Both indices are well used in their respective research communities. All of the software runs on standard PCs.
We conclude that such focused crawling is very effective for building high-quality virtual libraries using ordinary desktop hardware.

A more detailed description of the system can be found at http://www.minet.uni-jena.de/www/fakultaet/mundhenk/papers/virt-lib/.

2001-01-31