Author: Henning Meyerhenke

E-Mail address: henningm (AT)
cs.uni-jena.de

Keywords: Parallel density-based clustering, Parallel k-d tree

**Abstract. **Since obtaining spatial data has become very
easy through technologies like remote sensing or medical devices such
as computer tomography, the processing of this large amount of data
into information has become a computational issue. One technique of
this knowledge discovery process in spatial databases is clustering -
the grouping of objects depending on their spatial proximity. One of
the proposed algorithms for this is FDC [11], which uses a
density-based clustering approach. As there is a need for parallel
processing in very large databases to distribute resource allocation,
this paper introduces PFDC, a parallel version of FDC. Three
approaches how to parallelize the algorithm are presented, two of
them in detail. The major difference between them is the
implementation of the parallel spatial access method, the data
structure k-d tree. All of these approaches have been implemented
with standard tools (C++ and MPI message passing) in order to work on
a variety of parallel platforms. Experiments show that the two
variations presented in detail achieve good speedup results.
*Bibliography*

I. Al-furaih, S. Aluru, S. Goil, and S. Ranka: Practical Algorithms for Selection on Coarse-Grained Parallel Computers, IEEE Trans. Parallel and Distributed Systems, vol. 8, no. 8, pp. 313-324, Aug. 1997.

I. Al-furaih, S. Aluru, S. Goil, and S. Ranka: Parallel Construction of Multidimensional Binary Search Trees. In: IEEE Trans. on Parallel and Distributed Systems, vol. 11, no. 2, pp. 136-148, Feb. 2000.

T. Cormen, C. E. Leiserson, R. L. Rivest: Introduction to Algorithms. MIT Press, 1990.

M. Ester, H.-P. Kriegel, J. Sander, X. Xu: A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (KDD'96), pp. 226-231.

J. Jàjà: Introduction to parallel algorithms. Addison-Wesley, 1992.

K. Mehlhorn, S. Näher: LEDA: A Platform for Combinatorial and Geometric Computing. Cambridge University Press, 1999.

H. Samet: The Design and Analysis of Spatial Data Structures. Addison-Wesley, 1990.

H. Shi, J. Schaeffer: Parallel sorting by regular Sampling. Journal of Parallel and Distributed Computing, vol. 14, no. 4, pp. 361-372, April 1992.

Software library LAM/MPI available at www.lam-mpi.org.

X. Xu, J. Jaeger, H.-P. Kriegel: A Fast Parallel Clustering Algorithm for Large Spatial Databases. In: High Performance Data Mining. Ed. by Y. Guo and R. Grossmann, pp. 29-56. Kluwer Academic Publishers, 1999.

B. Zhou, D. W. Cheung, B. Kao: A Fast Algorithm for Density-Based Clustering in Large Database. In: Proc. of the Third Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD '99), pp. 338-349. Springer-Verlag, 1999.