Dr. Dan SIMOVICI
Professor of Computer Science
University of Massachusetts, Department of Computer Science
Boston, USA
-----------------------------------
Metric Methods in Data Mining
-----------------------------------
ABSTRACT
-----------------------------------
This lecture is dedicated to metric techniques applied to several major data mining problems: classification, feature selection, incremental clustering of categorical data, and other data mining tasks. These techniques originate in the work of R. Lopez de Mantaras, who introduced a metric between partitions of finite sets and used it to formulate a novel splitting criterion for decision trees that, in many cases, yields better results than the classical entropy gain (or entropy gain ratio) splitting techniques.

Applications of metric methods are based on a simple idea: each attribute of a set of objects induces a partition of this set, where two objects belong to the same class of the partition if they have identical values for that attribute. Thus, any metric defined on the set of partitions of a finite set generates a metric on the set of attributes. Once a metric is defined, we can evaluate how far apart the attributes are, cluster the attributes, find centrally located attributes, and so on. All these possibilities can be exploited to improve existing data mining algorithms and to formulate new ones.

We discuss the geometry of the metric space of partitions of a finite set, metric splitting criteria for decision trees, incremental clustering of categorical data, clustering features and feature selection, and a metric approach to discretization. Finally, we present several open problems and future directions for research.

SHORT CV
-----------------------------------
Dr. Dan Simovici has been Professor of Computer Science at the University of Massachusetts Boston, USA, since 1985.
His research focuses on information-theoretical methods in data mining, semantic models in databases, and algebraic aspects of multiple-valued logic. Dr. Simovici has held several research and teaching positions: at the University of Science and Technology, Lille, France; at Tohoku University, Sendai, Japan; and at the University of Miami, Florida. He is also an editor of several scientific journals, including the Journal for Multiple-Valued Logic and Soft Computing, the International Journal for Parallel, Emergent, and Distributed Systems, and the International Journal for Software and Information Technologies. Dr. Simovici completed his PhD in 1974 at the University of Bucharest, Romania, his M.S. in Mathematics in 1970 at the Al. I. Cuza University of Iasi, Romania, and his M.S. in E.E. in 1965 at the Polytechnical Institute of Iasi, Romania.
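
ILLUSTRATION
-----------------------------------
The central idea of the abstract, that each attribute induces a partition and that a distance between partitions therefore gives a distance between attributes, can be sketched in a few lines of Python. This is only an illustrative sketch, not material from the lecture: the toy table and the function names are hypothetical, and the distance used is one common entropy-based formulation, d(P, Q) = H(P|Q) + H(Q|P), in the spirit of the metric attributed to Lopez de Mantaras (shown here without normalization).

```python
from collections import Counter
from math import log2


def induced_partition(values):
    """Group row indices by attribute value: rows with equal values share a block."""
    blocks = {}
    for row, value in enumerate(values):
        blocks.setdefault(value, []).append(row)
    return sorted(blocks.values())


def entropy(labels):
    """Shannon entropy (in bits) of the partition induced by `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def partition_distance(a, b):
    """d(a, b) = H(a|b) + H(b|a) = 2*H(a, b) - H(a) - H(b)."""
    joint = list(zip(a, b))  # labels of the common refinement of both partitions
    return 2 * entropy(joint) - entropy(a) - entropy(b)


# Hypothetical toy table: four objects described by three attributes.
table = {
    "color": ["red", "red", "blue", "blue"],
    "shape": ["square", "square", "circle", "circle"],  # same partition as color
    "size": ["small", "large", "small", "large"],
}

print(induced_partition(table["color"]))                  # [[0, 1], [2, 3]]
print(partition_distance(table["color"], table["shape"]))  # 0.0
print(partition_distance(table["color"], table["size"]))   # 2.0
```

In this toy table, color and shape induce the same partition, so their distance is zero, while size splits the objects independently of color and lies farther away. A matrix of such pairwise distances is the kind of input one would use to cluster attributes or to pick centrally located ones.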