Events Calendar

Flat View
By Year
Monthly View
By Month
Weekly View
By Week
Download as iCal file
Seminário Grafos e Algoritmos: Eduardo S. Laber
Wednesday, 29 May 2019,  1:30 -  3:30

O grupo de Grafos e Algoritmos da UFRJ convida a todos para o seminário a ser realizado em 29/05/2019 (quarta-feira). Local - Coppe/Sistemas, Sala H-324B, 13:30h.



Eduardo S. Laber 



On the approximability of Information Theoretic Clustering 



An impurity measure is a function that assigns a d-dimensional vector v, with non-negative components, to a non-negative value so that the more homogeneous v with respect to the values of its components, the larger its impurity. Well known examples of impurity measures are the Entropy and the Gini impurities. We study the problem of clustering based on impurity measures. Given a collection V of n many d-dimensional vectors, an impurity measure I and an integer k, the goal is to find a partition P of V into k groups V_1,...,V_k so as to minimize the sum of the impurities of the groups in P, where the impurity of a group U is the impurity of the vector \sum_{v \in U } v. Impurity minimization has been widely used as quality assessment measure in probability distribution clustering (KL-divergence) as well as in categorical clustering. However, in contrast to the case of metric based clustering, the current knowledge of impurity measure based clustering in terms of approximation and inapproximability results is much more limited.


In this talk we discuss some results that contribute to narrow this gap. Among them we highlight (i) a proof that the problem, for Entropy impurity, does not admit a PTAS even when all vectors of V have the same \ell_1 norm and (ii) a polynomial time O(log^2(min{k,d})-approximation algorithm. The former solves a problem that remained open in previous work [Chaudhuri and McGregor COLT 08; Ackermann et. al. ECCC 11] while the latter is the first polynomial time algorithm with approximation guarantee independent of the number of points/vector and not relying on any restriction on the components of the vectors for producing partitions with minimum entropy.


In order to assess the practical applicability of our theoretical findings we show that a new method that relies on these findings obtains clustering with impurity close to a k-means based method, with the advantage of being 2-3 orders of magnitude faster. 

The main results from our research were/will be presented in the international Conference on Machine Learning (ICML 2018, 2019) 



Consulte também a nossa página de seminários: