Hierarchical Clustering (HC) is a widely studied problem in unsupervised learning and exploratory data analysis, usually tackled by simple agglomerative procedures like average-linkage, single-linkage or complete-linkage. Applications of HC include reasoning about text documents, understanding the Evolution of species and the Tree of life, decomposing social networks like Facebook, or even organizing large data centers efficiently.
Surprisingly, despite the plethora of heuristics for tackling the problem, until recently there was no optimization objective associated with it; this is in stark contrast with flat clustering objectives like k-means, k-median and k-center. In this talk, we will give an overview of the optimization objectives for Hierarchical Clustering, we will analyze the popular Average-Linkage procedure and mention some of its drawbacks. Then, we will propose better algorithms with provably better approximation guarantees. Most of the talk is based on joint works with Moses Charikar and Rad Niazadeh.
Vaggos Chatziafratis is a 4th year PhD student at Stanford University under the supervision of Tim Roughgarden and Moses Charikar, where he is part of the Theory and the Machine Learning groups. His work focuses on approximation algorithms and their intersection with machine learning problems, mainly in unsupervised or semi-supervised settings. He has also worked on the design and analysis of algorithms, going “beyond-the-worst-case” analysis. Prior to Stanford, he received his Diploma in EECS from the National Technical University of Athens in Greece.