Resampling likelihood-type statistics for comparing clustering solutions

Speaker: Pietro Coretto (Università degli studi di Salerno)

  • Date: 1 April 2021, from 16:00 to 18:00

  • Venue: online, via videoconference on the Microsoft Teams platform

Selecting an optimal clustering solution is a longstanding problem. In model-based clustering, this amounts to choosing the architecture of the mixture model: the cluster prototype distribution, the number of mixture components, possible restrictions on the clusters' geometry, and so on. Classical methods address this issue via penalized model selection criteria based on the observed likelihood. We compare these methods with the less explored cross-validation alternative, which is almost the default option in the prediction-oriented paradigm. We introduce a framework for "scoring" clustering solutions, where scores are intimately connected with likelihood and information-theoretic quantities. We propose to estimate scores and their confidence intervals via resampling methods, and to use these estimates to formulate selection rules. Theoretical guarantees are given. Both real and artificial data sets are analyzed to assess the relative performance of the proposed methodology.
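To make the general idea of a resampled likelihood-type score concrete, here is a minimal sketch. It is not the speaker's method and rests on several assumptions of ours. The candidate models are univariate Gaussian mixtures fitted by plain EM. The score is the K-fold cross-validated held-out log-density of each point. A percentile bootstrap over those pointwise scores gives a rough confidence interval. This last step is a simplification, since it ignores the dependence between folds. All function names (`fit_gmm_1d`, `cv_pointwise_loglik`, `bootstrap_ci`) are hypothetical.

```python
import math
import random
from statistics import mean


def normal_logpdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def logsumexp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def fit_gmm_1d(data, k, n_iter=100, min_var=1e-3):
    """Fit a 1-D Gaussian mixture with k components by plain EM (illustrative only)."""
    n = len(data)
    srt = sorted(data)
    # Deterministic initialisation: means at evenly spaced sample quantiles.
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    m = mean(data)
    overall_var = max(sum((x - m) ** 2 for x in data) / n, min_var)
    vars_ = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior membership probabilities of each point.
        resp = []
        for x in data:
            logs = [math.log(weights[j]) + normal_logpdf(x, mus[j], vars_[j]) for j in range(k)]
            lse = logsumexp(logs)
            resp.append([math.exp(v - lse) for v in logs])
        # M-step: update weights, means and variances (variance floored at min_var).
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            vars_[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj, min_var)
    return weights, mus, vars_


def mixture_logpdf(x, params):
    weights, mus, vars_ = params
    return logsumexp([math.log(w) + normal_logpdf(x, mu, v) for w, mu, v in zip(weights, mus, vars_)])


def cv_pointwise_loglik(data, k, n_folds=5, seed=0):
    """Held-out log-density of each point under a model fitted without its fold."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(data)
    for f in range(n_folds):
        test = idx[f::n_folds]
        test_set = set(test)
        train = [data[i] for i in idx if i not in test_set]
        params = fit_gmm_1d(train, k)
        for i in test:
            scores[i] = mixture_logpdf(data[i], params)
    return scores


def bootstrap_ci(values, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of the pointwise scores."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    return means[int((alpha / 2) * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(-3, 1) for _ in range(150)] + [rng.gauss(3, 1) for _ in range(150)]
    for k in (1, 2, 3):
        s = cv_pointwise_loglik(data, k)
        lo, hi = bootstrap_ci(s)
        print(f"K={k}: mean held-out log-lik {mean(s):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

One possible selection rule is to prefer the simplest K whose interval overlaps that of the best-scoring K. That rule is our illustration, not the one proposed in the talk. A penalized criterion such as BIC would instead score each K from a single in-sample fit.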

Christian Hennig