Bayesian covariate-informed clustering of high-dimensional data with variable selection in the Mallows rank model

Speaker: Valeria Vitelli (Department of Biostatistics, University of Oslo, Norway)

  • Date: 14 November 2024, from 14:30 to 14:30

  • Venue: Aula III - Via Belle Arti, 41

Abstract
Rank-based models can be used to estimate individual behaviours and preferences in several areas, such as marketing and politics. Combining the expressed preferences with additional user-related information (covariates) can often improve the accuracy of individual predictions by enhancing the understanding of the users’ personal profiles. The Mallows model is a popular model for rankings, as it adapts flexibly to different types of preference data, and the previously proposed Bayesian Mallows Model (BMM) offers a computationally efficient framework for Bayesian inference that also captures user heterogeneity via a finite mixture. However, the Mallows model does not seem realistic when the pool of items is large, and BMM does not currently allow the use of covariates. In this talk, I will introduce a recent extension of BMM that embeds covariate information in a joint rank-based clustering framework. The proposed method is based on a similarity function that a priori favours the aggregation of people into a cluster when their covariates are similar. A lower-dimensional version of BMM (lowBMM) that scales to large datasets has also been proposed and used in the context of cancer genomics; however, lowBMM does not perform clustering. We now propose to combine the Bayesian mixture of Mallows models with item selection, to jointly perform variable selection and clustering. The performance of both methods is investigated via simulation studies, and real-data examples in genomics and preference learning are also shown.
This is joint work with Luca Coraggio, Emilie Eliseussen, Arnoldo Frigessi, Haakon Muggerud, and Ida Scheel.

Organizer
Angela Montanari