A boosting approach to learn the dependence structure of the data

Speaker: Michela Battauz (Università di Udine)

  • Date: 11 January 2024, 4:00–5:00 pm

  • Venue: Aula III - Via Belle Arti, 41

Statistical boosting is a method for fitting a model that performs variable selection and prevents overfitting at the same time. While the proposals in the literature focus on which covariates to include in the model, we aim to automatically learn the dependence structure present in the data. In particular, we consider two models that use latent variables to account for the dependence among observations, namely factor analysis models for binary data and linear mixed models. In the case of factor analysis, the algorithm automatically selects the number of factors and the non-zero loadings. In linear mixed models, the procedure selects the variables that have a random effect. Early stopping criteria are fundamental both for model selection and for the regularization of the estimates. Latent variable models pose new challenges in the development of boosting algorithms. In fact, the objective function to be minimized, which in our proposals is the negative log-likelihood, is not convex and has a null gradient at the starting point. Hence, classical gradient-based boosting algorithms cannot be applied and new strategies are required. To overcome this issue, our method is based on directions of negative curvature, which exploit the Hessian matrix of the objective function. Simulation studies and real data applications show the effectiveness of our proposal.
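The core idea of escaping a stationary point via negative curvature can be illustrated with a minimal sketch, not the authors' actual algorithm: when the gradient vanishes but the Hessian has a negative eigenvalue, the corresponding eigenvector is a direction along which the objective decreases to second order. The function `negative_curvature_step` and the toy saddle below are illustrative assumptions, not part of the talk.

```python
import numpy as np

def negative_curvature_step(hessian, gradient, step=0.1):
    """Illustrative sketch: if the Hessian has a negative eigenvalue,
    step along the direction of most negative curvature; otherwise
    fall back to a plain gradient step."""
    eigvals, eigvecs = np.linalg.eigh(hessian)  # eigenvalues in ascending order
    if eigvals[0] < 0:
        d = eigvecs[:, 0]  # eigenvector of the most negative eigenvalue
        if gradient @ d > 0:  # orient d so it is also a descent direction
            d = -d
        return step * d
    return -step * gradient

# Toy saddle f(x, y) = x^2 - y^2: at the origin the gradient is zero,
# so gradient-based steps stall, but the Hessian reveals the way down.
H = np.array([[2.0, 0.0], [0.0, -2.0]])
g = np.zeros(2)
d = negative_curvature_step(H, g)
# The step moves along the y-axis, where the objective decreases.
```

Evaluating the toy objective at the new point, `d[0]**2 - d[1]**2` is negative, confirming the step decreases the function even though the gradient at the start was null.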

Silvia Cagnone