Speaker
Jeanine Houwing-Duistermaat
Department of Statistics
University of Leeds
Abstract
The availability of large omics datasets in epidemiological and clinical studies provides many opportunities for research in statistical bioinformatics. The hope is that the abundance of information will lead to a better understanding of underlying disease mechanisms and to accurate prediction models that enable patient-targeted screening and treatment. The statistical challenges include data cleaning, heterogeneity across omics datasets, high dimensionality, data integration, and high correlation within and between datasets (Morris et al., 2017; Houwing-Duistermaat et al., 2017). In this talk I will present Partial Least Squares (PLS) methods for multivariate regression and for data integration and dimension reduction when analysing several omics datasets simultaneously.
Three PLS-type methods for omics analysis will be considered: the standard PLS algorithm (Wold, 1972), Envelope (Cook et al., 2015), and our recently developed Probabilistic PLS (PPLS) (Bouhaddani et al., 2018). Envelope and PPLS are maximum likelihood methods. PLS and PPLS can handle high-dimensional data, whereas Envelope requires the sample size n to be larger than the number of variables p. PPLS maximizes a constrained log-likelihood to ensure that the solution is unique. The methods will be illustrated with several data examples, and the results of simulation studies comparing their performance will be shown.
Organizer
Angela Montanari