Abstract
Many studies collect multiple omics datasets to gain novel insights into different stages of biological processes and to associate omics features with outcome variables. Several data integration methods have been developed for the joint modelling of omics datasets. We have proposed a probabilistic latent variable modelling framework for inferring the relationship between two omics datasets. The framework reduces dimensionality and addresses heterogeneity among datasets, which arises because the datasets represent different biological processes and are measured with different technologies. The correlation structure is modelled by joint and data-specific components. An extension of the model includes the relationship between the joint components and an outcome variable.
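As a rough illustration of what a decomposition into joint and data-specific components can look like, one possible formulation is sketched below. The notation is an assumption made for this sketch and is not taken from the abstract: x and y are the two omics data vectors, t and u are joint latent variables, t_o and u_o are data-specific latent variables, W, C, W_o and C_o are loading matrices, B links the joint components, z is the outcome variable, and e, f, h and \varepsilon are noise terms. The model presented in the talk may differ in its details.

```latex
% Illustrative sketch only; all symbols are assumed, not from the abstract.
\begin{align*}
  x &= t\,W^{\top} + t_{o}\,W_{o}^{\top} + e, \\
  y &= u\,C^{\top} + u_{o}\,C_{o}^{\top} + f, \\
  u &= t\,B + h, \\
  z &= t\,\beta + \varepsilon \quad \text{(outcome extension)}.
\end{align*}
```

Under this sketch, the null hypothesis of no relationship between the two datasets would correspond to B = 0, and no relationship between the joint components and the outcome would correspond to \beta = 0.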
Model parameters are estimated using maximum likelihood. Test statistics are proposed for the null hypothesis of no relationship. We evaluate our methods via simulations. Under the null hypothesis, the test statistics appear to follow a normal distribution approximately. Our methods appear to outperform existing methods for small and heterogeneous datasets in terms of selecting relevant variables and prediction accuracy. We illustrate the methods by application to multi-omics datasets from a population cohort, cell lines and a case-control study.
Organizer
Christian Hennig
Teams link