SO-UNFER SEMINAR SERIES 2025
Abstract
Demography has long been the science of life courses, tracing how individuals move through education, work, family, and health across time and generations. Yet despite massive advances in data availability, from full-population registries to linked longitudinal surveys, our capacity to predict life outcomes remains strikingly limited. This talk explores why. Drawing on results from ODISSEI studies such as the Predicting Fertility (PreFer) challenge, I show that even with abundant data and compute, predictive accuracy for fundamental demographic events like childbearing remains modest. These limits reveal not only the boundaries of data and modeling, but also the inherent unpredictability of human lives. Building on these findings, I outline a vision for foundation models in demography: large-scale, general-purpose models trained on population-wide life-course sequences. Inspired by language models such as BERT and GPT, these models treat life events as tokens in a “biographical language”, learning representations that capture the social, temporal, and network structures underlying human trajectories. I demonstrate how this approach, applied to the entire Dutch population through Statistics Netherlands and ODISSEI infrastructure, can produce life-course embeddings capable of predicting income, partnership, and fertility dynamics years into the future, while retaining interpretability and ethical safeguards. Together, these results suggest a new methodological frontier for demography: one that acknowledges the limits of predictability revealed by challenges like PreFer, but also leverages foundation models to systematically learn from the full grammar of life. The talk concludes by reflecting on what this convergence of demography and machine learning implies for theory, policy, and the future of population research.
Microsoft Teams link
Organizer
Chiara Comolli