Brunero Liseo (Dipartimento di Metodi e Modelli per il Territorio, l'Economia e la Finanza, Sapienza Università di Roma)
Abstract
We introduce a general Bayesian methodology for performing inference on the results of a preliminary record linkage analysis, based on a k-list framework with possible duplications. We frame the record linkage process as a formal statistical model that comprises both the matching variables and the other variables used at the inferential stage. This allows the researcher to account for the uncertainty of the matching process in inferential procedures based on probabilistically linked data. At the same time, it lets uncertainty propagate in both directions between the working statistical model and the record linkage stage. We argue that this feedback effect is essential to eliminate potential biases that would otherwise affect inference based on the linked data, and that it can also improve record linkage performance. Practical implementation of the procedure relies on standard Bayesian computational techniques, such as Markov chain Monte Carlo algorithms. Although the methodology is quite general, for expository convenience we restrict our analysis to the popular and important case of multiple linear regression.
KEYWORDS: Record Linkage, Hit and Miss algorithm, Clustering
Contact person
Daniela Cocchi