Abstract
Classification and Regression Trees (CART) are a widely adopted statistical technique for predictive modeling based on binary recursive partitioning of the feature space (Breiman et al., 1984). Despite their popularity and interpretability, standard CART models face significant challenges when applied to spatial data, primarily because of spatial autocorrelation and/or spatial heterogeneity. Ignoring these spatial effects can lead to biased parameter estimates, suboptimal partitions, and reduced predictive performance. In this seminar, we introduce alternative formulations of the traditional CART algorithm explicitly designed to analyze spatially distributed data. The first approach extends the classical CART framework by augmenting the set of splitting variables with spatial predictors, constructed using two distinct strategies. The first strategy uses spatial lag predictors derived from exogenous spatial weight matrices, following Kelejian and Prucha (2007); these lags allow the model to capture the influence of neighboring observations on each unit. The second strategy incorporates spatial filtering predictors obtained through two well-established spatial econometric techniques: the Getis filtering approach, based on the local spatial statistic Getis’ G (Getis and Griffith, 2002), and the eigenvector spatial filtering method introduced by Griffith (2003). The second approach introduces spatial information directly into the objective function of the tree-building process through a spatially constrained penalty term that encourages geographically contiguous partitions. This formulation ensures that the resulting terminal nodes correspond to spatially coherent regions and lets the recursive partitioning process account explicitly for spatial effects. To evaluate the performance of these spatial CART models, we present an empirical application to the well-known Boston housing dataset.
The results demonstrate that both spatial extensions significantly improve predictive accuracy compared to the standard CART model.
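To make the predictor-augmentation idea concrete, the following is a minimal sketch, not the authors' implementation, of how the three kinds of spatial predictors could be built and stacked next to the original covariates before fitting an ordinary CART. Several choices are assumptions made only for illustration: a binary distance-band matrix B whose threshold is the smallest distance that gives every unit at least one neighbour, a row-standardised W for the spatial lags WX, the classic Getis filter x*_i = x_i (W_i/(n-1)) / G_i(d) (which requires strictly positive variables), and keeping a fixed number of leading eigenvectors of the doubly centred matrix MBM for eigenvector spatial filtering. All function names are hypothetical, the data are synthetic, and the penalty-based second approach is not sketched.

```python
import numpy as np

def distance_band_weights(coords, threshold=None):
    """Binary, symmetric distance-band matrix B (b_ij = 1 if 0 < d_ij <= threshold).

    Assumed default threshold: the smallest distance giving every unit
    at least one neighbour."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a unit is not its own neighbour
    if threshold is None:
        threshold = d.min(axis=1).max()
    return (d <= threshold).astype(float)

def getis_filter(x, B):
    """Getis filter x*_i = x_i * (W_i / (n - 1)) / G_i(d); requires x > 0."""
    n = x.shape[0]
    W_i = B.sum(axis=1)                          # number of neighbours of unit i
    G_i = (B @ x) / (x.sum() - x)                # local G_i(d), sums exclude j = i
    return x * (W_i / (n - 1)) / G_i

def esf_eigenvectors(B, n_vec=5):
    """Leading eigenvectors of M B M with M = I - 11'/n (eigenvector spatial filtering)."""
    n = B.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(M @ B @ M)       # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_vec]       # keep the largest (positive autocorrelation)
    return vecs[:, order], vals[order]

def augment_with_spatial_predictors(X, coords, n_vec=5):
    """Stack X with spatial lags WX, Getis-filtered X, and ESF eigenvectors."""
    B = distance_band_weights(coords)
    W = B / B.sum(axis=1, keepdims=True)         # row-standardised W for the lags
    lags = W @ X
    filtered = np.column_stack([getis_filter(X[:, j], B) for j in range(X.shape[1])])
    E, _ = esf_eigenvectors(B, n_vec)
    return np.hstack([X, lags, filtered, E])

# Usage on synthetic data (not the Boston dataset): 50 locations, 3 positive predictors.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(50, 2))
X = rng.uniform(1, 5, size=(50, 3))
Z = augment_with_spatial_predictors(X, coords)
print(Z.shape)  # (50, 14)
```

The augmented matrix Z (original predictors, their lags, their Getis-filtered versions, and five eigenvectors) can then be passed unchanged to a standard CART implementation, for example scikit-learn's DecisionTreeRegressor, so that the tree itself decides which spatial predictors are useful split variables. The distance band, the number of eigenvectors, and whether to use the filtered variable x* or the spatial component x - x* are tuning choices assumed here for illustration.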