Working Paper: Random Forests and Selected Samples

Paper Authors: Jonathan Cook and Saad Siddiqui

Abstract: This paper presents a procedure for recovering causal coefficients from selected samples using random forests, a popular machine-learning algorithm. The proposed method makes few assumptions about the selection equation and the distribution of the error terms. Our Monte Carlo results indicate that the method performs well, even when the selection and outcome equations contain the same variables, as long as the selection equation is nonlinear. We also compare the results of our procedure with those of parametric and semiparametric methods using real data.