Working Paper: Overall Rates and Sample Selection: Inferring HIV Prevalence from a Selected Sample

Paper Authors: Jessica Ying Chan and Jonathan A. Cook

Abstract: This paper estimates HIV prevalence in Zambia from survey data that are subject to sample selection: some surveyed individuals do not consent to taking an HIV test. We introduce semiparametric estimators for an overall rate that incorporate recent developments in machine learning. The semiparametric estimators perform well in Monte Carlo experiments and yield narrower confidence intervals than a fully parametric estimator when the model is misspecified. Our semiparametric estimates of the HIV rate are roughly equal to the rate in the selected sample. In contrast, recent parametric estimates find a higher rate, implying that some form of sample-selection correction is warranted.