Societal Consequences of Biased Data in Predictive Policing

Sunday, February 19, 2017: 10:00 AM-11:30 AM
Room 309 (Hynes Convention Center)
Kristian Lum, Human Rights Data Analysis Group, San Francisco, CA
Police departments are increasingly relying on predictive policing techniques, such as PredPol, to forecast the locations of future crimes. Virtually all police forecasting models are trained using police-recorded instances of crime, which are neither complete nor representative. Not all crimes appear in police records, and these records systematically over-represent certain demographic groups. As a result, predictions made on the basis of this data are vulnerable to these same biases, and police action directed by these predictions will disparately affect historically over-policed communities.

We demonstrate this effect by applying a recent predictive policing model to data on drug crimes in the City of Oakland, CA*. We find that drug-related arrests are highly concentrated in areas with higher proportions of low-income and non-white residents. When the predictive policing model is applied to this data, additional targeted policing is directed primarily to the non-white and low-income neighborhoods, despite the fact that public health surveys suggest that drug use in these areas is no more common than in more affluent, white neighborhoods.

Our case study suggests that predictive policing models not only perpetuate existing racial and income discrimination in policing but also create a feedback loop that amplifies the police targeting of low-income and minority residents.
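
The feedback loop described above can be illustrated with a deliberately simplified sketch. This is not PredPol and not the model applied in the case study; it is a toy simulation under stated assumptions: two hypothetical neighborhoods have the same true rate of drug crime, the historical records are slightly skewed toward one of them, each day's patrol is sent to the neighborhood with the most recorded crime, and crime is recorded only where a patrol is present. The function name, parameters, and numbers are all invented for illustration.

```python
def simulate_feedback(true_rates, initial_records, days):
    """Toy feedback loop (hypothetical, not the model from the case study).

    true_rates:      crimes discoverable per patrol-day in each area
    initial_records: historical recorded crimes in each area
    days:            number of patrol-days to simulate
    """
    records = list(initial_records)
    for _ in range(days):
        # "Prediction": send the patrol where recorded crime is highest.
        target = max(range(len(records)), key=lambda i: records[i])
        # Crime is only recorded where the patrol goes.
        records[target] += true_rates[target]
    return records


def share(records, index):
    """Fraction of all recorded crime attributed to one area."""
    return records[index] / sum(records)


# Equal true rates, but area 0 starts with slightly more recorded crime.
before = [12, 10]
after = simulate_feedback(true_rates=[1, 1], initial_records=before, days=30)

print(after)                       # [42, 10]
print(round(share(before, 0), 2))  # 0.55
print(round(share(after, 0), 2))   # 0.81
```

In this sketch the underlying crime rates never differ, yet the first area's share of recorded crime grows from roughly 55% to roughly 81%, because every new record is generated in the place the earlier records pointed to. Real forecasting systems allocate patrols less crudely than this winner-take-all rule, so the size of the effect here should not be read as an estimate of anything.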

*The dataset we used was collected and made publicly available by members of