For over a century, The Rockefeller Foundation has worked on initiatives promoting greater social equity in the United States and around the world. As part of this commitment, we collaborate with partners and grantees to tackle persistent obstacles that limit the ability of societies to achieve better outcomes. One such challenge in the United States has been the presence of lead in drinking water.
This prompted our Data Science team to make lead pollution datasets easier to access and to build a dashboard for exploring them, so that academics, environmental organizations and the general public can understand the extent of lead exposure and the deep social inequalities associated with lead pollution.
The adverse consequences of lead exposure have been well-documented, particularly its neurological effects in young children. Lead is more likely to be present in regions with aging infrastructure, cities experiencing industrial decline and towns in close proximity to old lead smelting plants. In the United States, many of the most affected locations are in the Rust Belt region, where several manufacturing industries used lead extensively during the first half of the twentieth century.
In regions where lead is more prevalent, the social stratification of lead exposure reveals a strong case of environmental and social injustice: low-income and minority communities are more likely to be affected. Black children in particular have been the most afflicted demographic; for decades they have had the highest blood lead levels among American children.
The persistence of higher blood lead levels among non-Hispanic Black children has been examined in several epidemiological studies using data from the National Health and Nutrition Examination Survey (NHANES). This annual survey, conducted by the Centers for Disease Control and Prevention (CDC), has collected blood lead level data from a representative sample of the American population since 1976. Currently, the CDC sets the Blood Lead Reference Value at 5 micrograms per deciliter (5 ug/dL) to identify children with elevated lead exposure.
In one of the most recent studies, covering NHANES data between 1999 and 2010, researchers estimated that Black children were 2.8 times more likely than White and Hispanic children to have blood lead levels above 5 ug/dL. On average, Black children also had higher blood lead levels than their White and Hispanic peers (2.95 ug/dL vs 1.89 ug/dL).
We cannot dismiss sociological explanations that reinforce the difference in lead exposure, such as housing policies and public institutional disinvestment in neighborhoods where low-income and minority residents are segregated. The 2014 water crisis in Flint was an example of this, affecting a city where nearly 60% of residents are Black.
An obstacle to implementing cost-effective lead abatement programs is the lack of integrated datasets listing the main sources of exposure at the household level. An additional difficulty is the accessibility of such data. For instance, the Environmental Protection Agency reports tap water lead tests through the Safe Drinking Water Information System, a manual data retrieval system with non-intuitive navigation. Without access to clear and sensible record keeping, it is difficult to imagine well-targeted lead abatement programs, particularly in areas where governments already struggle to access public funding.
Producing higher quality data is only part of the solution. Partnering with key stakeholders is what drives actionable change. At the beginning of 2016, Jacob Abernethy, Assistant Professor of Computer Science at Georgia Tech, and Eric Schwartz, Assistant Professor of Marketing at the University of Michigan, designed a machine learning model with the goal of guiding excavation decisions in Flint. Their collaboration with city officials saved the city an estimated 12% in excavation costs by providing predictive estimates of the locations of lead service lines. The scalability and social mission of this approach are what motivated Dr. Abernethy and Dr. Schwartz to found BlueConduit, a water analytics social venture with the mission of supporting cities in removing lead from drinking water systems. Ultimately, the goal of any data science effort for public good should be to improve the wellbeing and progress of society.
To this end, The Rockefeller Foundation continues to explore ways to incorporate data science as part of its toolkit of resources, including through our investment in data.org and advisory support for the Foundation’s efforts to improve health outcomes through equitable, data-driven initiatives. Lead exposure is a clear example of a social challenge that greatly benefits from better data science efforts to guide abatement programs. After all, lead poisoning is entirely preventable when we invest in well-informed decision making.
Explore the data for yourself
 Reuben A, Caspi A, Belsky DW, et al. Association of Childhood Blood Lead Levels With Cognitive Function and Socioeconomic Status at Age 38 Years and With IQ Change and Socioeconomic Mobility Between Childhood and Adulthood. JAMA. 2017;317(12):1244–1251. doi:10.1001/jama.2017.1712
 Lanphear BP, Hornung R, Khoury J, Yolton K, Baghurst P, Bellinger DC, Canfield RL, Dietrich KN, Bornschein R, Greene T, Rothenberg SJ, Needleman HL, Schnaas L, Wasserman G, Graziano J, Roberts R. Low-level environmental lead exposure and children’s intellectual function: an international pooled analysis. Environ Health Perspect. 2005 Jul;113(7):894-9. doi: 10.1289/ehp.7688. Erratum in: Environ Health Perspect. 2019 Sep;127(9):99001. PMID: 16002379; PMCID: PMC1257652.