
Inclusive AI Needs Inclusive Data Standards

Tim Davies — Co-Founder, Open Data Services Co-Operative

Designing AI for the public good is in our hands

Modern artificial intelligence (AI) was hailed as bringing about the “end of theory.” To generate insights and actions, no longer would we need to structure the questions we ask of data. Rather, with enough data and smart enough algorithms, patterns would emerge. In this world, trained AI models would give the “right” outcomes—even if we didn’t understand how they did it.

Today that theory-free approach to AI is under attack. Scholars have called out the bias-in, bias-out problem of machine-learning systems, showing that biased data sets create biased models and, by extension, biased predictions. That's why policy-makers now demand that AI systems used for public decisions be explainable, offering justifications for the predictions they make. But a deeper problem rarely gets addressed. It is not just the selection of training data or the design of algorithms that embeds bias and fails to represent the world we want to live in. The underlying data structures and infrastructures on which AI is founded were rarely built with AI uses in mind, and the data standards (or lack thereof) used by those data sets place hard limits on what AI can deliver.

Questionable assumptions

From form fields for gender that offer only a binary choice, to disagreements over whether or not a company’s registration number should be a required field on an application form for a government contract, data standards define the information that will be available to machine-learning systems. They set in stone certain hidden assumptions and taken-for-granted categories that make possible certain conclusions—while ruling others out—before the algorithm even runs. Data standards tell you what to record and how to represent it. They embody particular worldviews. And they shape the data that shapes decisions.
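A minimal sketch can make the point concrete. It assumes a hypothetical form field that accepts only two values; the field, the allowed values and the sample responses are all invented for illustration. Any answer outside the allowed set is lost before a model ever sees the data.

```python
from typing import Optional

# Hypothetical standard: the field accepts only these two values.
BINARY_GENDER = {"female", "male"}


def record_with_binary_field(answer: str) -> Optional[str]:
    """Return what a binary-only field would store, or None if the answer does not fit."""
    normalized = answer.strip().lower()
    return normalized if normalized in BINARY_GENDER else None


# Invented responses, for illustration only.
responses = ["Female", "male", "non-binary", "prefer to self-describe"]
stored = [record_with_binary_field(r) for r in responses]

# Count the people whose answers the standard could not represent.
missing = stored.count(None)
print(stored, missing)
```

Whatever a model later learns from this field, it cannot learn anything about the respondents whose answers were never recorded.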

For corporations planning to use machine-learning models with their own data, creating a new data field or adapting available data to feed the model may be relatively easy. But for public-good uses of AI, which frequently draw on data from many independent agencies, individuals or sectors, aligning data structures is a challenging task.

Opening up AI infrastructure

There is hope, however. A number of open-data-standards projects have launched since 2010. They include the International Aid Transparency Initiative, which works with international aid donors to encourage them to publish project information in a common structure, and HXL, the Humanitarian eXchange Language, which offers a lightweight approach for structuring spreadsheets with who, what and where information from different agencies engaged in disaster response activities.
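The HXL approach can be sketched in a few lines. HXL adds a row of hashtags such as #org, #sector and #adm1 beneath a spreadsheet's ordinary headers, so that files with different column names and orders can still be read together. The sample data, agency names and the read_tagged helper below are invented for illustration; this is a rough sketch of the idea, not the official HXL tooling.

```python
import csv
import io

# Two invented agency spreadsheets: different headers and column order,
# but the same HXL-style hashtags in the row beneath the headers.
AGENCY_A = """Organisation,Cluster,Province
#org,#sector,#adm1
Agency A,WASH,Coast
Agency A,Health,Mountain
"""

AGENCY_B = """Who,Where,What
#org,#adm1,#sector
Agency B,Coast,Education
"""


def read_tagged(text):
    """Return the data rows as dicts keyed by hashtag (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text)))
    for i, row in enumerate(rows):
        cells = [c.strip() for c in row if c.strip()]
        # The hashtag row is the first row whose non-empty cells all start with '#'.
        if cells and all(c.startswith("#") for c in cells):
            tags = [c.strip() for c in row]
            return [dict(zip(tags, data)) for data in rows[i + 1:] if data]
    raise ValueError("no hashtag row found")


# Merge both files on the shared hashtags, ignoring the human-readable headers.
combined = read_tagged(AGENCY_A) + read_tagged(AGENCY_B)
who_is_where = sorted((r["#adm1"], r["#org"], r["#sector"]) for r in combined)
for place, org, sector in who_is_where:
    print(place, org, sector)
```

Each agency keeps its own column names; only the hashtag row has to be agreed on, which is what keeps the approach lightweight.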

When those standards work well, they enable a broad community to share data that represents their own realities, and they make the data interoperable with data from others. But for that interoperability to happen, standards must be designed with broad participation. Only then can they avoid design choices that embed problematic cultural assumptions, that create unequal power dynamics, or that strike the wrong balance between comprehensive representation of the world and simple data preparation. Without the right balance, certain populations might drop out of the data-sharing process altogether.

To use AI for the public good, we have to focus on the data substrata on which AI systems are built. That means prioritizing data standards and far more inclusive standards-development processes. Even if machine learning lets us ask questions of data in new ways, we cannot shirk our responsibility to consciously design data infrastructures that make possible both meaningful and socially just answers.
