Over a decade into the AI revolution, bias remains a pernicious problem. From medical recommendation systems that allocate less care to Black patients to human resources algorithms that are biased against women, it can often feel like we’re trapped in a cycle of hype then harm. As scientists scramble to build machine learning tools to fight Covid-19, it feels inevitable that some well-intentioned project will lead to similarly disparate results.
This raises the question:
Why does the field of machine learning, which has yielded so many compelling and useful tools, regularly produce technology that privileges some while denying many others?
From a technical perspective, there is a straightforward answer. We can often trace issues back to the data used to build the algorithm. Developing a system to diagnose a medical condition or identify voice commands takes a large amount of labeled data to train the algorithm. Marginalized people and problems are regularly left out of industry standard datasets, leading to measurably worse algorithmic performance when compared to privileged groups.
But a purely technical answer hides the social root of the problem. Bias in existing datasets and the lack of other critical datasets is at its core about who has the power and resources to envision and build a better future. In a recent article for Nature, AI researcher Pratyusha Kalluri sums up the problem:
It is not uncommon now for AI experts to ask whether an AI is ‘fair’ and ‘for good’. But ‘fair’ and ‘good’ are infinitely spacious words that any AI system can be squeezed into. The question to pose is a deeper one: how is AI shifting power?
As a practical example, the Food and Agriculture Organization estimates that 800 million people, or 78 percent of the world’s poorest, are harmed by agricultural data gaps. Without locally representative data, it is difficult for engineers to build AI tools that help farmers in their communities plant and manage their crops. This, in turn, contributes to low productivity and unstable incomes. There is potential to build tools that shift power in favor of marginalized people, but without access to resources to build training datasets, doing so is difficult. The same pattern holds true across health, economic development, and many other sectors.
To address this problem, we are proud to join with Google.org, Canada’s International Development Research Centre (IDRC), and German development agency GIZ, on behalf of the Federal Ministry for Economic Cooperation and Development (BMZ), to launch The Lacuna Fund: Our Voice on Data. The initial $4 million fund will provide grants to support the creation, expansion, and maintenance of quality labeled datasets that enable machine learning by and for people who are often excluded from the global AI conversation. Meridian Institute will serve as the founding Secretariat and fiscal sponsor for the Fund.
When designing the Lacuna Fund with our co-funders and partners, our team at The Rockefeller Foundation focused on three core principles:
Foster the development of AI systems shaped by the people they affect
We’ve designed the governance of the Lacuna Fund to put the power to draft calls for proposals and select grantees in the hands of Technical Advisory Panels composed of experts who live in and work with the communities that lack access to quality data. While the Lacuna Fund won’t entirely solve the systemic problem of bias in machine learning, we are committed to elevating the voices and research priorities of marginalized people to support the debiasing process.
Support datasets to address practical, high-value problems
The Lacuna Fund is focused on applied machine learning problems. We seek to fund datasets that will lead to the research and tools that meet real needs in agriculture systems, automated text processing, and health modeling. Datasets focused on fundamental scientific questions are valid and useful, but are not the focus of the Fund.
Make datasets locally owned and openly available
The labeled datasets that do exist in many low- and middle-income countries are often privately held or costly to use. This limits development, especially for researchers and social entrepreneurs early in their careers. Datasets funded by the Lacuna Fund will be owned and managed by the teams that build them and will be openly licensed for use by researchers around the world.
The first funding call seeks proposals to build agriculture datasets from teams across sub-Saharan Africa. If you work for an organization with a social impact mission and have expertise in data collection and labeling, you can apply at lacunafund.org/apply. Later this year, the Lacuna Fund will launch a second call on languages, and a request for information on health datasets. To learn more about these upcoming opportunities follow the Lacuna Fund on Twitter or sign up for its newsletter at lacunafund.org.