Greg Yetman is codirector of the Center for International Earth Science Information Network (CIESIN), a part of the Climate School at Columbia University. As part of a NASA contract, CIESIN has been exploring ways to deliver socioeconomic data from Earth observation since the early 1990s. Yetman says details such as how common it is for people to live in basement apartments in the New York City borough of Queens are “always hard to capture and really difficult to measure from space.” Apartment conversions, sublets by an owner or occupant, and unregistered settlements, all likely to increase as the cost of living climbs, aren’t often captured by the census or satellites either. And if a person is unhoused or has few financial records, they may not show up in location-sharing data collected by private brokers.
There’s room to improve on the census in the US, but the Constitution requires that one be conducted every decade, and Yetman says the country is “data rich.” By comparison, some countries haven’t carried out detailed household surveys in decades. Obstacles such as cost, conflicts, or difficulty reaching remote locations can make some communities harder to count.
In 2017, the Nigerian government, CIESIN, and others working with funds from the Bill & Melinda Gates Foundation used satellite imagery and machine learning to map the country’s population in order to deliver measles vaccinations. Since then, Gates Foundation senior program officer Vince Seaman says, the effort has expanded to five other African countries as a project known as Grid3. That work, he adds, demonstrates that the tech is only part of the solution. After machine learning was applied to satellite photos, community surveys were carried out to reach thousands of people in person and to verify the results.
In research published last month, satellite imagery and machine learning were used to automatically identify housing plots and predict population, age, and sex in five provinces in the western half of the Democratic Republic of Congo (DRC). The project brought Grid3 participants like the University of Southampton in the UK together with groups like the DRC’s National Bureau for Statistics. Anonymous surveys of nearly 80,000 people were carried out by the Kinshasa School of Public Health and the University of California, Los Angeles, School of Public Health to validate the performance of a deep learning model that achieved about 80 percent accuracy. The coauthors say their method is no replacement for a true attempt to count the entire population, but it can supply a predictive snapshot of society in places with little or poor-quality data. No national census has taken place in the DRC since 1984.
Yetman has spent more than 20 years working with satellite images. He works with Pop Grid, a data collaborative for a diverse group of organizations that count populations, including the European Commission, Facebook, the German Aerospace Center, and NASA. He says deep learning models for identifying buildings can’t always tell where one roof ends and another begins, and he warns there’s no such thing as a model that works everywhere in the world.
In the US, he explains, an AI model trained on images of roofs from the western US runs into problems when it’s applied to homes on the East Coast, because the western expansion of the country followed a grid-based system, while cities like Boston developed with less uniformity. Similarly, a roof in South Africa looks different from one in Zambia. AI can easily mistake the roof of a stall at a commercial market in Accra, Ghana, for the roof of an unregistered home, or struggle to accurately predict the number of people in urban settlements or rural villages. “Without the on-the-ground survey that says there’s a slum or informal settlement here, it’s really difficult to know just from the structure of the roof patterns,” Yetman says. He adds that obtaining high-quality data for training models to detect buildings or home plots based on local conditions is the hardest part of the job.