Understanding poverty in the Philippines with artificial intelligence
Thinking Machines helped the United Nations Development Programme (UNDP) generate nationwide wealth estimates in the Philippines with machine learning:
- An AI model estimated the wealth index across the Philippines, based on the 2017 Demographic and Health Survey
- The model was trained on open geospatial data sourced from OpenStreetMap, the Facebook Marketing API, VIIRS nighttime lights, land surface temperature, NDVI, and other sources
- Rolled out the model nationwide on 18-square-kilometer grid cells, with better performance than our previous model at a fraction of its cost
- Generated a granular, nationwide map of wealth estimates
This project was done in partnership with UNDP and Zero Extreme Poverty PH 2030. We presented this work at the NeurIPS 2020 Machine Learning for the Developing World workshop, and we are honored that our paper, which is available here, was recognized as the workshop's Best Paper.
By 2030, the United Nations aims to achieve 17 Sustainable Development Goals that promote prosperity while addressing social and climate needs. While the Philippines has made considerable progress on some goals, extreme poverty is an ever-growing problem. In 2019, an estimated 20.8% of the Philippine population lived below the poverty line, a figure that may rise further with the high unemployment brought on by COVID-19. Humanitarian organizations face the challenge of locating the poorest areas so they can strategically plan poverty-alleviation interventions.
We worked with UNDP, the UN agency that works to eradicate poverty and social inequality, to identify the most vulnerable areas in the Philippines. UNDP has partnered with Zero Extreme Poverty PH 2030 (ZEP), a coalition of Philippine NGOs, to pursue the goal of eradicating extreme poverty. ZEP has an established presence in 400 cities and municipalities nationwide.
Strategically locating areas for extreme poverty interventions
For organizations like UNDP and their NGO partners, planning interventions requires identifying where poverty is most widespread so that those areas can be targeted as beneficiaries. Traditionally, poverty assessment is informed by on-the-ground surveys like the Demographic and Health Survey (DHS).
While surveys are necessary and invaluable, they also have limitations: they are usually conducted only every three to five years, and each round costs millions of dollars. Because they are designed to interview only a representative sample of the population, they often exclude dangerous or inaccessible areas of interest. Given the labor, cost, and incompleteness of surveys, it is difficult to get a complete map of poverty in (near) real time.
We set out to develop an approach that could augment these surveys at a fraction of the cost. Previously, we built a deep learning model that used nighttime lights satellite data to approximate wealth. While this model performed well, it had two shortcomings:
Cost: The model is significantly cheaper than a survey, but expensive by machine learning standards. The high-resolution satellite images it requires cost nearly $5,000 to acquire, and every run demands intensive compute resources.
Interpretability: A deep learning model is a black box. It takes a satellite image as input and produces a wealth prediction, but offers no intuitive explanation of how it got from one to the other. Without knowing why an area was classified as wealthy or poor, social impact groups miss out on potentially useful information about poverty and have fewer reasons to trust the model.
With the goal of helping humanitarian organizations identify the most vulnerable areas in the Philippines, our challenge was to improve on our previous methodology and give UNDP a cost-efficient, interpretable model for estimating poverty across the entire Philippines.
Using open geospatial datasets for AI
To build an explainable model, we needed an approach built on intuitive datasets. We combined several datasets to derive a rich set of features for our model:
Google Earth Engine satellite images: From cheaper, low-resolution satellite images, we extracted the pixel values of geographic features like nighttime lights, land surface temperature, and vegetation index (NDVI).
OpenStreetMap Points of Interest (POIs): From OpenStreetMap, a platform of volunteered global geospatial data, we counted the points of interest (e.g. banks, restaurants, convenience stores) in each area.
Facebook Marketing API data: We used Facebook's Marketing API to get the approximate number of Facebook users per area, broken down into user segments such as users with 4G, 3G, 2G, or WiFi access, Apple device users, and consumers who prefer mid- to high-value goods.
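To make the feature-extraction step concrete, here is a minimal, standard-library-only sketch of turning the three sources above into one feature row per grid cell. Everything in it is an assumption for illustration. The `Cell` class, the helper names, the `(lon, lat, value)` pixel format, and the toy numbers are hypothetical. A pixel is assigned to a cell by its center point rather than by area-weighted overlap. A production pipeline would more likely use Earth Engine's `reduceRegions` with a mean reducer for the rasters and a GeoPandas spatial join (`sjoin`) for the POIs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Cell:
    """Hypothetical rectangular grid cell with bounds in lon/lat degrees."""
    cell_id: str
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon, lat):
        # Half-open bounds so a point on a shared edge is counted in only one cell.
        return self.min_lon <= lon < self.max_lon and self.min_lat <= lat < self.max_lat


def raster_mean(cell, pixels):
    """Mean value of the (lon, lat, value) pixel centers inside the cell; None if none fall inside."""
    values = [v for lon, lat, v in pixels if cell.contains(lon, lat)]
    return mean(values) if values else None


def poi_counts(cell, pois):
    """Count (lon, lat, amenity) points of interest inside the cell, per amenity type."""
    counts = {}
    for lon, lat, amenity in pois:
        if cell.contains(lon, lat):
            counts[amenity] = counts.get(amenity, 0) + 1
    return counts


def cell_features(cell, rasters, pois, fb_users):
    """Flatten raster means, POI counts, and Facebook user segments into one feature dict."""
    features = {f"{name}_mean": raster_mean(cell, px) for name, px in rasters.items()}
    features.update({f"poi_{k}_count": n for k, n in poi_counts(cell, pois).items()})
    features.update({f"fb_{k}": n for k, n in fb_users.get(cell.cell_id, {}).items()})
    return features


# Toy inputs: the third VIIRS pixel lies outside the cell and is ignored.
cell = Cell("c1", 120.0, 14.0, 120.1, 14.1)
rasters = {
    "viirs": [(120.05, 14.05, 3.0), (120.02, 14.08, 5.0), (121.0, 15.0, 99.0)],
    "ndvi": [(120.05, 14.05, 0.4)],
}
pois = [(120.03, 14.03, "bank"), (120.04, 14.06, "restaurant"), (120.07, 14.01, "bank")]
fb_users = {"c1": {"users_4g": 1200, "users_wifi": 800}}
features = cell_features(cell, rasters, pois, fb_users)
print(features)
```

Keeping every source as a flat, named column is what makes the resulting table easy to inspect feature by feature.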
Merging these datasets yields a table with hundreds of features for each location. For our ground truth, we used the wealth index from the 2017 Demographic and Health Survey. Each row of the final training dataset consists of a DHS survey location, its averaged wealth index, and all the geospatial features extracted above. We then used machine learning to predict the wealth index of every location in the Philippines.
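The merge described above amounts to averaging household wealth per survey location and inner-joining the result with the per-location features. The sketch below assumes each household record is a `(cluster_id, wealth_index)` pair and that features are already keyed by the same hypothetical `cluster_id`. The IDs and values are toy data, not DHS figures.

```python
from collections import defaultdict
from statistics import mean


def average_wealth_by_cluster(households):
    """Average the household wealth index per survey location (cluster)."""
    by_cluster = defaultdict(list)
    for cluster_id, wealth in households:
        by_cluster[cluster_id].append(wealth)
    return {cid: mean(values) for cid, values in by_cluster.items()}


def build_training_rows(cluster_wealth, cluster_features):
    """Inner join on cluster ID: locations missing ground truth or features are dropped."""
    shared_ids = sorted(cluster_wealth.keys() & cluster_features.keys())
    return [
        {"cluster_id": cid, **cluster_features[cid], "wealth_index": cluster_wealth[cid]}
        for cid in shared_ids
    ]


# Toy example: cluster "C" has no features, so it is excluded from training.
households = [("A", 1.0), ("A", 3.0), ("B", -2.0), ("C", 0.5)]
features = {
    "A": {"viirs_mean": 4.0, "poi_bank_count": 2},
    "B": {"viirs_mean": 0.1, "poi_bank_count": 0},
}
rows = build_training_rows(average_wealth_by_cluster(households), features)
for row in rows:
    print(row)
```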
The open geospatial model was more accurate than our deep learning model, achieving an R² of 0.66 compared to the deep learning model's 0.63.
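For readers unfamiliar with the metric, R² (the coefficient of determination) is 1 − SS_res / SS_tot. Here SS_res is the sum of squared differences between the true and predicted wealth indices, and SS_tot is the sum of squared differences between the true values and their mean. An R² of 0.66 therefore means the predictions account for roughly two-thirds of the variance in the survey wealth index. Below is a minimal sketch of the computation using toy numbers. The post does not say how its predictions were split or cross-validated, so that part is left out.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y_true, y_true))                # perfect predictions: 1.0
print(r_squared(y_true, [2.5] * 4))             # always predicting the mean: 0.0
print(r_squared(y_true, [2.0, 2.0, 3.0, 3.0]))  # partially correct: 0.6
```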
The improvement may be attributed to the new information in the rich feature sets beyond nighttime lights. And because our model uses more intuitive features, we can also profile each area to better understand how poverty relates to access to goods and services.
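One simple way to profile an area is to express each of its features as a z-score against all areas and then surface the features where it deviates most. This is our own illustrative assumption, not necessarily the method used in the project. The feature names and toy values below are hypothetical.

```python
from statistics import mean, pstdev


def profile_area(target, all_areas, top_k=3):
    """Return the top_k (feature, z_score) pairs where target deviates most from all areas."""
    scores = {}
    for name, value in target.items():
        column = [area[name] for area in all_areas if name in area]
        if len(column) < 2:
            continue  # not enough data to estimate a spread
        spread = pstdev(column)
        if spread == 0:
            continue  # every area has the same value, so nothing stands out
        scores[name] = (value - mean(column)) / spread
    # Rank by absolute deviation; the sign says whether the area is above or below average.
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]


areas = [
    {"poi_bank_count": 0, "fb_users_4g": 100},
    {"poi_bank_count": 10, "fb_users_4g": 120},
    {"poi_bank_count": 2, "fb_users_4g": 110},
]
for feature, z in profile_area(areas[0], areas):
    print(f"{feature}: {z:+.2f} standard deviations from the mean")
```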
A tale of two provinces
After completing the model for nationwide roll-out, we ranked areas according to their estimated wealth. Seven of the ten most vulnerable areas are non-tourist spots in Palawan, while nine of the ten wealthiest areas are in Metro Manila.
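Ranking is a sort over the predicted wealth index. The following minimal sketch assumes predictions are stored in a dict keyed by a hypothetical area ID and that each area's province is known. The area IDs, provinces, and scores are toy values, not the project's results.

```python
from collections import Counter


def rank_areas(predicted_wealth, k=10):
    """Return (most_vulnerable, wealthiest), each a list of up to k (area_id, score) pairs."""
    ranked = sorted(predicted_wealth.items(), key=lambda kv: kv[1])  # poorest first
    return ranked[:k], ranked[::-1][:k]


def province_breakdown(ranked_areas, area_province):
    """Count how many of the ranked areas fall in each province."""
    return Counter(area_province[area_id] for area_id, _ in ranked_areas)


predicted = {"a1": -1.2, "a2": 0.3, "a3": 1.5, "a4": -0.8, "a5": 2.0}
provinces = {"a1": "Province X", "a2": "Province Y", "a3": "Province Z",
             "a4": "Province X", "a5": "Province Z"}
most_vulnerable, wealthiest = rank_areas(predicted, k=2)
print(most_vulnerable)  # [('a1', -1.2), ('a4', -0.8)]
print(province_breakdown(wealthiest, provinces))
```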
Tools we used:
- Google Earth Engine - a platform for downloading and processing satellite images
- QGIS - an open-source desktop GIS application
- GeoPandas - an open-source Python library for wrangling geospatial data
- OpenStreetMap - a crowd-sourced geospatial data platform
Augmenting poverty estimation surveys with a cost-efficient and novel approach
We built a model that reliably predicts the wealth index across the entire country. While the model still depends on costly, randomly sampled surveys for its ground truth, its input data has full nationwide coverage, which can fill in spatial gaps in poverty measurement across the whole country. It is now possible to roll out wealth estimates for the whole population, at national scale and negligible cost, in a matter of minutes.
Thanks to this low-cost, fast roll-out, we also generated a granular map of wealth estimates for the Philippines. Because the input features are intuitive (as in the last figure), users can quickly zoom into specific areas to see which resources are accessible there. The map can also be layered with other datasets for cross-analysis. Our partners at UNDP can use these maps for further socio-civic analysis and data-driven intervention planning. In their case, the wealth estimates were overlaid with the locations of their partner organizations to determine how close those organizations are to the most vulnerable areas. Other layers, such as additional vulnerability indicators, COVID-19 cases, and even wealth estimates over time, remain open for future research.
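The nationwide roll-out can be pictured as three steps: tile the country into equal-area cells, compute features for each cell, and apply the trained model. The sketch below rests on several assumptions. It reads "18 square kilometer grids" as square cells of 18 km² each (about 4.24 km per side). It assumes coordinates are in a projected, meter-based coordinate system. The `featurize` and `model` arguments are hypothetical stand-ins for the real feature pipeline and trained model.

```python
import math
from typing import Callable, Dict, Iterator, Tuple

# (min_x, min_y, max_x, max_y) in meters, in some projected coordinate system.
Box = Tuple[float, float, float, float]


def square_grid(bounds: Box, cell_area_km2: float) -> Iterator[Box]:
    """Tile the bounding box with square cells of the given area; edge cells may overhang."""
    side = math.sqrt(cell_area_km2) * 1000.0  # km² -> side length in meters
    min_x, min_y, max_x, max_y = bounds
    n_cols = math.ceil((max_x - min_x) / side)
    n_rows = math.ceil((max_y - min_y) / side)
    for row in range(n_rows):
        for col in range(n_cols):
            x0, y0 = min_x + col * side, min_y + row * side
            yield (x0, y0, x0 + side, y0 + side)


def roll_out(bounds: Box, cell_area_km2: float,
             featurize: Callable[[Box], Dict[str, float]],
             model: Callable[[Dict[str, float]], float]) -> Dict[Box, float]:
    """Predict a wealth score for every grid cell."""
    return {cell: model(featurize(cell)) for cell in square_grid(bounds, cell_area_km2)}


# Toy stand-ins: the "feature" is the cell's x-center and the "model" just rescales it.
def toy_featurize(cell: Box) -> Dict[str, float]:
    return {"x_center": (cell[0] + cell[2]) / 2}


def toy_model(features: Dict[str, float]) -> float:
    return features["x_center"] / 10_000


estimates = roll_out((0.0, 0.0, 10_000.0, 10_000.0), 18.0, toy_featurize, toy_model)
print(len(estimates))  # 9 cells: a 3 x 3 tiling of a 10 km x 10 km box
```

Because each cell is scored independently, this step parallelizes trivially, which is consistent with the fast roll-out described above.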
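Great-circle distances are one way to measure how close partner organizations are to vulnerable areas. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km and a brute-force nearest-neighbor search. The area and partner IDs and coordinates are toy values. For many points, a spatial index (for example, the one GeoPandas exposes via `sindex`) would be faster.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_partner(areas, partners):
    """Map each area ID to (nearest partner ID, distance in km), checking every partner."""
    result = {}
    for area_id, (lat, lon) in areas.items():
        result[area_id] = min(
            ((pid, haversine_km(lat, lon, p_lat, p_lon)) for pid, (p_lat, p_lon) in partners.items()),
            key=lambda pair: pair[1],
        )
    return result


vulnerable_areas = {"v1": (10.0, 118.5), "v2": (9.5, 118.0)}
partner_orgs = {"p1": (10.0, 119.0), "p2": (9.0, 118.0)}
for area_id, (partner_id, km) in nearest_partner(vulnerable_areas, partner_orgs).items():
    print(f"{area_id}: nearest partner {partner_id}, {km:.1f} km away")
```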
Read more about the wealth model in our technical note here.