
Mapping New Informal Settlements for Humanitarian Aid through Machine Learning

July 23, 2020

Thinking Machines helped iMMAP accelerate the discovery of new informal settlements in Colombia by training an AI model to detect them from satellite imagery:

  • Trained an AI model on field data of informal migrant settlements collected on the ground by iMMAP in 2019
  • Rolled out the model to municipalities with a high incidence of Venezuelan migrants
  • Detected more than 350 informal settlements across 68 municipalities in just a few weeks
  • Verified more than 70 settlements by working with Premise, our on-the-ground validation partner

This project was done in partnership with iMMAP and USAID.


Since 2014, nearly 2 million Venezuelans have fled to Colombia to escape an economically devastated country during what is one of the largest humanitarian crises in Latin America’s recent history. Many migrants struggle to survive as they face extreme poverty, poor living conditions, unemployment, food insecurity, and health problems, exacerbated further by the ongoing COVID-19 pandemic. Humanitarian organizations are now faced with the overwhelming challenge of locating these informal settlement communities scattered throughout the country to quickly deliver aid and support.

We worked with iMMAP, an international not-for-profit organization that provides information management services to humanitarian and development organizations, to locate the new informal migrant settlements that have emerged in Colombia as a result of the Venezuelan crisis. To augment iMMAP’s manually gathered field data, we produced probability maps from satellite imagery and machine learning models to quickly identify informal settlement locations across the country and guide their validation.


Quickly locating new informal settlements for on-the-ground validation across the entire country

To obtain reliable data, our partners at iMMAP would locate scattered informal migrant settlements along Colombia’s border with Venezuela and conduct high-coverage field surveys and interviews. Depending on the resources available, it could take them days or even weeks to locate these settlements on the ground, even in just a small area.

Our challenge was to give our partners at iMMAP a time- and cost-efficient way to locate new and emerging informal settlements, cutting down the time and manpower needed to identify them.

To help humanitarian organizations focus their efforts on areas more likely to house migrant populations, we wanted to use satellite imagery to quickly detect areas where new informal settlements have appeared in the past few years. This allows iMMAP and their partners to focus on on-the-ground validation and aid delivery, rather than spending so much time searching for the settlements they want to help.


Can we use Machine Learning and Time Series Satellite Images to accelerate the identification of informal migrant settlements that have emerged recently?

In satellite images, every pixel represents a specific location/area in the world. Just like how humans can identify objects when looking at photos, we can train a machine learning model to scale up this task for every pixel in every satellite image we need to classify. So if we’re able to create a model that classifies whether or not there’s a new informal settlement in a given area, then we’re able to pinpoint the exact locations and speed up the processes of our humanitarian partners in locating and identifying these settlements.

Processing Satellite Imagery and the Model Inputs

We acquired publicly available satellite images from Google Earth Engine and generated composites from 2015 to 2020 to reduce cloud cover in our areas of interest. We then computed indices that bring out certain features of the composites, such as greenery or vegetation, to help detect the settlements. With the input data in hand, the question was whether we could find patterns that differentiate between normal settlements, non-occupied land, and informal settlements.

Sentinel-2A Biennial Composite Generation: Cloud cover removal and median aggregation.
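To make the compositing idea concrete, here is a minimal sketch in plain Python. It is not the Earth Engine code used in the project: the grid layout, the function names, and the choice of NDVI as the vegetation index are all assumptions made for illustration. It takes the per-pixel median of the cloud-free observations across several acquisition dates, then computes a vegetation index from near-infrared and red reflectance.

```python
from statistics import median


def median_composite(scenes, cloud_masks):
    """Per-pixel median over time, ignoring cloudy observations.

    scenes: list of 2D grids (one per acquisition date) for a single band.
    cloud_masks: matching list of 2D boolean grids (True = cloudy pixel).
    A pixel that is cloudy on every date is left as None.
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    composite = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            clear = [
                scene[r][c]
                for scene, mask in zip(scenes, cloud_masks)
                if not mask[r][c]
            ]
            if clear:
                composite[r][c] = median(clear)
    return composite


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return (nir - red) / denom if denom else 0.0


# Hypothetical 1x2 band observed on three dates (integer-scaled reflectance).
scenes = [[[2000, 9000]], [[4000, 3000]], [[3000, 5000]]]
masks = [[[False, True]], [[False, False]], [[False, False]]]
composite = median_composite(scenes, masks)
print(composite)
```

In the second pixel, the bright cloudy value from the first date is excluded, so the composite reflects only the two clear observations.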

Creating Our Ground Truth and Modelling the Probability Map

We then trained machine learning classifiers to identify the differences between pixels containing informal settlements and pixels containing formal settlements or unoccupied land. As positive examples, we used field data gathered by iMMAP in 2019, which contained ground-validated coordinates of informal migrant settlements in Maicao, Riohacha, Uribia, Arauca, Arauquita, Tibu, Cucuta, Soacha, and Bogota. As negative examples, we used randomly selected and validated grid blocks in urban areas, mountainous areas, grassy areas, and so on.
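The article does not say which classifier was used, so the sketch below substitutes a plain logistic regression trained by gradient descent as a stand-in. The two features per pixel (a vegetation index and a brightness value) and all sample values are hypothetical; the point is only to show how positive and negative examples turn into a model that outputs a probability per pixel.

```python
import math


def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability that feature vector x belongs to the positive class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression with batch gradient descent."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            err = predict_proba(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical pixels: [vegetation index, brightness].
positives = [[0.10, 0.80], [0.20, 0.70], [0.15, 0.90]]  # informal settlement
negatives = [[0.70, 0.20], [0.80, 0.30], [0.60, 0.10]]  # other land cover
features = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)

weights, bias = train_logistic(features, labels)
print(predict_proba(weights, bias, [0.12, 0.85]))
```

Any classifier that produces a per-pixel probability could take the place of `train_logistic` here.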

Instead of just creating a binary yes/no map of informal settlements with this, our final output is a probability map where each pixel’s brightness corresponds to the probability of it being an informal settlement. This helps us maximize the number of settlements we are able to detect.
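A small sketch of why a probability map can surface more candidates than a yes/no map. The 0.5 cutoff, the 0 to 255 brightness scale, and the sample values are assumptions for illustration, not details from the project.

```python
def to_brightness(prob_map):
    """Scale probabilities in [0, 1] to 8-bit brightness values."""
    return [[round(p * 255) for p in row] for row in prob_map]


def to_binary(prob_map, threshold=0.5):
    """Collapse probabilities to a yes/no map at a fixed cutoff."""
    return [[p >= threshold for p in row] for row in prob_map]


# Hypothetical probabilities for three neighboring pixels.
prob_map = [[0.8, 0.45, 0.2]]

print(to_brightness(prob_map))
print(to_binary(prob_map))
```

The middle pixel falls just under the cutoff and disappears from the binary map, but it stays visible as a moderately bright pixel that a human reviewer can still choose to inspect.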


Two-Step Post-Classification Validation

Of course, it doesn’t end there. Once we had the predictions, we worked with iMMAP on two additional validation steps:

Step 1: Remote Validation

Once we have the informal settlement probability map, our partners at iMMAP manually inspect high-resolution historical satellite imagery in Google Earth Pro, starting with the brightest clusters of pixels. To distinguish informal Venezuelan migrant settlements, we look for the following:

  1. Slum-like characteristics including small roof sizes, disorganized layout of houses, and lack of nearby road structures; and
  2. The absence of a settlement on Google Earth Pro satellite imagery from 2014, when the Venezuelan mass migration began, and the emergence of an informal settlement in that area on any date after 2014.
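The inspection itself is manual, but the "brightest clusters first" ordering can be sketched in code. The example below is one hypothetical way to do it, not the project's method: it groups neighboring pixels above an assumed cutoff into clusters using a breadth-first flood fill (4-connectivity) and sorts the clusters by mean probability.

```python
from collections import deque


def rank_clusters(prob_map, min_prob=0.5):
    """Group adjacent high-probability pixels and rank clusters by brightness."""
    rows, cols = len(prob_map), len(prob_map[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or prob_map[r][c] < min_prob:
                continue
            # Flood fill from this pixel to collect one connected cluster.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and not seen[ny][nx] and prob_map[ny][nx] >= min_prob:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            mean_prob = sum(prob_map[y][x] for y, x in pixels) / len(pixels)
            clusters.append(
                {"pixels": pixels, "size": len(pixels), "mean_prob": mean_prob}
            )
    # Brightest clusters first; larger clusters win ties.
    clusters.sort(key=lambda cl: (cl["mean_prob"], cl["size"]), reverse=True)
    return clusters


# Hypothetical probability map with two separate bright areas.
prob_map = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.1, 0.1, 0.6],
    [0.0, 0.1, 0.1, 0.6],
]
ranked = rank_clusters(prob_map)
for cluster in ranked:
    print(cluster["size"], round(cluster["mean_prob"], 2))
```

A reviewer would then work down this list, opening each cluster's location in the historical imagery and applying the two criteria above.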

Step 2: On-the-Ground Validation

Once potential informal settlements are identified, we draw vector polygons around the candidate areas using QGIS or Google Earth Pro. These polygons are collated and shared with our partner, Premise Data, whose contributor network in the region checks whether these pre-identified settlements are actual locations where Venezuelan migrants are living. Using Premise’s proprietary app, contributors complete surveys and observations (photographs) within the predefined polygons. They locate the settlements through the map shown in the mobile application and submit answers and photos that help validate whether these areas actually house Venezuelan migrants.

A second task within the app incentivizes the contributors to return to the settlements and complete a monitoring task, which focuses on identifying the specific needs of the settlements’ inhabitants with regard to water and sanitation, health, food security, and overall living conditions.

Here is an example of the validation process of a settlement in Norte de Santander:

1) A settlement is identified by Thinking Machines.

New settlements identified in Bogotá, D.C., Colombia.

2) Polygons of the settlements are drawn, saved as GeoJSON, and ingested into the Premise platform.

3) Tasks appear in the Premise app.

4) On-the-ground Premise contributors take photos of the settlements and answer questions through the Premise app.

A satellite view of a settlement in Arauca, Colombia.

Photos taken by Premise contributors of the settlement in Arauca.

5) iMMAP gets access to the results through a Premise dashboard, which they can use to visualize the results in aggregate and see the submitted photos by location.
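For readers unfamiliar with the GeoJSON format mentioned in step 2, here is a minimal sketch of how a candidate polygon could be packaged as a GeoJSON FeatureCollection. The coordinates are placeholders rather than a real settlement location, the property names are hypothetical, and the schema that the Premise platform actually expects is not described in this article.

```python
import json


def polygon_feature(ring_lonlat, properties):
    """Build a GeoJSON Feature holding a single-ring Polygon.

    ring_lonlat: sequence of (longitude, latitude) pairs. GeoJSON requires
    the ring to be closed, so the first point is repeated at the end if needed.
    """
    ring = [list(point) for point in ring_lonlat]
    if ring[0] != ring[-1]:
        ring.append(list(ring[0]))
    if len(ring) < 4:
        raise ValueError("a polygon ring needs at least three distinct points")
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": properties,
    }


def feature_collection(features):
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Placeholder coordinates and hypothetical property names.
candidate = polygon_feature(
    [(-70.760, 7.080), (-70.758, 7.080), (-70.758, 7.082), (-70.760, 7.082)],
    {"candidate_id": "example-001", "mean_probability": 0.8},
)
collection = feature_collection([candidate])
print(json.dumps(collection, indent=2))
```

Note that GeoJSON stores each position as longitude first, then latitude, which is the reverse of how coordinates are often written.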


Automating settlement mapping for quick and cost-effective validation and response

In the end, we created a model that reliably predicts the presence of informal settlements and can be rolled out to any municipality in Colombia in a matter of minutes. As of this writing, we have helped iMMAP identify more than 350 potentially new settlements across Colombia; around 70 of these have already been validated on the ground, and the rest are still undergoing validation. Moving forward, iMMAP can continue to use this model to detect new settlements.

Our partners at iMMAP now have a novel approach to streamline their process and use their time and resources efficiently in supporting and coordinating the international community’s response to vulnerable Venezuelan migrant communities. The probability maps guide them towards areas with a high probability of informal migrant settlements, which they can then quickly verify and mobilize to. NGOs and local government units (LGUs) can then provide these communities with targeted humanitarian aid and assistance, as well as monitor their well-being over time.

