Using Geospatial AI to Monitor Rice Productivity in Cambodia

January 17, 2022

Thinking Machines supported the Asian Development Bank’s assessment of agricultural productivity gains from improved irrigation programs. We generated plot-level rice yield estimates for over 67,000 plots of land in Cambodia by training an AI model to predict yield using open-source geospatial data:

  • An AI model was trained on data gathered through household surveys in the areas that were recipients of ADB’s irrigation projects.
  • The model used open geospatial data to indicate the rice crops’ growing condition and health throughout the different farming seasons.
  • The rice yield predictions were displayed on a web map for convenient visualization and analysis.

Using machine learning, we extended rice yield data to over 50x more plots, with predictions ranging from 3.8 to 5.5 tons per ha and an accuracy of +/- 0.56 tons per ha.

Since 2013, the Asian Development Bank (ADB) has devoted substantial resources to help increase agricultural productivity, reduce rural poverty, and improve the overall climate and disaster resilience of irrigation systems in select provinces in Cambodia. Traditionally, ADB has measured the impact of its programs on agricultural productivity by conducting on-the-ground surveys and meticulously recording each household's yield every season. However, surveying households across large areas can take months to complete and can cost millions of dollars. Conducting surveys during the pandemic also poses a new set of risks and constraints.

ADB partnered with Thinking Machines to find a faster, more scalable, and more affordable way of measuring the agricultural productivity of 15,240 farmer households across Cambodia.

Monitoring agricultural productivity across Cambodia

Our challenge was to use satellite data and other open-source datasets to create an explainable model that estimates the rice yield of unsurveyed households. Such a model helps ADB get a better sense of agricultural productivity at a more granular level and understand which factors affect rice yield. Training a model to predict rice yield from spatial data makes it possible to “virtually” survey hard-to-reach locations and cover a larger area in near real-time while minimizing cost.

Using open-source geospatial data to predict yield at field level

Capturing the complexity of rice plant growth requires datasets that are indicative of how the crop is growing. With this in mind, we used open-source geospatial data and satellite imagery features that represent plant health and different environmental factors. We used satellite images to show crop health and growth, average rainfall and soil moisture to indicate water availability, and land surface temperature to show plant stress. We also used a time-series approach to represent the plants’ growth stages over each season.
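
As a rough illustration of what a time-series feature can look like, the sketch below averages a vegetation index within growth-stage windows for a single plot. It is not the pipeline used in this project: the choice of NDVI, the stage windows, and every name and value here are assumptions made for the example.

```python
from statistics import mean

# Hypothetical observations for one plot in one season: (day_of_season, ndvi).
# NDVI is a widely used vegetation index; the case study does not say which
# index was used, so this choice is an assumption.
observations = [(10, 0.21), (30, 0.35), (50, 0.62), (70, 0.78), (90, 0.66), (110, 0.40)]

# Assumed growth-stage windows in days since planting (illustrative only).
STAGES = {"vegetative": (0, 45), "reproductive": (45, 80), "ripening": (80, 120)}


def stage_features(obs, stages=STAGES):
    """Average the index within each growth-stage window."""
    features = {}
    for name, (start, end) in stages.items():
        values = [value for day, value in obs if start <= day < end]
        # Leave the feature empty when a window has no usable observation.
        features[f"ndvi_{name}"] = mean(values) if values else None
    return features


features = stage_features(observations)
print(features)
```

Summarizing each season into a few stage-level values keeps the temporal shape of the crop's growth while giving the model a fixed number of inputs per plot.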

Combining the different datasets gave us a detailed view of growing conditions and plant health at the plot level. The testing and validation dataset contained the plot location, the reported yield for each season in tons per hectare, and the extracted geospatial features. We then used machine learning to predict the rice yield in over 64,000 unsurveyed fields covering more than 16,000 hectares across the country. The predicted yield ranges from 3.8 to 5.5 tons per ha and is within +/- 0.56 tons per ha of the actual yield.
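
An error figure expressed in tons per ha can be computed by comparing predictions with reported yields on plots held out from training. The case study does not say which metric lies behind the +/- 0.56 figure, so the sketch below simply shows two common candidates, mean absolute error and root mean squared error, on made-up numbers.

```python
from math import sqrt

# Hypothetical reported vs. predicted yields (tons per ha) for held-out plots.
# These values are invented and are not results from the case study.
reported = [4.2, 3.9, 5.1, 4.8, 4.0]
predicted = [4.5, 4.1, 4.7, 4.6, 4.4]


def mean_absolute_error(actual, estimate):
    """Average size of the error, ignoring its direction."""
    return sum(abs(a - e) for a, e in zip(actual, estimate)) / len(actual)


def root_mean_squared_error(actual, estimate):
    """Like MAE, but penalizes large errors more heavily."""
    return sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimate)) / len(actual))


mae = mean_absolute_error(reported, predicted)
rmse = root_mean_squared_error(reported, predicted)
print(f"MAE:  {mae:.2f} tons per ha")
print(f"RMSE: {rmse:.2f} tons per ha")
```

Both metrics share the unit of the target, which is what makes a statement like "within a given number of tons per ha" possible.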


A peek under the model’s hood also gives us insight into which factors contribute to high yield. The features most indicative of high crop yield were soil moisture, soil subsurface moisture, and land surface temperature. Rice is typically cultivated in flooded soil, which explains why soil moisture and temperature (which affects how quickly water evaporates) are the leading features that indicate yield.
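
The case study does not state how feature importance was measured. One model-agnostic option is permutation importance: shuffle a single feature across plots and see how much the prediction error grows. The sketch below applies it to a toy stand-in model whose coefficients and data are invented purely for illustration.

```python
import random


def predict(row):
    """Hypothetical stand-in for a trained yield model (tons per ha)."""
    return (2.0 + 4.0 * row["soil_moisture"]
            - 0.05 * row["land_surface_temp"]
            + 0.01 * row["rainfall"])


def mean_abs_error(rows, targets, model):
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, model, feature, rng):
    """Increase in error after shuffling one feature across rows."""
    baseline = mean_abs_error(rows, targets, model)
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
    return mean_abs_error(permuted, targets, model) - baseline


rng = random.Random(0)  # fixed seed so the example is repeatable
rows = [
    {
        "soil_moisture": rng.uniform(0.2, 0.6),
        "land_surface_temp": rng.uniform(26, 36),
        "rainfall": rng.uniform(2, 12),
    }
    for _ in range(200)
]
# Toy targets: generated by the model itself, so the baseline error is zero.
targets = [predict(r) for r in rows]

importances = {
    name: permutation_importance(rows, targets, predict, name, rng)
    for name in rows[0]
}
for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

A feature whose shuffling barely changes the error contributes little to the predictions, while a large increase marks a feature the model leans on.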

The power of a well-designed map

We displayed the results on an interactive web map. It shows field boundaries for rice paddies in the project zones, developed in December 2020 by Quantitative Engineering Design (QED), colored according to our model-derived yield estimates. Users can view the data for both surveyed and unsurveyed fields at the granularity they need, zooming in to the district level or zooming out to compare yield across a province. The map also shows each plot’s proximity to irrigation canals, so users can see which low-yield plots could be connected to irrigation. Ultimately, the web map provides the flexibility needed to understand how rice productivity changed across the study area over three growing seasons.

Supplementing survey data with Machine Learning

Machine learning can extend survey information and fill in spatial gaps at a fraction of the cost of a ground survey.

Putting the model predictions on an interactive map let us present the data across three provinces in a customizable way. ADB’s agricultural experts can filter the map by district and growing season, spot clusters of high- or low-yield plots, and see how proximity to irrigation canals affects productivity. The model and web-based map helped ADB better assess the impact of its projects and programs by supplementing the results of the 2020 household impact surveys with yield estimates from satellite imagery.
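
The kind of filtering and comparison described above can be sketched on a small table of plot-level predictions. The records, field names, and district and season labels below are hypothetical and do not reflect the web map's actual data model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical plot-level predictions (yield in tons per ha).
plots = [
    {"plot_id": "A1", "district": "District A", "season": "season_1", "yield_t_ha": 4.0},
    {"plot_id": "A2", "district": "District A", "season": "season_1", "yield_t_ha": 5.0},
    {"plot_id": "A3", "district": "District A", "season": "season_2", "yield_t_ha": 4.4},
    {"plot_id": "B1", "district": "District B", "season": "season_1", "yield_t_ha": 3.8},
    {"plot_id": "B2", "district": "District B", "season": "season_2", "yield_t_ha": 4.2},
]


def filter_plots(records, district=None, season=None):
    """Keep plots matching the given district and/or season."""
    return [
        r for r in records
        if (district is None or r["district"] == district)
        and (season is None or r["season"] == season)
    ]


def mean_yield_by(records, key):
    """Average predicted yield for each distinct value of `key`."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["yield_t_ha"])
    return {value: mean(yields) for value, yields in groups.items()}


season_1 = filter_plots(plots, season="season_1")
by_district = mean_yield_by(season_1, "district")
print(by_district)
```

Grouping by a coarser key such as province would give the zoomed-out comparison, while filtering to one district gives the zoomed-in view.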

