Using Geospatial AI to Monitor Rice Productivity in Cambodia
Thinking Machines supported the Asian Development Bank’s assessment of agricultural productivity gains from improved irrigation programs. We generated plot-level rice yield estimates for over 67,000 plots of land in Cambodia by training an AI model to predict yield using open-source geospatial data:
- An AI model was trained on data gathered through household surveys in the areas that were recipients of ADB’s irrigation projects.
- The model used open geospatial data to indicate the rice crops’ growing condition and health throughout the different farming seasons.
- The rice yield predictions were displayed on a web map for convenient visualization and analysis.
This project was carried out with the support of both the Asian Development Bank and the government of Cambodia.
Since 2013, the Asian Development Bank (ADB) has devoted substantial resources to help increase agricultural productivity, reduce rural poverty, and improve the overall climate and disaster resilience of irrigation systems in select provinces in Cambodia.
Traditionally, ADB has measured the impact of its programs on agricultural productivity by conducting on-the-ground surveys and meticulously recording each household's yield every season. However, surveying households across large areas can take months to complete and can cost millions of dollars. Conducting surveys during the pandemic also poses a new set of risks and constraints.
ADB partnered with Thinking Machines to find a faster, more scalable, and more affordable way of measuring the agricultural productivity of 15,240 farmer households across Cambodia.
Monitoring agricultural productivity across Cambodia
Our challenge was to combine data from ADB’s project team and the project office of the government of Cambodia with satellite data and other open-source datasets to create an explainable model that estimates the rice yield of unsurveyed households. This will help ADB get a better sense of agricultural productivity at a more granular level and understand what factors affect rice yield. Training a model to predict rice yield using spatial data makes it possible to “virtually” survey hard-to-reach locations and cover a larger area in near real time while minimizing cost.
Using open-source geospatial data to predict yield at field level
Capturing the complexity of rice plant growth requires datasets that act as indicators of growing conditions. With this in mind, we used features from open-source geospatial data and satellite imagery that represent plant health and different environmental factors. We used satellite images to show crop health and growth, average rainfall and soil moisture to indicate water availability, and land surface temperature to show plant stress. We also used a time-series approach to represent the plants’ growth stages.
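One way to read the time-series approach is to summarize each plot's satellite observations per growth stage rather than per season. The sketch below is a minimal illustration, not the project's actual pipeline: the stage date windows, the `stage_features` helper, and the observation values are all hypothetical, and NDVI is used only as an example of a vegetation index, since the index used is not named here.

```python
from datetime import date
from statistics import mean

# Hypothetical growth-stage windows for one season (assumed, not from the project).
STAGES = {
    "vegetative": (date(2020, 6, 1), date(2020, 7, 15)),
    "reproductive": (date(2020, 7, 16), date(2020, 8, 20)),
    "ripening": (date(2020, 8, 21), date(2020, 9, 30)),
}


def stage_features(observations, stages=STAGES, prefix="ndvi"):
    """Average a plot's dated index values within each growth-stage window.

    observations: list of (date, value) pairs for a single plot.
    Returns a dict such as {"ndvi_vegetative": 0.4, ...}. A stage with no
    observations (for example, all images were cloudy) maps to None.
    """
    features = {}
    for stage, (start, end) in stages.items():
        values = [v for d, v in observations if start <= d <= end]
        features[f"{prefix}_{stage}"] = mean(values) if values else None
    return features


# Illustrative values only, not project data.
plot_obs = [
    (date(2020, 6, 10), 0.30),
    (date(2020, 7, 5), 0.50),
    (date(2020, 8, 1), 0.80),
    (date(2020, 9, 10), 0.60),
]
features = stage_features(plot_obs)
```

Each plot then contributes one feature per index and growth stage, which keeps the seasonal trajectory visible to the model instead of collapsing it into a single average.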
Combining the different datasets gave us a detailed view of growing conditions and plant health at the plot level. The testing and validation dataset contained the plot location, the reported yield for each season in tons per hectare, and the extracted geospatial features. We then used machine learning to predict the rice yield in over 64,000 unsurveyed fields covering more than 16,000 hectares across the country. The predicted yield ranges from 3.8 to 5.5 tons per hectare and is within +/- 0.56 tons per hectare of the actual yield.
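The text does not say which metric lies behind the +/- 0.56 tons per hectare figure. Two common candidates for this kind of statement are the mean absolute error (MAE) and the root mean squared error (RMSE). The sketch below shows how each could be computed from reported and predicted yields; the function names and the values are illustrative assumptions, not project data.

```python
from math import sqrt


def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, in the same unit as the yield."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Like MAE, but larger errors are penalized more heavily."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared / len(actual))


# Illustrative values in tons per hectare, not project data.
reported_yield = [4.0, 5.0, 3.5, 4.5]
predicted_yield = [4.5, 4.5, 4.0, 4.0]

mae = mean_absolute_error(reported_yield, predicted_yield)
rmse = root_mean_squared_error(reported_yield, predicted_yield)
```

Both metrics are expressed in tons per hectare, so either can be read directly against the predicted yield range.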
A peek under the model’s hood also gives us insight into what factors contribute to high yield. The features that were most indicative of high crop yield were soil moisture, soil subsurface moisture, and land surface temperature. Rice is characteristically cultivated in flooded soil, which explains why soil moisture and temperature (which affects how quickly water evaporates) are the leading features that indicate yield.
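The text does not say how this feature ranking was obtained. One model-agnostic option is permutation importance: shuffle one feature at a time and measure how much the prediction error grows. The sketch below is only an illustration of that technique. The `predict` function is a hypothetical stand-in for a trained model, and the feature names (including `elevation`, which is not mentioned in this article) and values are made up.

```python
import random


def mean_abs_error(rows, targets, predict):
    """Mean absolute difference between predictions and targets."""
    return sum(abs(predict(r) - t) for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, predict, n_repeats=10, seed=0):
    """Average increase in error when each feature is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mean_abs_error(rows, targets, predict)
    importances = {}
    for name in rows[0]:
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[name] for r in rows]
            rng.shuffle(shuffled)
            # Copy each row with only this one feature replaced.
            permuted = [{**r, name: v} for r, v in zip(rows, shuffled)]
            increases.append(mean_abs_error(permuted, targets, predict) - baseline)
        importances[name] = sum(increases) / n_repeats
    return importances


def predict(row):
    """Hypothetical stand-in for a trained model; it ignores elevation."""
    return 2.0 + 6.0 * row["soil_moisture"] - 0.05 * (row["land_surface_temp"] - 30.0)


# Made-up feature values for six plots.
rows = [
    {"soil_moisture": 0.20, "land_surface_temp": 34.0, "elevation": 12.0},
    {"soil_moisture": 0.25, "land_surface_temp": 33.0, "elevation": 40.0},
    {"soil_moisture": 0.30, "land_surface_temp": 32.0, "elevation": 8.0},
    {"soil_moisture": 0.35, "land_surface_temp": 31.0, "elevation": 25.0},
    {"soil_moisture": 0.40, "land_surface_temp": 30.0, "elevation": 15.0},
    {"soil_moisture": 0.45, "land_surface_temp": 29.0, "elevation": 30.0},
]
targets = [predict(r) for r in rows]  # perfect fit, for illustration only

importances = permutation_importance(rows, targets, predict)
```

A feature the model does not rely on, such as the hypothetical `elevation` here, scores zero because shuffling it leaves every prediction unchanged.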
The power of a well-designed map
We displayed the results on an interactive web map. It shows field boundaries for rice paddies in the project zones, developed in December 2020 by Quantitative Engineering Design (QED), colored according to our model-derived yield estimates. The web map lets users view the data for both surveyed and unsurveyed fields at the granularity they need, zooming in to the district level or zooming out to compare yield across the province. It also shows the plots’ proximity to the irrigation canals, so users can see which low-yield plots could be connected to irrigation. Ultimately, the web map provides the flexibility needed to understand how rice productivity changed across the study area during three different growing seasons.
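The proximity check described above can also be expressed numerically. The following sketch flags low-yield plots that lie close to a canal, assuming plot centroids and the canal line are already in a projected coordinate system measured in meters. The helper names, thresholds, and coordinates are hypothetical and only illustrate the idea.

```python
from math import hypot


def point_to_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_canal(centroid, canal):
    """Distance from a plot centroid to the nearest segment of a canal polyline."""
    return min(
        point_to_segment_distance(centroid, a, b) for a, b in zip(canal, canal[1:])
    )


def irrigation_candidates(plots, canal, max_yield, max_distance):
    """IDs of low-yield plots that lie within max_distance of the canal."""
    return [
        p["id"]
        for p in plots
        if p["yield_t_ha"] <= max_yield
        and distance_to_canal(p["centroid"], canal) <= max_distance
    ]


# Made-up geometry in meters and made-up yields in tons per hectare.
canal = [(0, 0), (1000, 0), (1000, 1000)]
plots = [
    {"id": "A", "centroid": (500, 100), "yield_t_ha": 3.9},  # low yield, near canal
    {"id": "B", "centroid": (500, 800), "yield_t_ha": 4.0},  # low yield, far away
    {"id": "C", "centroid": (900, 50), "yield_t_ha": 5.3},   # high yield, near canal
]
candidates = irrigation_candidates(plots, canal, max_yield=4.2, max_distance=200)
```

In practice a GIS library would handle real plot polygons and coordinate reference systems; the plain-Python version is only meant to make the geometry explicit.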
Tools used
- Python - we used Python libraries such as pandas, geopandas, and scikit-learn for data preparation and prediction
- Google Earth Engine - a platform for downloading and processing satellite images
- QGIS - an open-source GIS desktop application
- Mapbox - a provider of custom online maps for websites and applications
Supplementing survey data with Machine Learning
Machine learning can extend survey information and fill in spatial gaps at a fraction of the cost of a ground survey.
Putting the model predictions on an interactive map made the data across three provinces easy to explore and customize. ADB’s agricultural experts can filter the map by district and growing season, spot clusters of high- or low-yield plots, and see how the proximity of irrigation canals affects productivity. The model and web-based map helped ADB better assess the impact of their projects and programs by supplementing the results of the household impact surveys for 2020 with yield estimates from satellite imagery.
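As a rough illustration of the filtering described above, the sketch below builds a GeoJSON FeatureCollection of plot polygons restricted to one district and growing season, which is a format web mapping libraries commonly accept. The record fields, district names, season labels, and coordinates are all hypothetical, and this is not the project's actual data model.

```python
import json


def to_feature_collection(records, district=None, season=None):
    """Build a GeoJSON FeatureCollection, optionally filtered by district and season."""
    features = []
    for rec in records:
        if district is not None and rec["district"] != district:
            continue
        if season is not None and rec["season"] != season:
            continue
        features.append({
            "type": "Feature",
            # A GeoJSON polygon is a list of closed rings of [lon, lat] positions.
            "geometry": {"type": "Polygon", "coordinates": [rec["boundary"]]},
            "properties": {
                "plot_id": rec["plot_id"],
                "district": rec["district"],
                "season": rec["season"],
                "yield_t_ha": rec["yield_t_ha"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up records; the boundary ring repeats its first position to close it.
square = [[104.90, 12.50], [104.91, 12.50], [104.91, 12.51], [104.90, 12.51], [104.90, 12.50]]
records = [
    {"plot_id": 1, "district": "District A", "season": "wet", "yield_t_ha": 4.1, "boundary": square},
    {"plot_id": 2, "district": "District A", "season": "dry", "yield_t_ha": 5.0, "boundary": square},
    {"plot_id": 3, "district": "District B", "season": "wet", "yield_t_ha": 3.9, "boundary": square},
]

collection = to_feature_collection(records, district="District A", season="wet")
geojson_text = json.dumps(collection)
```

The resulting text can be loaded as a data source by a web map, with the `yield_t_ha` property driving the fill color of each plot.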