How Open-Source Big Data Can Tackle Healthcare Access using CARTO
In light of the pandemic, we need a quick and scalable way to identify vulnerable populations for targeted interventions. One example is making sure that the majority of the population has access to the health facilities they may need. Now more than ever, governments and health groups need to quickly identify where to allocate resources for the greatest possible impact.
Luckily, through the excellent work of the scientific and open-source geospatial community, we have global datasets that help inform this decision making. These cover indicators such as population, health facilities, and mobility, all of which help us quickly pinpoint these areas -- even down to house-level granularity.
However, with large datasets comes an even larger problem -- computing power. When you have to process gigabytes of information to work with these datasets, every processing step becomes a blocker, and something as simple as viewing data on a map becomes a huge endeavor. Huge processing tasks also bring the cost of renting compute capacity. Thus, for groups without the resources or the technical know-how around geospatial processing options, very granular countrywide geospatial analysis becomes difficult and/or costly.
Understanding this problem, CARTO, with the support of BigQuery, developed BigQuery Tiler -- a quick and easy tool to process, visualize, and then analyze large spatial datasets straight from BigQuery. Using this technology together with Thinking Machines’ datasets and geospatial processing expertise, we created a demo of how we can quickly identify healthcare gaps at scale.
Going back to the earlier problem: how do we identify high-impact locations for the construction of new health facilities? We use two very popular datasets as a proof of concept:
- Facebook’s High-Resolution Settlement Layer, which provides population estimates at up to 30m x 30m granularity.
- Health Facilities in OpenStreetMap, which provides crowdsourced information on the locations of specific points of interest all over the world.
Our goal is to identify high concentrations of settlements that do not have access to health facilities within a certain distance. For the purpose of this blog post, we focus on the Philippines, Malaysia and Vietnam -- a total of almost 1 million square kilometers in terms of area. The population layer alone has around 19.6M rows, and the health facilities layer has a bit over a million points of interest in total.
Using BigQuery Tiler, we’re able to load both datasets onto a map in almost no time at all, without having to worry about any ETL, loading times or cost!
BigQuery Tiler allows us to partition our very large datasets in BigQuery into vector tiles, which makes loading and visualization of datasets much more manageable for our web maps. What this means is that we can easily view the population and health facility data of an entire country without having to worry about the dataset size or scale.
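To make the idea of partitioning into tiles more concrete, here is a minimal sketch of the standard Web Mercator XYZ tiling scheme that web maps commonly use. This is not BigQuery Tiler's implementation, which runs inside BigQuery. The function names `lonlat_to_tile` and `group_by_tile` and the sample points are our own assumptions, for illustration only.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the XYZ tile that contains a point.

    Assumes the common Web Mercator tiling scheme, where zoom level z
    splits the world into 2**z by 2**z tiles.
    """
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still fall into a valid tile.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def group_by_tile(points, zoom):
    """Bucket (lon, lat, value) rows by the tile they fall into."""
    tiles = {}
    for lon, lat, value in points:
        tiles.setdefault(lonlat_to_tile(lon, lat, zoom), []).append((lon, lat, value))
    return tiles


# Hypothetical sample rows: two points near Manila, one near Kuala Lumpur.
sample_points = [
    (120.98, 14.60, 10.0),
    (120.99, 14.61, 5.0),
    (101.69, 3.14, 7.0),
]
tiles = group_by_tile(sample_points, zoom=6)
```

Because each tile holds only the rows that fall inside it, a map only needs to fetch the handful of tiles currently in view, rather than the full dataset.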
Once the data is on the map, we can also easily build analysis layers on top. For example, we can filter out settlements that already have access to health facilities. This allows users to focus primarily on areas that are not within a certain distance of a health facility -- a distance that the user can easily choose.
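The distance filter described here can be sketched in a few lines of Python. This is a brute-force illustration under our own assumptions, not the query used in the demo: the record fields (`lon`, `lat`, `population`), the helper names, and the sample data are all hypothetical. It compares every settlement against every facility, which would not scale to the full datasets. At that size, the check would more typically run in the data warehouse itself, for example with BigQuery's `ST_DWITHIN` geography function.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def underserved(settlements, facilities, max_km):
    """Keep only settlements with no facility within max_km."""
    return [
        s for s in settlements
        if not any(
            haversine_km(s["lon"], s["lat"], f["lon"], f["lat"]) <= max_km
            for f in facilities
        )
    ]


# Hypothetical sample data, for illustration only.
settlements = [
    {"name": "A", "lon": 121.00, "lat": 14.60, "population": 120.0},
    {"name": "B", "lon": 122.00, "lat": 14.60, "population": 80.0},
]
facilities = [{"name": "clinic", "lon": 121.01, "lat": 14.60}]

far_from_care = underserved(settlements, facilities, max_km=5.0)
```

Changing `max_km` plays the same role as the user-selected distance on the map: a larger threshold leaves fewer settlements flagged as lacking access.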
We can even quantify the vulnerable population within an area by using our drawing tools to select custom areas of interest and easily summarize the data based on that.
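As a rough sketch of what summarizing over a drawn area involves, the snippet below sums the population of settlement points that fall inside a polygon, using a plain ray-casting test. The function names, the record fields, and the sample polygon are our own assumptions for illustration. The map's drawing tools handle this for the user, and we are not describing how they are implemented.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is the point inside the polygon?

    `polygon` is a list of (lon, lat) vertices. Coordinates are treated
    as planar, which is a reasonable approximation for small areas.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def population_in_area(settlements, polygon):
    """Sum the population of all settlements inside the polygon."""
    return sum(
        s["population"]
        for s in settlements
        if point_in_polygon(s["lon"], s["lat"], polygon)
    )


# Hypothetical area of interest and settlement points.
area = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
points = [
    {"lon": 1.0, "lat": 1.0, "population": 10.0},
    {"lon": 3.0, "lat": 1.0, "population": 5.0},
    {"lon": 0.5, "lat": 1.5, "population": 2.5},
]
total = population_in_area(points, area)
```

Feeding in only the settlements that lack nearby health facilities would give the vulnerable population for the selected area.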
A big plus is that we can generalize the same methodology across many different types of use cases and datasets. For example, at Thinking Machines, we use Machine Learning and AI to extract wealth information from satellite images at scale. We’re able to combine our extracted wealth information with building infrastructure to allow our telecommunications partners to identify ideal locations for cell sites based on their target wealth profiles and potential customer volume. We can easily extend this to other industries or use cases that require any sort of expansion or site selection.
And now, with BigQuery Tiler, we’re able to collect that information without having to worry about the scale and compute required to visualize and process the data.