Creating a Single Customer View out of Big Messy Data
LBC Express is one of the largest courier companies in the Philippines, providing logistics and remittance services to hundreds of thousands of customers daily. Because they serve walk-in customers and manually encode customer information for each transaction, their customer data has been messy and inconsistent. On top of that, the on-premise software they used to deduplicate customer information was inefficient: it took a long time to produce results, and those results had low levels of accuracy.
Using our customer intelligence engine, we worked with LBC to develop a Single Customer View (SCV) in the cloud that identifies their unique customers from their daily transactional records for faster and more efficient performance.
- Because of manual encoding and the inconsistent data quality of walk-in customer transactions, LBC’s deduplication software took 3 weeks to generate a list of unique customers from their accumulated transaction data, and still achieved low levels of deduplication
- We put into production an SCV solution that uses artificial intelligence to deduplicate daily transactions and create a “golden” set of LBC’s unique customers
- We also built a web application that gives LBC the ability to view and link/separate customer records and provide feedback to improve the SCV engine’s performance
- With our SCV platform, we shortened LBC’s runtime for identifying customers from 3 weeks to 12 hours while processing 15 GB worth of data every day
LBC Express is one of the largest express courier, cargo, and money remittance companies in the Philippines, with over 1,000 branches nationwide serving hundreds of thousands of customers daily. Although the Philippine logistics market continues to grow, competition has been heating up rapidly as more disruptive players enter the market. In response, LBC is focused on maintaining their market leadership, and a core pillar of their strategy moving forward is to drive increased brand loyalty by targeting their existing customer base with relevant and timely promotions.
In line with this, LBC had been using deduplication software that applied pre-defined business rules to identify unique customers from the transactional records generated daily by their branches.
The challenge with their software was that the tool was slow: every month, it would take 3 weeks to generate a list of unique customers from hundreds of millions of transactions, covering not just the past month but also those accumulated from previous years. What’s more, because the quality of the transactional data encoded from walk-in customers was low and inconsistent, the tool ended up deduplicating only roughly a quarter of the records. As a result, the software was of little use for accurately identifying customers, let alone for developing hyper-personalized marketing campaigns in a timely and efficient manner.
We deployed a Single Customer View (SCV) engine on LBC’s transaction data to identify unique customers in 3 key steps: data cleaning, name classification, and deduplication.
Cleaning the Transactional Data in Preparation for Deduplication
One of LBC’s initial challenges with their deduplication software was that inconsistencies with transactional data led to a low number of identified unique customers. To improve the quality and consistency of the data, our SCV engine cleaned and standardized the fields and values in the transactional dataset. This enabled us to reduce the size of the dataset by 70% and left us with clean and relevant information for identifying unique customers.
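The kind of cleaning and standardization described above can be sketched as follows. This is a minimal illustration, not the SCV engine's actual rules: the field names (`name`, `phone`, `address`), the helper names, and the specific normalization steps are all assumptions made for the example. Dropping records that become identical after cleaning is likewise only one assumed way a dataset might shrink.

```python
import re
import unicodedata


def clean_value(value):
    """Normalize one free-text field: strip accents, case, punctuation, extra spaces."""
    if value is None:
        return ""
    # Decompose accented characters and drop the combining marks ("ñ" -> "n").
    text = unicodedata.normalize("NFKD", str(value))
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.upper()
    # Replace anything that is not a letter, digit, or space with a space.
    text = re.sub(r"[^A-Z0-9 ]+", " ", text)
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


def clean_phone(value):
    """Keep only the digits of a phone number."""
    return re.sub(r"\D", "", value or "")


def clean_records(records):
    """Standardize each record and drop records that become identical after cleaning.

    The field names used here are hypothetical.
    """
    seen = set()
    cleaned = []
    for rec in records:
        row = (
            clean_value(rec.get("name")),
            clean_phone(rec.get("phone")),
            clean_value(rec.get("address")),
        )
        if row not in seen:
            seen.add(row)
            cleaned.append({"name": row[0], "phone": row[1], "address": row[2]})
    return cleaned
```

For example, `clean_records` would treat "Juan Dela Cruz" with phone "+63 (917) 555-0101" and "JUAN  DELA CRUZ" with phone "639175550101" as the same cleaned row, provided their addresses also normalize to the same string.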
Classifying Customer Names
To further increase accuracy in identifying unique customers, our SCV engine classified and structured customer names encoded in LBC’s transactions. Using naming datasets as a reference along with other business rules (e.g., names containing “Inc” or “Corp”), our SCV engine first classified customers as either companies or individuals. Our SCV engine then separated and structured individual customers’ names into their first, middle, and last names. This would be beneficial later on when dealing with customers with similar names.
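To make the two classification steps concrete, here is a small rule-based sketch. It assumes names are written given-name first, and it uses tiny hard-coded sets (`COMPANY_MARKERS`, `SURNAME_PARTICLES`) as stand-ins for the naming datasets mentioned above. Those sets, the function names, and the splitting heuristic are illustrative assumptions rather than the engine's actual logic.

```python
import re

# Hypothetical stand-ins for the reference naming datasets and business rules.
COMPANY_MARKERS = {"INC", "CORP", "CORPORATION", "LTD", "LLC", "ENTERPRISES", "TRADING"}
SURNAME_PARTICLES = {"DE", "DEL", "DELA", "DELOS", "LA", "LOS", "SAN", "STA", "STO"}


def _tokens(raw_name):
    """Uppercase the name, replace non-letters with spaces, and split into words."""
    return re.sub(r"[^A-Za-z ]+", " ", raw_name).upper().split()


def classify_name(raw_name):
    """Return "company" if any token is a company marker, else "individual"."""
    if any(tok in COMPANY_MARKERS for tok in _tokens(raw_name)):
        return "company"
    return "individual"


def split_individual_name(raw_name):
    """Split "FIRST [MIDDLE ...] LAST" into parts, assuming given name comes first."""
    tokens = _tokens(raw_name)
    if not tokens:
        return {"first": "", "middle": "", "last": ""}
    if len(tokens) == 1:
        return {"first": tokens[0], "middle": "", "last": ""}
    # Take the surname from the right, absorbing particles such as "DELA",
    # but always leave at least one token for the first name.
    last_start = len(tokens) - 1
    while last_start > 1 and tokens[last_start - 1] in SURNAME_PARTICLES:
        last_start -= 1
    rest = tokens[:last_start]
    return {
        "first": rest[0],
        "middle": " ".join(rest[1:]),
        "last": " ".join(tokens[last_start:]),
    }
```

With these assumptions, "ABC Trading Corp." is classified as a company, and "Juan Santos Dela Cruz" splits into first name JUAN, middle name SANTOS, and last name DELA CRUZ. A rule set this small will misclassify some names, which is why a larger reference dataset matters in practice.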
Using Artificial Intelligence to Identify Customers
With cleaned, classified, and structured transactional data, we could now fully leverage our SCV engine’s artificial intelligence capabilities to identify unique customers. To do this, our SCV engine determined how similar each transaction’s customer details were to those of other transactions: the more similar the customer details of two transactions, the higher the likelihood that the same customer performed both.
Our engine computed similarities by comparing fields across millions of transactions to produce a similarity confidence score for each pair of transactions. If the confidence score was above a set threshold, the SCV engine would link the transactions together, and then identify unique customers based on the sets of linked transactions. In a test on a training data set, our SCV engine achieved 93% recall in identifying unique customers from transactions, versus 25% for the old software on the same data set.
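The score, threshold, and link flow can be illustrated with a short sketch. It swaps in a weighted average of plain string similarities (Python's `difflib.SequenceMatcher`) for whatever model the SCV engine actually uses. The field names, `WEIGHTS`, and `THRESHOLD` values are made up for the example. Linked transactions are grouped into customers with a union-find structure, so that if A links to B and B links to C, all three end up in one group.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical parameters; the real engine's weights and threshold are not published here.
WEIGHTS = {"name": 0.5, "phone": 0.3, "address": 0.2}
THRESHOLD = 0.85


def field_similarity(a, b):
    """Similarity of two strings in [0, 1]; a missing value scores 0."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def confidence(rec_a, rec_b, weights=WEIGHTS):
    """Weighted sum of per-field similarities for one pair of records."""
    return sum(w * field_similarity(rec_a[f], rec_b[f]) for f, w in weights.items())


def link_customers(records, threshold=THRESHOLD, weights=WEIGHTS):
    """Group record indices into customers by linking pairs scoring above the threshold."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent pointers to the group root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Compare every pair; fine for a demo, far too slow for millions of records.
    for i, j in combinations(range(len(records)), 2):
        if confidence(records[i], records[j], weights) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Comparing all pairs grows quadratically with the number of records, so a system working at the scale described here would need some way to narrow the candidate pairs before scoring them. How the SCV engine does that is not covered in this write-up.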
Augmenting LBC’s Operations
As the SCV engine learned to group together records and identify customers based on existing data, we also built a custom web application for LBC to manually merge or separate unique customer records when they were misidentified. Apart from enabling LBC to ensure the accuracy of their unique customer database, the web application would also generate new labelled data to retrain the SCV engine so that it could more accurately identify unique customers in the future.
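One way reviewer decisions could become labelled data is sketched below. The feedback format (an `action` of "merge" or "separate" plus a list of `record_ids`) is a hypothetical shape invented for this example, not the web application's actual schema. The sketch also assumes that every pair inside a merged group is a match, that every pair inside a separated group is a non-match, and that later feedback overrides earlier feedback on the same pair.

```python
from itertools import combinations


def labelled_pairs_from_feedback(feedback):
    """Turn merge/separate decisions into (id_a, id_b, label) training pairs.

    label is 1 for "same customer" and 0 for "different customers".
    The feedback item layout is hypothetical.
    """
    pairs = {}
    for item in feedback:
        if item["action"] == "merge":
            label = 1
        elif item["action"] == "separate":
            label = 0
        else:
            raise ValueError(f"unknown action: {item['action']!r}")
        # Every unordered pair within the reviewed group gets the same label.
        for a, b in combinations(sorted(item["record_ids"]), 2):
            pairs[(a, b)] = label  # later feedback overrides earlier feedback
    return [(a, b, label) for (a, b), label in sorted(pairs.items())]
```

For instance, merging records 1, 2, and 3 and then separating records 2 and 3 yields match labels for the pairs (1, 2) and (1, 3) and a non-match label for (2, 3).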
The following tools were crucial to the success of the project:
- Google BigQuery: a serverless cloud data warehouse in which multiple data sources can be stored and queried at scale
- Google Dataflow: a cloud-based data processing service for integrating, preparing and analyzing large data sets
- Elasticsearch: an open-source full-text search and analytics engine
LBC’s previous deduplication software, which relied on pre-defined business rules, took up to 3 weeks to identify unique customers, and did so with low levels of accuracy. Using our SCV engine, LBC can now deduplicate over 1 million transaction records daily, generate their list of customers in less than 12 hours, and identify unique customers more accurately. These efficiencies enable them to execute targeted marketing activities in a more timely manner.
Apart from efficiency, our SCV engine has also provided LBC with the benefits of a flexible platform customized to their specific needs. LBC can adjust certain parameters of the engine (e.g., weights, confidence threshold) to suit their requirements with minimal effort.
With our SCV engine, we were able to provide LBC with an efficient solution that maximizes the value of their available data and enables them to grow their business through more personalized and targeted marketing.