Database De-Duplication and Cleansing

The Background

Our client operates a contact centre with a central database that is used internally and is also sold as a product. Much of the data was acquired at considerable cost over time, either by purchasing it or through manual research and data capture, and maintaining and cleaning the database consumed further time and resources. The client believed this effort had paid off, telling DDC AS that the database was 99.5% clean with virtually no duplicates. Because the database is a key company asset, they wanted to verify its integrity so that they could provide it to third parties with confidence.

The Problem

Matching structured and semi-structured data can appear simple on the surface, but it is an extremely challenging and complicated task. Done manually, it is both incredibly time consuming and prone to human error.


DDC AS’s tried and tested Digital Agents can match on several criteria simultaneously. Matching may be required, for example, to enrich product files or to compare the product offerings of different retailers as part of a range and price comparison exercise. However, the most frequent, time-consuming and business-critical activity is the comparison of names and addresses.


This is because name and address data is not structured. People enter these details in a variety of ways, which makes them difficult to process. In addition, people change their details over time and may mix old details with new ones. Further complications include:

  • Operational systems using different data formats and rules
  • An abundance of spelling mistakes and abbreviations 
  • Data may be entered by the data subject directly or via a contact centre
  • There can be multiple legitimate versions of a name and address
  • Potential fraud whereby individuals (and companies) may deliberately use different forms of name and address to obfuscate identities. 
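
To make the variations listed above concrete, here is a minimal Python sketch of how two differently written name and address strings might be normalised and compared. It is an illustration only and does not describe how DDC AS's Digital Agents work: the `ABBREVIATIONS` table, the `normalise` and `similarity` functions and the sample strings are all assumptions made for this example, and the comparison uses the standard library's `difflib.SequenceMatcher` as a stand-in for a production matching engine.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table for illustration; a real rule set would be
# far larger and context-aware (for example, "st" can also mean "saint").
ABBREVIATIONS = {"ltd": "limited", "st": "street", "rd": "road", "co": "company"}


def normalise(text: str) -> str:
    """Lower-case the text, drop punctuation and expand known abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1 for two strings after normalisation."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


if __name__ == "__main__":
    # Different abbreviations and punctuation, same underlying record.
    print(similarity("Acme Ltd., 12 High St.", "ACME Limited, 12 High Street"))
    # A spelling mistake lowers the score slightly but keeps it high.
    print(similarity("Acme Limted, 12 High Street", "Acme Limited, 12 High Street"))
```

An exact-match search would treat each of these strings as a different record, whereas a similarity score lets near-identical entries be surfaced for review.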

Our client needed a checking procedure to establish how clean their database really was. They had recently questioned the efficiency of matching datasets manually and decided to employ a ‘Hybrid’ workforce to tackle the problem. DDC AS seamlessly integrated the hybrid workforce into the solution.

Our Solution

The harsh reality facing a large number of firms today is that databases can contain an incredible amount of variance introduced at the point of input. Without DDC AS’s intelligent Digital Agents, a manual process would most likely have taken longer, cost more and returned less assured results.


During the initial consultation, our client had suggested that their database was “99.5%” clean. However, our Digital Agents identified, for example, one case where there were 27 master file records for the same company at the same location, with enough variation in names and addresses that the normal database search process did not detect them. Fortunately, the tireless Digital Agent our client had deployed detected these duplicates and reported back instantly.



The Outcome



Through our Digital Agent technology, our client received a detailed report of the numbers of strong, likely and possible duplicates. After reviewing the report and agreeing thresholds, the client received a cleansed database, with duplicates removed and the remaining data cleansed and standardised in line with agreed business rules. DDC AS not only located the false matches and notified our client of them; the relationship also continued, and the database was cleansed to a standard whereby all discrepancies had been removed.
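
The idea of grouping candidate pairs into strong, likely and possible duplicates against agreed thresholds can be sketched in a few lines of Python. This is a hedged illustration rather than the actual reporting logic: the cut-off values in `THRESHOLDS`, the extra "distinct" band and the `classify` and `summarise` functions are assumptions introduced for this example, and real thresholds would be agreed with the client after review.

```python
from collections import Counter

# Hypothetical cut-offs, highest first; in practice these would be agreed
# with the client after reviewing the initial report.
THRESHOLDS = [("strong", 0.95), ("likely", 0.85), ("possible", 0.70)]


def classify(score: float) -> str:
    """Map a 0..1 similarity score to a duplicate band."""
    for label, minimum in THRESHOLDS:
        if score >= minimum:
            return label
    return "distinct"


def summarise(scores: list[float]) -> Counter:
    """Count how many candidate pairs fall into each band."""
    return Counter(classify(score) for score in scores)


if __name__ == "__main__":
    # Illustrative scores only, not client data.
    print(summarise([0.99, 0.90, 0.86, 0.72, 0.10]))
```

Keeping the thresholds in one place means they can be tightened or relaxed after review without touching the classification logic.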
