Data Quality Analysis

The goal of the work was to improve data quality by identifying anomalies in life insurance policy data before it was used in an actuarial model.


What we have done:

We assisted in modernising the traditional approach to data checking. The traditional approach consisted of feeding data through a series of predetermined business rules. If a data point failed one of the rules, it was flagged for further analysis so that the cause of the issue could be addressed, for example by correcting the data point if the failure was the result of an error. Our work involved:

  • Grouping policy data into similar product categories, i.e. annuities, protection, unit-linked and with-profits model points.

  • Applying anomaly detection techniques based on cluster analysis, a broad group of algorithms (in this case, k-means clustering). Cluster analysis places data points into groups, and points that are isolated from all of the groups are flagged as anomalies.

  • Detecting an erroneous data point, which cluster analysis isolated from the rest of the data.
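
The clustering step described above can be illustrated with a minimal sketch. This is not the implementation used in the work: the data points, the two features (imagined here as a standardised age and sum assured), the caller-supplied starting centroids and the cut-off of three times the median distance are all assumptions made for illustration. In practice a library routine such as scikit-learn's KMeans would normally replace the hand-written loop.

```python
import math
from statistics import median

def nearest(point, centroids):
    """Return the index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))

def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means: assign points, recompute means, stop when labels settle."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels

def flag_anomalies(points, centroids, labels, factor=3.0):
    """Flag points lying more than factor x the cluster's median distance
    from their own centroid (factor is an assumed, tunable cut-off)."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for k in range(len(centroids)):
        cluster = [d for d, lab in zip(dists, labels) if lab == k]
        if not cluster:
            continue
        cutoff = factor * median(cluster)
        flagged += [i for i, (d, lab) in enumerate(zip(dists, labels))
                    if lab == k and d > cutoff]
    return sorted(flagged)

# Hypothetical, already standardised model points (feature 1, feature 2).
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),  # first group
    (5.0, 5.0), (5.2, 4.9), (4.9, 5.1), (5.1, 5.2),  # second group
    (5.0, 9.0),                                      # suspected erroneous point
]

# Starting centroids are fixed so that the sketch is deterministic.
centroids, labels = kmeans(points, initial_centroids=[points[0], points[4]])
anomalies = flag_anomalies(points, centroids, labels)
print(anomalies)  # -> [8]
```

Because k-means assigns every point to some cluster, "isolated" is interpreted here as being unusually far from the assigned centroid. The flagged index would then be passed on for further analysis, in the same way as a failed business rule under the traditional approach.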



Data Quality Analysis using Clustering