Leverage unstructured data to improve preventive care
The Big Picture
A major US health insurance firm wanted to assess the riskiness of its customers. Traditionally, the company used structured data sources, such as customer demographics, past claims data, and past health details, to predict the likelihood of a customer raising a claim within a specified number of days. The company wanted to augment this structured data approach with insights from unstructured data to enable more accurate predictions.
To address this, the team drew on a large repository of call center data: unstructured data in the form of call transcripts. The approach was to use a big data platform and Spark to process the transcripts, then use Python to develop a ‘propensity to claim’ model using only the unstructured data.
The output of this model was then used as an input to the model built on structured data, so the final model was an ensemble of the structured and unstructured data models. The enhanced data set was used to build member risk scores. Members were prioritized based on their risk, so the company could provide better and more focused care. The ensemble performed significantly better than the model that used structured data alone.
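The two-stage design described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the company's actual pipeline: the transcripts, the structured features (prior claim count and scaled age), the word log-odds text model, and the small logistic regression are all hypothetical stand-ins chosen so the example runs with the standard library alone.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def fit_text_model(transcripts, labels):
    """Stage 1: per-word log-odds of 'claim' vs 'no claim' (add-one smoothing)."""
    pos, neg = Counter(), Counter()
    for text, y in zip(transcripts, labels):
        (pos if y == 1 else neg).update(tokenize(text))
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {w: math.log((pos[w] + 1) / pos_total)
               - math.log((neg[w] + 1) / neg_total) for w in vocab}

def text_score(weights, text):
    """Sum of word log-odds; words never seen in training contribute 0."""
    return sum(weights.get(w, 0.0) for w in tokenize(text))

def sigmoid(z):
    # Numerically stable for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.1, epochs=500):
    """Stage 2: logistic regression by stochastic gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)  # last entry is the bias
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = sum(w * v for w, v in zip(weights, x)) + weights[-1]
            err = sigmoid(z) - y
            for i, v in enumerate(x):
                weights[i] -= lr * err * v
            weights[-1] -= lr * err
    return weights

def predict(weights, x):
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + weights[-1])

# Hypothetical toy data: call transcripts, structured features, claim labels.
transcripts = [
    "member reports chest pain and asks about hospital coverage",
    "question about updating mailing address",
    "asks about surgery pre authorization and hospital stay",
    "wants to change billing date",
    "reports pain after fall and asks about specialist referral",
    "question about gym discount program",
]
structured = [[2, 0.61], [0, 0.30], [1, 0.45], [1, 0.52], [3, 0.70], [0, 0.28]]
claimed = [1, 0, 1, 0, 1, 0]

# Stage 1 output becomes one extra input column for stage 2.
word_weights = fit_text_model(transcripts, claimed)
augmented = [row + [text_score(word_weights, t)]
             for row, t in zip(structured, transcripts)]
ensemble = fit_logistic(augmented, claimed)

# Two new members with identical structured data but different calls.
risk_high = predict(ensemble, [1, 0.5, text_score(word_weights, "severe pain hospital")])
risk_low = predict(ensemble, [1, 0.5, text_score(word_weights, "billing address")])
print(risk_high > risk_low)
```

For brevity the sketch scores the same transcripts the text model was trained on. In practice the text scores passed to the second stage would normally come from held-out or cross-validated predictions, so that the structured model does not learn from leaked labels.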
The results showed that mining unstructured text added significant value over structured data analysis alone. The ensemble of methods enabled:
- An 11-percentage-point improvement in the KS statistic (which assessed ‘goodness of fit’, i.e., how well the statistical model reflected the data), from 35% up to 46%.
Several other key indicators of model performance also improved:
- A 12-percentage-point increase in the success rate in the top three deciles (lift), from 63% up to 75%.
- A 10-percentage-point increase in concordance, from 66% up to 76%.
- A 4-percentage-point increase in model classification accuracy, from 68% up to 72%.
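For readers unfamiliar with these indicators, the following Python sketch shows one common way each is computed from model scores and observed outcomes. The definitions are assumptions, since the case study does not spell them out: KS is taken as the largest gap between the cumulative shares of claimants and non-claimants when members are ranked by score, concordance as the share of claimant/non-claimant pairs in which the claimant scores higher, and the top-three-decile figure as the share of all claimants found in the highest-scoring 30% of members. The scores and labels are hypothetical toy values, not the insurer's data.

```python
def ks_statistic(scores, labels):
    """Largest gap between cumulative claimant and non-claimant shares."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    cum_pos = cum_neg = 0
    best = 0.0
    for i, (s, y) in enumerate(ranked):
        cum_pos += y
        cum_neg += 1 - y
        # Only compare at the end of a run of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != s:
            best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return best

def concordance(scores, labels):
    """Share of (claimant, non-claimant) pairs where the claimant scores higher."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1 for p in pos for n in neg if p > n)
    return wins / (len(pos) * len(neg))

def top_fraction_capture(scores, labels, fraction=0.3):
    """Share of all claimants that fall in the top `fraction` of scores."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    return sum(y for _, y in ranked[:top_n]) / sum(labels)

def accuracy(scores, labels, threshold=0.5):
    """Share of members classified correctly at the given score cut-off."""
    hits = sum(1 for s, y in zip(scores, labels) if (s >= threshold) == (y == 1))
    return hits / len(labels)

# Hypothetical scores for ten members; 1 means the member raised a claim.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

print(ks_statistic(scores, labels))
print(concordance(scores, labels))
print(top_fraction_capture(scores, labels))
print(accuracy(scores, labels))
```

Running the same four functions on the scores of the structured-only model and of the ensemble, then subtracting, gives percentage-point differences of the kind reported above.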