
Synthetic up-sampling benefits insurance companies

Nabeel Khalid

Back in the old days, many teams of actuaries and underwriters did months of groundwork to unlock and analyse the information they had to calculate financial risks in order to issue an insurance policy. Nowadays, I imagine, a lot of the grunt work is done by computers, allowing firms to hire fewer actuaries in order to compete in this crowded insurance space. 

At the core of this work lies the data, which is not only very expensive to procure but can also prove to be risky.

First of all, it is hard to get hold of personal data on consumers. Maybe it's a tad easier to get hold of data on existing policy holders, but with GDPR and a growing lack of trust in organisations, it is fair to say people aren't queuing up to share their personal data, least of all with insurance companies.

I was speaking socially to an actuary, who advised me never to mention that I'm a smoker on my life insurance policy. He is a smoker too, he said, and he would never mention it because he doesn't want to pay a higher premium for something that will be difficult to prove in the event of an untimely death. This was brand new information for me, because gullible old me would just have volunteered it (which I still might do, for the sake of my own morality). But my key takeaway from this discussion was that the information I was willing to volunteer was not only extremely private, but also very valuable (in the hands of profit-seeking insurance companies).

So in the absence of personal data given freely, insurance companies have no choice but to bank on publicly available data sets to make their computations, which is extremely limiting and also very expensive. 

Secondly, any personal data collected - privately or publicly - will still need to be anonymised before it can be used. Anonymised data has many limitations, but the biggest is that it has lost almost all of its utility. On top of that, the techniques used to anonymise data can often be reversed, which exposes companies to litigation under GDPR in the event of a leak.

Compliance becomes even more costly when you consider situations where data needs to be shared between different departments or team members working remotely in different locations, or stored in a cloud environment. It is imperative that consumer protection laws are upheld.

Lastly, existing data never gives a complete picture, either because it is not collected in real time (information may have changed since) or because it simply does not add up to a large enough data set to be useful to the data scientists and actuaries working at insurance companies.

Enter synthetic data. It is redefining the way insurance companies use and access their data, helping them cut costs while putting customer data protection and security first.

Synthetic up-sampling allows them to conduct sophisticated data analysis at a significantly lower cost without compromising safety or speed. It removes physical and geographical restrictions while allowing teams to share data safely without any private personal information being put at stake. 

Synthetic data samples, created using deep learning that picks up the behavioural patterns in static and transaction data, retain the value of the data but not the risk.

In other words, smart, AI-generated synthetic up-sampling preserves the signal, which can then be used to break down silos and barriers to innovation. Data teams can focus on their core job instead of being hog-tied by regulations or spending months removing identifiable information from data which will have lost its utility by the time they are finished.

Why not take a small, statistically representative set of data and use it to build a much larger synthetic data set, one that can just as easily be augmented, parsed or used, in a fraction of the time and far more economically?

When it comes to testing and development of new insurance products, another benefit of synthetic up-sampling is that it can help accelerate the coding process, as large data sets can be synthesised and shared for testing.

We have already seen case studies of synthetic data being used to improve credit risk rating systems, which rely on the data to include and predict rare events. Such events tend to get filtered out or multiplied unrealistically when synthetic up-sampling is not used to create more balanced data.
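
To make the idea of balancing rare events a little more concrete, here is a minimal sketch in Python. It is not the deep-learning approach described above: it swaps in a much simpler stand-in, creating new rows by interpolating between a rare-event row and one of its nearest neighbours (in the spirit of SMOTE). The function name `upsample_rare`, its parameters and the sample rows are all hypothetical and purely for illustration.

```python
import random


def upsample_rare(rows, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly chosen
    rare-event row and one of its k nearest neighbours.

    Simplified, SMOTE-style stand-in for illustration only; a real synthetic
    data generator would learn the joint distribution of the data instead.
    """
    if len(rows) < 2:
        raise ValueError("need at least two rare-event rows to interpolate")
    rng = random.Random(seed)
    k = min(k, len(rows) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(rows))
        base = rows[i]
        # Rank every other row by squared Euclidean distance to the base row.
        others = [r for j, r in enumerate(rows) if j != i]
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the line segment between base and neighbour.
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical rare-event rows: (age, claim_amount). Invented values, not real data.
rare_claims = [(34.0, 12000.0), (41.0, 15500.0), (38.0, 9800.0), (52.0, 21000.0)]
new_rows = upsample_rare(rare_claims, n_new=20)
```

Because every new row lies between two existing rare-event rows, the rare class grows without simply duplicating records. The sketch assumes numeric columns on comparable scales; in practice features would need scaling first, and categorical fields would need separate handling.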

I didn't wish to make this about anonymised data versus synthetic data, but in order to clearly distinguish the benefits of synthetic data, it was important to critically analyse the limitations of the current approaches being used in the insurance industry to highlight what's possible nowadays. 

The benefits do add up but the key takeaway (if any) should be this: for some insurance companies, using synthetic data might be the only way to ensure profitability in a highly competitive and crowded market. 

 
