How Does One Consume an Ocean of Data? A Meaningful Sip at a Time

So many data, so many ways to use them, ignore them, misapply them, co-opt them, and brag and lament about them.  Data is the new oil, as suggested not long ago by data scientist Clive Humby, and as written of recently by authorities such as Bernard Marr in Forbes, where he weighs the apt and not-so-apt comparisons between data and oil.  Data are, or data is?  We can't even fully agree on that application of the plural (I'm in the 'are' camp).  There's an ongoing and serious debate over who 'owns' data: is possession nine-tenths of the law?  Not if one considers the rules of GDPR.  And since few industries possess, use, leverage and monetize data more than insurance does, forward-thinking industry players need a well-considered plan for working with data, because at the end of the day it's not having the oil that matters, but having its refined byproduct, correct?

Tim Stack of the technology solutions company Cisco has blogged that 5 quintillion bytes of data are produced daily by IoT devices.  That's 5,000,000,000,000,000,000 bytes of data each day; if each byte were a gallon of oil, the volume would fill a sizable share of the Atlantic Ocean.  Just IoT-generated bits and bytes.  Yes, we have data; we are flush with it.  One can't drink the ocean, but one must deal with it, yes?

I was fortunate to broach the topic of data availability with two smart technologists who are also involved with the insurance industry: Lakshan De Silva, CTO of Intellect SEEC, and Christopher Frankland, Head of Strategic Partnerships, ReSource Pro and Founder, InsurTech 360.  It turns out there is so much to discuss that the volume of information would more than fill this column, not by an IoT quintillions' factor, but by a lot.

With so much data to consider, the two agree that understanding the need for data usage guides the pursuit.  Machine Learning (ML) is a popular and meaningful application of data, and "can bring with it incredible opportunity around innovation and automation. It is, however, indeed a Brave New World," comments Mr. Frankland.  Continuing, "Unless you have a deep grasp or working knowledge of the industry you are targeting and a thorough understanding of the end-to-end process, the risk and potential for hidden technical debt is real."

What?  Too much data, ML methods to help, but now there are 'hidden technical debt' issues?  Oil is not that complicated: extract, refine, use.  (Of course, as Bernard Marr reminds us, there are many other concerns with the use of natural resources.)  Data: plug it into algorithms, get refined ML results.  But as noted in "Hidden Technical Debt in Machine Learning Systems," ML brings challenges of which data users and analyzers must be aware, including the compounding of complex issues.  ML can't be allowed to play without adult supervision, else ML will stray from the yard.

From a different perspective, Mr. De Silva notes that the explosion of data (and the availability of those data) is "another example of disruption within the insurance industry."  Traditional methods of data use (actuarial practices) are one form of analysis for solving risk problems, but there is now a tradeoff between "what risk you understand upfront" and "what you will understand through the life of a policy."  Those IoT (or IoE, Internet of Everything, per Mr. De Silva) data that accumulate in such volume can, if managed and assessed efficiently, open up 'pay as you go' insurance products and fraud tool opportunities.

Another caution from Mr. De Silva: assume all data are wrong unless you prove otherwise. This isn't as threatening a challenge as it sounds.  Given the vast quantity and sourcing of data, triangulation methods can be applied to give the data tighter reliability, and (somewhat counterintuitively) analyzing unstructured data alongside structured data across multiple providers and data connectors can help one achieve 'cleaner' (more reliable) data.  Intellect SEEC's US data set alone has 10,000 connectors (most don't even agree with each other on material risk factors) with thousands of elements per connector; multiply that by up to 30-35 million companies, then by the locations per company, and then by the directors and officers of each company. That's just the start, before one considers the effects of IoE.
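Mr. De Silva's "assume all data are wrong" rule can be pictured with a minimal sketch.  The triangulation below is illustrative only; the connector names, field names, and agreement threshold are invented for this example and are not Intellect SEEC's actual schema.  The idea: a value earns trust only when enough independent sources agree on it.

```python
from collections import Counter

def triangulate(readings, min_agreement=0.6):
    """Trust no single source: accept a value only when a sufficient
    share of connectors agree on it; otherwise return None."""
    values = [r["value"] for r in readings if r["value"] is not None]
    if not values:
        return None, 0.0
    value, count = Counter(values).most_common(1)[0]
    confidence = count / len(values)
    # Below the agreement threshold, refuse to trust any one source.
    return (value, confidence) if confidence >= min_agreement else (None, confidence)

# Three hypothetical connectors disagreeing on a material risk factor.
readings = [
    {"source": "connector_a", "value": "sprinklered"},
    {"source": "connector_b", "value": "sprinklered"},
    {"source": "connector_c", "value": "unsprinklered"},
]
value, confidence = triangulate(readings)  # two of three sources agree
```

With two of three sources in agreement, the value clears the threshold; with a 50/50 split it would not, and the field stays marked unreliable.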

In other words, existing linear modeling remains meaningful, but with the enormous volume of data now available through less traditional sources, carriers will remain competitive only with purposeful approaches to that volume.  Again: understand the challenge, and use it, or your competition will.

So many data, so many applications for them.  How is a company to know where to step next?  If not an ocean of data, it sure is a delivery from a fire hose.  The discussion with Messrs. De Silva and Frankland provided some insight.

Avoiding hidden debt and leveraging clean data is the path to a "Digital Transformation Journey," per Mr. Frankland.  He recommends a careful alignment of "People, Process, and Technology."  A carrier will be challenged to create an ML-based renewal process absent the involvement of human capital as a buffer against unexpected outcomes generated by AI tools.  And 'innovating from the customer backwards' (the Insurance Elephant's favorite directive) will help the carrier focus its tech efforts and data analysis on what end customers say they need from the carrier's products.  (Additional depth on this topic can be found in Mr. Frankland's upcoming LinkedIn article, which will take a closer look at the challenges around ML, risk and technical debt.)

In similar thinking, Mr. De Silva suggests a collaboration of business facets to unlearn, relearn, and deep learn (from the data up instead of from the user domain down), to fuel ML techniques with not just data but proven data, and to employ 'Speed of Thought' techniques in response to carriers' need to build the products and services their customers want.  Per Mr. De Silva:

“Any company not explicitly moving to Cloud-first ML in the next 12 months and a Cloud-only ML strategy in the next two years will simply not be able to compete.”

Those are pointed but supported words: all those data, and companies need to be able to take the crude and produce refined, actionable data for their operations and customer products.

In terms of tackling hidden debt and 'black box' outcomes, Mr. Frankland advises that measures such as training for a digital workforce, customer journey mapping, organization-wide definition of data strategies, and careful application and integration of governance measures and process risk mitigation will collectively act as an antidote to those two unwelcome potential outcomes.

Data wrangling: doable, or not?  Some examples in the market (and there are many more) suggest yes.


Consider the volume of hazard data available for a jurisdiction or a property: flood exposure, wildfire risk, distance to fire response authorities, chance of sinkholes, blizzards, tornadoes, hurricanes, or earthquakes.  Huge pools of data in a wide variety of sources.  Can those disparate sources and data points be managed, scored and provided to property owners, carriers, or municipalities?  Yes, they can, per Bob of HazardHub, provider of comprehensive risk data for property owners.  And as for the volume of new data engulfing the industry?  Bob suggests not overlooking 'old' data; it's there for the analyzing.


How about the challenge sales organizations face in handling customer requests coming from myriad access points, including voice, smartphone, computer, referral, online, walk-in, whatever?  Can those many options be dealt with on an equal basis, promptly and predictably, from omnichannel data sources?  It seems a data inundation challenge, but one that can be overcome effectively, per Lucep, a global technology firm founded on the premise that data sources can be leveraged equally to serve a company's sales needs and respond to customers' desire for instant service.
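The premise of treating every access point equally can be sketched as a single normalized queue.  The channel and field names below are assumptions for illustration, not Lucep's actual API: each enquiry, whatever its origin, is reduced to a common shape and served strictly by arrival time.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class Lead:
    """A customer enquiry normalized from any channel into one shape."""
    channel: str        # e.g. "voice", "web", "walk-in" (illustrative labels)
    contact: str
    received_at: datetime

def next_lead(leads):
    """First come, first served, regardless of which channel it arrived on."""
    return min(leads, key=lambda lead: lead.received_at)

# A web form and a phone call compete only on arrival time.
leads = [
    Lead("web", "a@example.com", datetime(2019, 6, 1, 9, 5, tzinfo=timezone.utc)),
    Lead("voice", "+1-555-0100", datetime(2019, 6, 1, 9, 1, tzinfo=timezone.utc)),
]
first = next_lead(leads)  # the earlier enquiry wins, channel notwithstanding
```

The design choice worth noting is that the channel never enters the ordering function; equality across access points falls out of the data model rather than a business rule.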

Shepherd Network

As for the 5 quintillion daily IoT data points: can that volume become meaningful if the tech provider takes a focused approach, one that can serve a previously underserved customer?   Consider unique and/or older building structures or other assets that traditionally have been sources of unexpected structural, mechanical or equipment issues.  Integrate IoT sensors within those assets, and build a risk analytics and property management system that business property owners can use to reduce maintenance and downtime costs for assets of almost any type.  UK-based Shepherd Network has found a clever way to 'close the valve' on IoT data, applying monitoring, ML, and communication techniques that can provide a dynamic scorecard for a firm's assets.
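A dynamic scorecard of the kind described above might look, in miniature, like the hedged sketch below.  The sensor names, operating limits, and scoring rule are invented for illustration and are not Shepherd Network's method: a stream of readings is reduced to one number an owner can act on.

```python
def asset_health(readings, limits):
    """Return the share of an asset's sensor readings that fall within
    their safe operating limits (1.0 = all healthy, 0.0 = all out of range)."""
    in_range = sum(
        1 for name, value in readings.items()
        if limits[name][0] <= value <= limits[name][1]
    )
    return in_range / len(readings)

# Hypothetical limits for one asset: (low, high) per sensor.
limits = {
    "vibration_mm_s": (0.0, 4.5),
    "temp_c": (5.0, 60.0),
    "humidity_pct": (20.0, 70.0),
}
# One snapshot of readings; the vibration level is out of range.
readings = {"vibration_mm_s": 6.1, "temp_c": 41.0, "humidity_pct": 55.0}
score = asset_health(readings, limits)  # 2 of 3 readings in range
```

This is the "close the valve" idea in its simplest form: quintillions of raw points condense to a score per asset, and only the score travels to the dashboard.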

In each case the subject firms see the ocean of data, understand the customers' needs, and apply high-level analysis methods so that the data become useful and/or actionable for the firms' customers.  They aren't dealing with all the crude, just the refined parts that make sense.

In discussion I learned of petabytes, exabytes, zettabytes, and yottabytes of data.  Unfathomable volumes of data, a universe full, all useful but inaccessible without a purpose for the data.  Data use is the disruptor, as are the application of data analysis tools and awareness of what one's customer needs.  As Bernard Marr notes, oil is not an infinite resource, but data seemingly are.  Data volume will continue to expand, but prudent firms and carriers will focus on those data that will serve their customers and the respective firm's business plans.


Patrick Kelahan is a CX, engineering & insurance professional, working with Insurers, Attorneys & Owners. He also serves the insurance and Fintech world as the ‘Insurance Elephant’.

I have no positions or commercial relationships with the companies or people mentioned. I am not receiving compensation for this post.
