‘Big data’ wants to choke your organization. Don’t let it happen.
What is big data?
The McKinsey Global Institute report Big data: The next frontier for innovation, competition, and productivity (May 2011) is a relatively well-informed and hype-free description of the opportunities presented by Big Data. Its authors define Big Data as “datasets whose size is beyond the ability of typical database software to capture, store, manage, and analyze.”
It’s important to note that they define it in terms of a functional capability (the last five words of their definition) rather than an absolute dataset size. If we can manage and analyze it, then by their definition it’s no longer ‘big’.
I’d argue that McKinsey’s definition does not go far enough: we should also be concerned with doing something that actually produces value with the resulting analysis. I’ll develop this argument below.
How big data works
What we used to call ‘reality’ has morphed from ‘analog reality’ into ‘digital reality’. Nearly everything now carries digital sensors, and all of their data flows into one big pool. It resembles what we used to call data warehousing, only this time on steroids.
Various metrics are then extracted from that pool and analyzed for correlation. When certain factors move together with a given level of statistical reliability, they are said to be ‘correlated’. It’s tempting to assume they are also causally related, but that assumption is often premature at best. What is needed in addition, and it’s a big methodological step, is a mechanism of causation: an account of how a change in one factor actually produces a change in the other.
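To make the gap between correlation and causation concrete, here is a minimal sketch using invented daily readings (the variable names and figures are hypothetical, chosen only for illustration). Ice-cream sales and sunburn cases correlate strongly, yet neither causes the other; both follow temperature. Removing temperature’s linear effect from each series and re-correlating the leftovers makes most of the relationship disappear. Note that even this adjustment is only a statistical check, not the mechanism of causation the argument calls for.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def residuals(ys, xs):
    """What is left of ys after removing a least-squares linear fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

# Hypothetical daily readings; temperature is the hidden common driver.
temperature = [15, 18, 21, 24, 27, 30, 33]
ice_cream_sales = [36, 40, 49, 51, 59, 66, 70]
sunburn_cases = [5, 5.5, 7.5, 9.5, 10, 12, 14]

# Raw correlation between the two outcomes: looks like a strong link.
raw = pearson(ice_cream_sales, sunburn_cases)

# Correlation once temperature's linear effect is taken out of both series.
controlled = pearson(residuals(ice_cream_sales, temperature),
                     residuals(sunburn_cases, temperature))

print(f"raw correlation:            {raw:.2f}")         # 0.98
print(f"after removing temperature: {controlled:.2f}")  # -0.12
```

A data-up pipeline that only mines the first number would flag a ‘relationship’; only a hypothesis about what drives both series prompts the second check.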
Big Data is essentially a data-up approach: collect first, and look for questions later. The economic business case for it rests on the fact that data collection and storage have become relatively inexpensive, and are often wholly automated through sensors, telecommunications networks, servers, and so on. The rationale continues that if data costs little to collect, it’s best to collect it just in case, even if the reason for doing so is initially unclear or nonexistent.
The hidden fallacy
This rationale behind Big Data sounds reasonable enough, but there’s a catch, and a big one. The central problem with this logic is that it fails to price in (i.e., it treats as ‘free’) the costliest and scarcest resource in the knowledge value chain (KVC): the human attention and processing needed to convert analyzed information into decisions and actions. Without that step, the KVC model holds, there is little possibility of creating value, however your organization defines it.
If the human processing element were built into the equation, the ROI would look very different. It would then become clear that just because you CAN gather some data does not necessarily mean you SHOULD, from a cost-effectiveness point of view.
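The arithmetic behind that claim can be sketched with the standard ROI formula, (value − cost) / cost. All figures below are hypothetical, invented only to show how the result shifts; the KVC model itself is not expressed here as a formula, and the names (`roi`, `analyst_hours`, `hourly_cost`) are illustrative assumptions.

```python
def roi(value_created, costs):
    """Return on investment: (value - total cost) / total cost."""
    total = sum(costs)
    return (value_created - total) / total

# Hypothetical annual figures for one data-collection initiative.
value_created = 120_000           # value of the decisions the analysis informs
collection_and_storage = 20_000   # sensors, networks, servers (cheap, automated)
analyst_hours = 2_000             # human attention needed to act on the analysis
hourly_cost = 75                  # fully loaded cost per analyst hour

# "Free attention" view: only machine-side costs are counted.
naive = roi(value_created, [collection_and_storage])

# Full view: the human processing that actually realizes the value is priced in.
full = roi(value_created, [collection_and_storage, analyst_hours * hourly_cost])

print(f"ROI ignoring human processing:  {naive:.0%}")  # 500%
print(f"ROI including human processing: {full:.0%}")   # -29%
```

With these made-up numbers, the same initiative swings from a 500% return to a loss once the scarcest input in the chain is no longer treated as free.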