January 24th, 2012
In the interest of full disclosure, this post is not about algebra or calculus, nor is it about financial instruments. It’s about various kinds of business research ‘raw materials’ and how to discern their quality if you are a producer or user of such research.
Whether you are navigating the waters off Tuscany in a cruise ship, or analyzing a new business opportunity, the quality of the data you use is integral to the outcome you will deliver. While the availability and use of high-quality data does not guarantee a desired outcome, the absence of it renders the desired outcome more a result of chance than of strategic intention.
Manufacturing re-thought quality
When I originally developed the KVC model, it was partly a test to see how well best practices in manufacturing could be applied to organizational intelligence. In so doing I borrowed heavily from earlier work I had done in TQM (Total Quality Management). One of TQM’s most basic principles is that in any manufacturing process, it is no longer acceptable to just get to the end of the process — the finished goods — and weed out the ones that do not meet quality standards. Instead, it is much more efficient to build quality checks into each stage of the process, reducing the number of adverse quality-related surprises at the end of the process.
Intelligence can, too
So it is with intelligence — the ‘manufacture’ of a knowledge-based product. One of the implications is that each step up the value chain tends to replicate (and in my experience, amplify) any quality shortfalls embedded in earlier stages. ‘Garbage in, garbage out’, as the expression goes — though we prefer its converse, ‘Quality in, quality out.’
Another implication is that wherever possible, in developing your source data, you go to the lowest possible level on the chain at which it’s available. If information and intelligence are the branches and leaves of the ‘knowledge tree’, then data points are the roots. You want to get to the ‘root-most’ level in order to start your analysis.
To the extent you accept at face value someone else’s processing or analysis of the data, you may save yourself some analytic time and effort — but you also leave yourself open to inheriting and building on their errors.
One of the ways this happens is when people rely on second-order ‘derivative’ work at the start of their process. This is work that ‘derives’ from another, or (in Knowledge Value Chain terms) enters at a higher level in the chain. For example, an analyst may cite a newspaper article about a new study that has been released, instead of citing the study itself. This is usually because he hasn’t taken the time to find the root source, read and analyze it, and draw his own conclusions.
The result is a data quality vulnerability — one that’s typically easy to overcome. When you see a newspaper article cited that refers to a study that someone is announcing, it is essential to (where possible) obtain the original study, read it critically, and do your own analysis and interpretation of the results.
Apples, oranges, and pineapples
I’ll give you a recent example from my casebook. We have been studying various issues related to health care for more than a year now. Most recently, we’ve done work around the aging of the population and what market opportunities and challenges it presents. One of the unusual things about this assignment is the sheer volume of high-quality material readily available. The US government publishes lots of data — the Census Bureau, Centers for Disease Control and Prevention, and Centers for Medicare and Medicaid Services were especially useful to us. In addition, many universities have centers that study health and/or aging, and there are not-for-profit groups (like the Urban Institute) that also do so.
In short, instead of a dearth of data, there are so many sources that we were still discovering new ones more than ten weeks into our study. And of course, none of them totally reconciled in terms of their target population, what they measured, how they measured it, when they measured it — we were dealing with apples, oranges, and pineapples in terms of the comparability of our data.
The case of the poisoned pie chart
At the beginning of our aging study I found one source that was especially useful, a monograph published by a leading not-for-profit research organization. Its source list was a rich trove of ‘root’ sources that we culled to build a ‘knowledge base’ for the market opportunity we were evaluating.
One of the key facts we were looking for was the breakdown among funding sources for elderly long-term care — Medicare, Medicaid, private insurance, self-pay, and so on. The monograph contained an analysis and informative pie chart, but — as I’m recommending here — we had also obtained the source material cited (in this case, National Health Expenditures data produced by the US government). We do this as a matter of course, not so much for checking on our sources, but more as a way of digging more deeply into the source material, and to develop still more sources from their citations.
In this case I could get the total figures to agree with this authoritative derivative source, but not the allocations among the funding sources. After spending an hour or so trying to reconcile the difference, I went back to the monograph and read the part of the text in which it explained the pie chart. There, to my amazement, I found a breakdown very similar to the one I had developed independently. As far as I can tell, an error had been made by the person (quite possibly not the author) who created the accompanying pie chart.
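In practice, this kind of cross-check can be done with a few lines of arithmetic. Here is a sketch in Python — the category names and all figures are made up for illustration (they are not the actual National Health Expenditures numbers), with two slices deliberately swapped to mimic the kind of chart error described above:

```python
# Hypothetical figures illustrating the reconciliation check,
# NOT the actual National Health Expenditures data.
root_total = 207.9  # root-source total spending, in $ billions (hypothetical)

# Percentages read off the derivative source's pie chart (hypothetical;
# Medicaid and Medicare are swapped to mimic a chart-labeling error).
chart_shares = {
    "Medicaid": 25.0,
    "Medicare": 40.0,
    "Private insurance": 10.0,
    "Out-of-pocket": 20.0,
    "Other": 5.0,
}

# Dollar amounts per category taken from the root source (hypothetical).
root_figures = {
    "Medicaid": 83.2,
    "Medicare": 52.0,
    "Private insurance": 20.8,
    "Out-of-pocket": 41.6,
    "Other": 10.3,
}

# Check 1: the root-source categories should sum to the root-source total.
assert abs(sum(root_figures.values()) - root_total) < 0.5

# Check 2: each chart share should match the share implied by the root data.
discrepancies = []
for category, pct in chart_shares.items():
    implied = 100.0 * root_figures[category] / root_total
    if abs(implied - pct) > 1.0:  # allow for rounding in the chart
        discrepancies.append((category, pct, round(implied, 1)))

for category, chart_pct, implied_pct in discrepancies:
    print(f"{category}: chart says {chart_pct}%, root data implies {implied_pct}%")
```

With these made-up numbers, the totals reconcile (Check 1 passes) but the check flags the Medicaid and Medicare slices — exactly the pattern we ran into: an authoritative-looking derivative whose parts don’t match its own roots.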
If I had used only the ‘derivative’ source, I would have given an incorrect result to our client — complete with a citation to a ‘reliable’ source. Only by finding a ‘root’ source and checking against that was I able to deliver a result that met our standards of data quality.
It’s not totally clear that an incorrect answer would have significantly affected our client’s decision to pursue an entry into this market. But in this case — as is often the case — the cost of higher-quality data was not much greater than that of incorrect data. So having the best data you can get becomes a good work habit to develop.
Cheap gas in a BMW
Imagine for a second you’ve won the lottery and are now driving a new $75,000 BMW M3. Would you put low-octane gasoline and bargain motor oil into it? Unlikely, since doing so would significantly degrade the value and usefulness of what you have.
Yet companies do this all the time — put low-quality data into a highly tuned decision-making machine — often without knowing it. The quality of raw unprocessed data is not always immediately obvious to its end user, since by that time it has been transformed by the ‘intelligence manufacturing’ process. (It’s usually clear to me where the vulnerabilities are, but that comes with having done business research for a long time. I’ve seen most of the mistakes you can make, and made a fair number of them myself.)
I’ll give you some of the signposts we use for data quality in a future post. But one of the things that immediately starts me checking is the use of derivative sources, where the roots are available.