Big Impact from Small Data – Part 1
In this article, Part 1 of the latest in his series exclusive to Data Makes Possible, Dr. Kirk Borne, Principal Data Scientist for Booz Allen Hamilton, explains the importance of focusing on the value of data, which doesn’t always depend on its size, and how significant value may be derived from small facts or bits of evidence sifted from the largest big data collections.
People frequently ask me this question: “Can you tell me where big data has had great impact?”
For me, “great impact” translates to “great value”. Focus on data’s value first!
Great value can often be derived from the smallest things. We can easily overlook this seemingly paradoxical truth in the era of “big data”, where the focus is too often on big volume rather than on what matters most: big value! Big value does not necessarily depend on the size of the data at all. I wrote about this years ago in a series of articles titled “When Big Data Goes Local, Small Data Gets Big”.
The same can be said about our data analytics activities – we don’t need to have a large, complex model in order to extract great insights from our data. In fact, I recently heard someone at a conference say that “90% of the value from analytics projects comes from the first 20% of the effort.” That statement is a slight, but very interesting and important modification of the usual business truism (the Pareto principle) that states that 20 percent of your efforts produce 80 percent of your rewards.
Sensors, Sensors Everywhere
What is special and unique about the big data era is that we are now putting sensors on nearly all things, all events, and all processes in our digital universe, thereby measuring, monitoring, and mining deep insights that were previously invisible to us or essentially impossible for us to infer about those things, events, and processes.
So, am I saying contradictory things: that big data volume is not important, and yet that big data volume is essential? Yes, and no! Big data volume represents a treasure trove of evidence in which we search for “golden nuggets” of great insight and “diamonds” with deep impact. Like most treasure, those nuggets and diamonds are rare. They are usually not discovered by random luck, but through systematic, wide, and deep exploration of large repositories.
More specifically, the ability of organizations to generate big value from their data does depend on the size of the data collected (i.e., on the depth and breadth of the evidence and facts that you have assembled within your domain of interest). However, in many cases, significant value may actually come from a small fact or from small bits of evidence sifted from the big data collection.
What About Zero Data?
Even the absence of evidence (zero data) for some particular event can have big consequences. This brings up another truism, this one from the realm of statistics: the absence of evidence is not the same thing as evidence of absence. The former (absence of evidence for some phenomenon) corresponds to our prior era, in which we had inadequate, siloed, or inaccessible data collections and thus could not answer many of our important business questions. The latter (evidence of absence of a specific phenomenon) might be provable only through the comprehensive and thorough collection and search of massive repositories of relevant evidence.
Evidence of absence can be a very big deal in science, when a gap in a sequence of data values may reveal the presence of some deep underlying scientific process that prevents those values from ever being observed in nature.
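As a rough illustration of looking for such a gap, here is a minimal Python sketch. The function name `find_gaps`, the `min_gap` threshold, and the sample measurements are all hypothetical, chosen only for this example. A gap found in a small sample is merely a candidate; it becomes evidence of absence only when the collection is large and complete enough that the missing values should have shown up.

```python
def find_gaps(values, min_gap):
    """Return (low, high) pairs of neighboring observed values that are
    separated by more than `min_gap` (a hypothetical, domain-specific threshold)."""
    ordered = sorted(set(values))
    return [(lo, hi) for lo, hi in zip(ordered, ordered[1:]) if hi - lo > min_gap]


# Made-up measurements with an empty stretch between 1.7 and 3.9.
measurements = [1.0, 1.2, 1.5, 1.7, 3.9, 4.1, 4.4]
print(find_gaps(measurements, min_gap=1.0))  # [(1.7, 3.9)]
```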
Similarly, evidence of absence in some human-related data stream (such as marketing or sales logs) may reveal instances where certain combinations of products do not ever occur and thus these combinations should not be offered to customers, either in a recommendation engine or other marketing cross-sell campaign. These instances of mutually exclusive products are a practical example of the computational function XOR, which is true only when one or the other condition is satisfied but not both. If you discover an instance where this rule is violated, when you expect the rule to be always true, then that may be a very interesting and impactful discovery indeed!
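The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `min_support` threshold, and the sample baskets are hypothetical and do not come from any particular recommendation engine. Note also that the check flags only baskets that contain both products. A strict XOR would additionally flag baskets that contain neither product, which is usually not what a retailer wants.

```python
from itertools import combinations


def never_cooccurring_pairs(baskets, min_support=2):
    """Return product pairs that each appear in at least `min_support`
    baskets but never appear together in the same basket."""
    counts = {}
    seen_pairs = set()
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            counts[item] = counts.get(item, 0) + 1
        # Record every pair that does occur together (in sorted order).
        seen_pairs.update(combinations(items, 2))
    frequent = sorted(p for p, n in counts.items() if n >= min_support)
    return [pair for pair in combinations(frequent, 2) if pair not in seen_pairs]


def violations(baskets, exclusive_pairs):
    """Return (basket_index, pair) for every basket that contains both
    products of a pair that was expected to be mutually exclusive."""
    found = []
    for index, basket in enumerate(baskets):
        items = set(basket)
        for a, b in exclusive_pairs:
            if a in items and b in items:
                found.append((index, (a, b)))
    return found


# Made-up sales log: each inner list is one customer's basket.
history = [
    ["ios_case", "charger"],
    ["android_case", "charger"],
    ["ios_case", "screen_protector"],
    ["android_case", "screen_protector"],
    ["charger", "screen_protector"],
]
exclusive = never_cooccurring_pairs(history)
print(exclusive)  # [('android_case', 'ios_case')]

# A new basket that breaks the expected rule is the interesting discovery.
updated = history + [["android_case", "ios_case"]]
print(violations(updated, exclusive))  # [(5, ('android_case', 'ios_case'))]
```

With only a handful of baskets, many pairs will look mutually exclusive purely by chance, which is why the `min_support` filter exists and why this kind of rule is only trustworthy when it is mined from a large, complete log.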
In the second part of Big Impact from Small Data, Dr. Borne dives into specific examples of tiny details making huge differences, including customer experience in digital marketing, a NASA engineering zero data-based decision, and the small data flowing back to Earth from another NASA project 13 billion miles away! Meanwhile, take the conversation to Twitter! Agree or disagree with Kirk’s thoughts on the user, customer, or employee experience? Want to tell us your story? Tweet @KirkDBorne using the hashtag #datamakespossible right now!