Written by Dr. Kirk Borne
Dr. Kirk Borne is a data scientist, astrophysicist and visionary. Here at Data Makes Possible, Kirk shares his vast experience in scientific data mining, informatics and statistical science research. Just like Western Digital is helping to drive the future through our technologies, Kirk is helping businesses learn how data can enable new frontiers in many ways.
I am a big fan of the double entendre as a figure of speech: specifically, a word or phrase that has at least two different interpretations. A double entendre can prompt your audience to think more deeply about what you’re saying and what your words really mean. Such phrases are often ironic or humorous, which also appeals to me as a wannabe comedian. Among my favorite examples is this one: “the wheels of progress are not turned by cranks.” (Think about it.) I came up with this one years ago: “the careers of many archaeologists lie in ruins.”
Another example is a phrase used by data scientists: “Data matters”. The phrase can be read either as a statement, in which “Data” is the noun and “matters” is the verb, or simply as a topic label, in which “Data” acts as an adjective specifying the type of “matters” I will write about. Both interpretations apply, which is usually the case with a double entendre, and which is why they’re so much fun! Data is also fun. (Or should I say, “Data are…”?)
Data can truly transform an enterprise in many ways: identifying new opportunities, discovering insights, and driving business innovation, especially in products and services.
Data can be used to build descriptive models (hindsight), or diagnostic models (oversight), or correlation-based predictive models (foresight), or causal prescriptive models (insight). In this way, data does matter – it carries significant consequence for an organization! Data is transformative and a powerful source of value creation in nearly all industries.
There are many contexts in which we handle and propagate data through the enterprise. These contexts include data generation, data collection, data engineering, data management, data governance, data ethics, data provenance, data discovery, data access, data analytics, data science, data labeling, data wrangling, DataOps, and data-fueled implementations (including machine learning and artificial intelligence). In this sense of the phrase, those contexts are the different “data matters” that modern data-drenched organizations must address, from top to bottom and from the front office to the back office. But data matters include more than these technological contexts. They also include workforce re-skilling, data and analytics strategies, developing a culture of data sharing and experimentation with data, data literacy, and data- and evidence-based decision-making.
If we adopt the convention that the word “Data” is a plural noun, then the phrase from earlier in this article becomes “Data matter”, which is open to yet another meaning. This additional interpretation of “Data matter” corresponds to data fragments or data artifacts (i.e., data content). Such artifacts include data files, images, videos, speech, audio clips, documents, social network posts, web clicks, purchase transactions, network logs, sensor readings, database tables, time series, graphs, maps, metadata, data dictionaries, data versioning histories, data catalogs, and even smells (odors, which can be measured and quantified – yes, we will soon have digital representations of smells that can be categorized, classified, and even generated by machine learning algorithms!). All these data fragments and artifacts become the ingredients and fuel for our AI, machine learning, and analytics implementations.
The “sweet spot” where all these different meanings converge is in the real-world use cases where we use data fragments (content) to produce results (within different contexts) that matter to the organization (i.e., have significant consequence).
Content, context, and consequence – those are the essence of data matters.
A Practical Example: Sustainability Goals
We illustrate these ideas with a practical example: the global Sustainable Development Goals (SDGs). The 17 SDGs represent a worldwide “call to action” to address global, societal, and human challenges, including poverty, hunger, health, equality, education, economic growth, clean energy, clean water, climate action, and more. These goals definitely have significant consequence for our planet and for all of us, and there is a lot of context in those 17 goals. But there is also a lot of potential for different data artifacts (content), specifically within the SDG Indicators Framework.
The 230+ unique indicators that comprise the SDG Indicators Framework are effectively KPIs (Key Performance Indicators) for our world – these indicators are associated with specific SDGs, though some indicators are shared across more than one SDG. The indicators are statistical measures that monitor, track, and inform our progress toward meeting the 17 SDGs. These measures are data, collected by organizations and governments. These data give us power to know the unknowable: what is the health of our planet and its inhabitants on a global scale? The internet of things (IoT) promises to put sensors on many things, processes, and products. These sensors represent one mechanism for collecting those statistical measures, such as water quality, clean energy usage, equality, education, hunger, poverty, recycling, sustainable transportation, and climate change. Every organization, business, and government agency may be held accountable for proving their compliance with the SDGs, or at least they may be required to measure and report their performance against the global KPIs that are relevant to their organizational activities.
With the enormous volume of data generated from countless sources related to the 230+ SDG indicators, data scientists worldwide can perform many analytics functions:
(a) Descriptive Analytics – What has happened? Did it co-occur with something else (perhaps with a time shift)?
(b) Diagnostic Analytics – What is happening? Is it co-occurring with something else (perhaps with a time shift)?
(c) Predictive Analytics – What will happen? What associated events or consequences will occur with it or are correlated with it?
(d) Prescriptive Analytics – What can we do to change the results of our actions, to optimize the outcomes, or to improve the consequences? What are the causal factors that influence those outcomes, and which of those causal factors do we control and thus can act on?
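To make the idea of “co-occurring with something else (perhaps with a time shift)” in (a) and (b) a little more concrete, here is a minimal Python sketch of lagged correlation between two indicator time series. It is an illustration under stated assumptions, not a method from this article: the two series are made-up placeholder values rather than real SDG indicator data, and the function names (`pearson`, `lagged_correlation`, `best_lag`) are hypothetical helpers written for this example.

```python
# Illustrative sketch only: the series below are made-up placeholder values,
# not real SDG indicator data, and all function names are hypothetical.


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag].

    A positive lag asks whether y tends to follow x by `lag` time steps;
    a negative lag asks whether y tends to lead x.
    """
    if lag >= 0:
        return pearson(x[: len(x) - lag], y[lag:])
    return pearson(x[-lag:], y[: len(y) + lag])


def best_lag(x, y, max_lag):
    """Return the lag in [-max_lag, max_lag] with the largest correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda k: lagged_correlation(x, y, k))


# Hypothetical yearly readings of two indicators. indicator_b is constructed
# to echo indicator_a two steps later, so the "right" answer is known.
indicator_a = [1, 3, 2, 5, 4, 6, 8, 7, 9, 12]
indicator_b = [0, 0] + indicator_a[:-2]

lag = best_lag(indicator_a, indicator_b, max_lag=3)
r = lagged_correlation(indicator_a, indicator_b, lag)
print(f"indicator_b follows indicator_a by {lag} steps (r = {r:.2f})")
```

On this toy data the search recovers the two-step shift that was built in. Note that a strong lagged correlation only answers the descriptive, diagnostic, or predictive questions; it says nothing about which factors are causal, which is exactly the harder question that prescriptive analytics in (d) must answer.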
Data Has Value, Perhaps Now More than Ever
With this approach to data matters, we can improve our world, our organizations, and our lives. As I write this article, the world is enduring a major crisis – the COVID-19 virus outbreak of 2020. Where can we find data that provides answers to our critical questions during this pandemic? How can we control the outbreak? How can we cure the disease? How can we vaccinate against it? How can we protect those who have it or who are exposed to it (such as healthcare providers and caregivers)? What are the principal causal factors in the spread of the infection? And so on.
We want data content that matters within contexts that have major consequence! That desire extends to concerns less global than those mentioned above, which are nonetheless consequential, critical, and significant for your organization. What does “data value” mean for you in terms of content, context, and consequence? Message me on Twitter @KirkDBorne and use the hashtag #datamakespossible now!