The 12 Rules of DataOps to Avert a DataOops
Written by Dr. Kirk Borne
One of my first consulting assignments with my current employer began several years ago. I was part of a team advising a large organization on how to design and implement an enterprise analytics solution group for the organization’s full end-to-end business activities. Those business activities included operations, manufacturing, supply chain and logistics, and all the steps through pricing, sales and customer support, as well as analytics for internal enterprise functions. Our design sprint team consisted of data scientists (including me), data engineers, a cloud expert, a DevOps expert, business domain experts, and change management leaders.
At some point during our five-day design sprint, there was discussion about how this large organization could remain agile throughout the lifecycle of the analytics solution group: design, implementation, and operations (DIO).
Our cloud expert responded by saying that our approach would essentially be DevOps for Data. I jumped into the discussion and said, “Oh, you mean DataOps!” Everyone in the room had a collective “Aha!” moment and agreed that was the right way to describe it.
And that’s when I became convinced that DataOps is key to successful projects through all DIO phases – it is the methodology that helps deliver on the promise of systems thinking: “the focus is not just on building the systems right, but also on building the right systems.”
My interest in DataOps led eventually to me co-authoring and publishing the small e-book “Ten Signs of Data Science Maturity”, which included this one: “using Agile and leveraging DataOps — DevOps for data product development.”
Since then (and even before then), there have been many definitions, interpretations, and publications that address the DataOps concept. It is more than a data operations or technology concept. It is even more than my original statement “DataOps is DevOps for data.” It is a way of doing the business of data: data understanding, insights discovery, analytic production, and actionable intelligence. It is a full-organization effort, since every facet of the business now requires continuous “always on” trusted data at the point of need – the world is now digital; the world is now data. Latency in data products or data analytics delivery is not acceptable.
Entire conferences and handbooks have focused on the diverse meanings of DataOps. Consequently, it is beyond the scope of this brief article to summarize and review all of those. So, instead, I list here my 12 rules (“The 12 Commandments of DataOps”) to help organizations avoid a DataOops in their data analytics initiatives.
What is a DataOops?
It is not only about failed data systems or failed data product implementations. DataOops is exemplified by one or more of the three F’s that affect data-related projects. Those F’s are: Fragility, Friction, and FUD (Fear, Uncertainty, Doubt).
Fragility occurs when a built system is easily “broken” when some component is changed. These changes may include requirements drift, data drift, model drift, or concept drift. Requirements drift is a challenge in any development project, but the latter three are more apropos to data analytics and data product development activities. A system should be sufficiently agile and modular such that changes can be made with as little impact to the overall system design and operations as possible, thus keeping the project off the pathway to failure.
Friction occurs when there is resistance to change or to success somewhere in the project lifecycle or management chain. This can be overcome with small victories (MVPs, minimum viable products, or MLPs, minimum lovable products) and by instilling (i.e., encouraging and rewarding) a culture of experimentation across the organization. When people are encouraged to experiment and small failures are acceptable (provided there are lessons learned and subsequent improvements), friction can be minimized, failure can be alleviated, and innovation can flourish.
FUD occurs when there is too much hype and “management speak” in the discussions. There is a pathway to failure whenever the organization’s big data analytics, machine learning, and AI activities are driven by FOMO (fear of missing out, sparked by concerns that your competitors are outpacing your data strategy), whenever there is uncertainty about what the analytics advocates are talking about (a “data literacy” challenge), or whenever there is doubt that the data-related activities deliver real value (due to a lack of quick-win MVP or MLP examples).
The 12 Commandments of DataOps
DataOps is agile and incremental, and it learns continuously. DataOps focuses on requirements, which should be aligned with clearly stated business outcomes that in turn align with the previously stated business mission and objectives. DataOps accepts a fail-fast, learn-fast culture of experimentation. DataOps is a feature of a “learning system”. As one company once put it: “Test, or get fired!”
DataOps therefore makes an “A” grade in data systems and data product development, avoiding the three F’s.
Additional concepts have emerged in recent years that are related to DataOps. These include MLOps (for Machine Learning implementations) and AIOps (for Artificial Intelligence implementations). Our 12 rules are applicable to those concepts as well, since most of these rules are common business sense for technology implementations in enterprises. Here they are:
1. Honor business value above all other goals.
2. Begin with the end in mind: analytics-first, goal-oriented, mission-focused, and outcomes-driven, while being data-informed and technology-enabled.
3. Think strategically, but act tactically: think big, start small, learn fast.
4. Know thy data: understand what it is (formats, types, sampling, who, what, when, where, why), encourage the use of data catalogs, and enrich your datasets with searchable (semantic and content-based) metadata (labels, annotations, tags).
5. Love thy data: data are never perfect, but all the data may produce value, though not immediately. Clean it, annotate it, catalog it, and bring it into the data family (connect the dots and see what happens). For example, outliers are often dismissed as random fluctuations in data, but they may be signaling at least one of these three different types of discovery: (a) data quality problems, associated with errors in the data measurement and capture processes; (b) data processing problems, associated with errors in the data pipeline and transformation processes; or (c) surprise discovery, associated with real previously unseen novel events, behaviors, or entities arising in your data stream.
6. Do not covet thy data’s correlations: a random six-sigma event is one-in-a-million. If you have 1 trillion data points (e.g., a Terabyte), then there may be one million such “random events” that will tempt any decision-maker into ascribing too much significance to this natural randomness.
7. Validation is a virtue, but generalization is vital: a model may work well once, but not on the next batch of data. We must monitor for overfitting (fitting the natural variance in the data), underfitting (bias), data drift, and model drift. An over-specified, over-engineered model or other data analytics solution will likely not be applicable to previously unseen data or to the new circumstances in which the model will be deployed. A lack of generalization is a big source of fragility and dilutes the business value of the effort.
8. Remember the data analytics “easy buttons” that enable data-to-discovery, data-to-decision, and data-to-action: Pattern Detection, Pattern Recognition, Pattern Exploration, and Pattern Exploitation.
9. Honor your data analytics first-mile and last-mile challenges: from data integration to action. The first-mile challenge (diverse data integration) comes naturally with the high-volume, high-variety, and high-velocity characteristics of big data. This challenge is exacerbated by the tendency for organizations to silo their data collections by source, by type, by organizational unit, and/or by business use case. The last-mile challenge is being able to extract actionable intelligence from all that data. These challenges can be a big source of FUD and friction.
10. Keep it agile, with short design, develop, test, release, and feedback cycles: keep it lean, and build on incremental changes.
11. Keep it Simple and Smart (“KISS” principle): create a library of composable building blocks, reusable components, and reusable business logic, including APIs, microservices, and functions-as-a-service (FaaS).
12. Test early and often: expect continuous improvement, encourage and reward a culture of experimentation, and learn from failure. “Test, or get fired!” Remember that Data Science is Science!
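The arithmetic behind rule 6 is simple enough to sketch: the expected number of purely random events is the number of data points divided by the odds against each one. The Python snippet below is only an illustration. It takes the one-in-a-million rate from rule 6 as given, and the helper name expected_chance_events is hypothetical, not part of any library.

```python
def expected_chance_events(n_points: int, one_in: int) -> float:
    """Expected count of purely random events among n_points observations,
    assuming each observation independently has a 1-in-`one_in` chance
    of looking like a rare event."""
    if n_points < 0 or one_in <= 0:
        raise ValueError("n_points must be >= 0 and one_in must be > 0")
    return n_points / one_in


# Figures from rule 6 (assumed as stated there): one trillion data points,
# each with one-in-a-million odds of being a "rare" random event.
trillion_points = 10**12
one_in_a_million = 10**6

count = expected_chance_events(trillion_points, one_in_a_million)
print(f"Expected random 'rare' events: {count:,.0f}")
```

The point of the calculation is that rarity per data point says little once the dataset is large enough: the same odds that make an event remarkable in a thousand rows make it routine in a trillion.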
In summary, I offer another list – a list of DataOps benefits:
- DataOps helps your business move at the speed of data – keeping pace with “data in motion” to deliver the right data, trusted data products, and actionable insights at the point of need.
- DataOps forces the focus of data activities to be on the analytic outcomes (aligned with business mission and goals) and not on the analytic inputs (“big data” hype).
- DataOps focuses on delivering value from all your data activities. Demonstrated value from even the smallest of these activities will inspire the cultural change needed for the larger implementations that will come.
- DataOps delivers value from our big data assets efficiently (time to solution) and effectively (completeness of solution).
- Adopting DataOps in a culture of experimentation is good data practice and good business.
- DataOps empowers the innovators across your organization with the mantra: “Think Big. Start Small. Learn Fast.” That’s agility with confidence.
- DataOps is a pathway to Data Brilliance, steering you clear of a DataOops.