The Rule of Least Power in Data Analytics – Part 1
Written by Dr. Kirk Borne
I recently learned about the rule of least power in computer programming. The principle expresses the notion that one should use the least powerful programming language to code a given task, while still meeting the business requirements. It occurred to me that a similar concept can apply to data analytics tasks, specifically in data science modeling using machine learning algorithms. In this post, this “doctor of analytics” prescribes this approach for your next analytics project.
We have often defined analytics as the products of data science activities that apply machine learning and statistical algorithms to data in order to achieve one or more business requirements. The focus of analytics is therefore on outcomes. Consequently, with an analytics-first strategy, a required task is often stated in terms of its business outcomes. The task statement does not include — and should not include — the specific approach that the data science team should take. The approach should be left up to the data scientists. That includes their choice of machine learning algorithms as well as their preferred choice of programming language.
The Client is Always Right (Meeting Desired Business Outcomes)
Of course, the situation is different in external client engagements, where the client may choose to specify anything that they wish. To distinguish such external client task specifications from a task request that originates within your own organization, I shall refer to the person making the latter request as the “task owner”.
The rule of least power in data analytics thus gives “permission” to the data science team to meet the task owner’s desired business outcomes in the most efficient and effective manner that is consistent with the specified requirements. The analytics are “efficient” in the sense of optimizing the “time to solution” and “effective” in the sense of optimizing the “completeness of the solution” (i.e., achieving the desired outcomes).
When investigating the “rule of least power” concept, I encountered two additional concepts from computer programming: declarative versus imperative programming. Declarative tells you what the desired outcome of a task is, whereas imperative tells you how to do the task. “What” versus “how” is the fundamental distinction.
For example, if the task is “give me a breakfast of ham and eggs”, that request is declarative. Conversely, if I give you the specific recipe of ingredients to use and the preparation steps that you must follow, that is being imperative. A data-related example is the SQL language for database queries — this is a declarative language because the query specifies what the desired outcome is. The SQL language itself says nothing about how the database engine optimizes the search, computes the joins, or aggregates the results — that would be imperative and would also be a waste of the end-user’s time, skills, and talents.
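The distinction is easy to see in a few lines of code. In this minimal sketch (the table and its data are hypothetical, for illustration only), the same total is computed twice: once declaratively, by telling SQLite what we want, and once imperatively, by spelling out how to loop over the rows.

```python
import sqlite3

# A hypothetical in-memory orders table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "east", 100.0), (2, "west", 250.0), (3, "east", 75.0)])

# Declarative: state WHAT we want; the engine decides HOW to get it.
total_east = conn.execute(
    "SELECT SUM(amount) FROM orders WHERE region = 'east'"
).fetchone()[0]

# Imperative: spell out HOW, row by row.
total_east_manual = 0.0
for _, region, amount in conn.execute("SELECT * FROM orders"):
    if region == "east":
        total_east_manual += amount

assert total_east == total_east_manual == 175.0
```

Both paths produce the same answer, but only the imperative one forces the end-user to care about iteration order, filtering, and accumulation.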
The Hubble Space Telescope
This reminds me of my first job after graduate school and fellowships, when I worked on the Hubble Space Telescope project. I was tasked with generating reports from the various science databases to answer a variety of questions. The questions might include how many scientists wanted to do a certain type of science with the telescope, how much observing time a specific instrument on the telescope was receiving, or how many telescope usage requests from the user community were duplicates. I learned SQL and was able to deliver answers efficiently and effectively for these and many other questions to the science managers on the project.
I eventually received an employee achievement award several years later for my expertise in doing this. In all that time, I never once learned how the database searches were optimized, how the joins were computed, or how the results were aggregated. That was a real blessing, since I was interested in the outcomes (the science productivity of the telescope), not in the database engineering infrastructure. I was very happy (and grateful) to leave the latter to the database experts, who made my job so much easier.
My Rookie Mistake in Data Analytics
Of course, SQL and other database query languages are designed to be declarative. The database engineers build systems that satisfy the requirements of their end-users, whose primary duties include finding answers to business questions from the data — that’s a focus on outcomes.
Data analytics is therefore like traditional database activities — declarative! The task owner’s requirements (desired outcomes) are specified, and the data science team focuses on delivering efficient, effective, and accurate models from data, using the right model in the right context, to answer business questions and to improve business outcomes. There is no “how” in the task owner’s request, only the “what”.
Occasionally, these two worlds do collide. Telling the developer how to perform their task is not only a violation of the rule of least power; it is also an example of how too much knowledge can be a dangerous thing. But the opposite can also be true.
For example, when I was first learning how to run those science database reports, I learned the amazing benefits of database joins. I wanted to experiment with these joins as much as possible. In this case, a little knowledge became a dangerous thing when I made the mistake of submitting a very large multi-table join query. I didn’t know how that could be a problem. Soon after my request was submitted, the database administrators called my office to explain to me how that could be a problem. They asked what I was trying to do (i.e., an “outcomes” question), while explaining how my query was likely to take several hours to complete and how it was slowing down other more critical database applications (i.e., the “how” consequences). I learned that my requests needed to be smarter—hence, more knowledgeable of the “how” of the database tasks, in order to avoid similar bottlenecks and inefficiencies in the future. That was a valuable lesson for me, the database newbie.
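A toy sketch of what went wrong (the tables and row counts are invented for illustration, not taken from the actual Hubble databases): an unconstrained multi-table join evaluates every pairing of rows, while a join keyed on a shared column touches only the related rows.

```python
import sqlite3

# Two hypothetical tables: 1,000 proposals and 5,000 observations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE proposals (pid INTEGER PRIMARY KEY, title TEXT)")
conn.execute("CREATE TABLE observations (oid INTEGER PRIMARY KEY, pid INTEGER)")
conn.executemany("INSERT INTO proposals VALUES (?, ?)",
                 [(i, f"proposal {i}") for i in range(1000)])
conn.executemany("INSERT INTO observations VALUES (?, ?)",
                 [(i, i % 1000) for i in range(5000)])

# An unconstrained join evaluates every pairing: 1,000 x 5,000 rows.
cross_rows = conn.execute(
    "SELECT COUNT(*) FROM proposals, observations").fetchone()[0]

# A join keyed on the shared column matches only related rows.
keyed_rows = conn.execute(
    "SELECT COUNT(*) FROM proposals p JOIN observations o ON o.pid = p.pid"
).fetchone()[0]

print(cross_rows, keyed_rows)  # 5000000 5000
```

The blow-up factor here is only a thousand; on production-scale tables, the same mistake can consume hours of engine time and starve other applications, which is exactly what the administrators were calling about.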
XAI – Explainable Artificial Intelligence
So, the rule of least power does not mean that the “what” and the “how” are mutually independent. Rather, it means that they are interdependent — that they support one another. In data analytics, this means that sometimes the algorithms must be transparent, and the analytics outcomes must be explainable. That’s the “how”. In artificial intelligence applications, this is called “XAI” (Explainable AI). Hence, the data science team’s secret sauce might be secret in the sense of being proprietary, but it does not mean secret in the sense that nobody can ever know what you did.
Quantum Units of Analytics
One of the instantiations of the rule of least power is modularity. This means that small, modular components are developed to satisfy a quantum of work: the smallest unit of work in a task. These modules are then combined to complete the full mission requirements of a task, which could be complex and diverse. Ideally, the modules are reusable and stackable. The developer (the data scientist, in our analytics case) then composes a workflow from the essential modules, reusing as much as possible and reinventing as little as possible. If the outcomes need improving at the request of the task owner, then it is easier to swap out and replace modular components than to rebuild a whole system.
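As a sketch of this idea (the step functions here are hypothetical examples, not any particular library's API), each quantum of work can be a small function with a uniform interface, and a workflow is just an ordered composition of those modules:

```python
# Hypothetical modular analytics workflow: each step is a small,
# reusable function with the same shape (data in, data out).
def clean(records):
    # Drop missing values.
    return [r for r in records if r is not None]

def normalize(records):
    # Min-max scale the values to [0, 1].
    lo, hi = min(records), max(records)
    return [(r - lo) / (hi - lo) for r in records]

def summarize(records):
    # Reduce to a single summary statistic (the mean).
    return sum(records) / len(records)

def run_pipeline(data, steps):
    # Compose the quanta of work in order.
    for step in steps:
        data = step(data)
    return data

result = run_pipeline([4, None, 8, 6], [clean, normalize, summarize])
assert result == 0.5
```

If the task owner later wants a different scaling or summary, only the corresponding module is swapped out; the rest of the workflow, and its other consumers, are untouched.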
In Part 2 of Kirk Borne’s “The Rule of Least Power”, we’ll learn about four detailed examples of how the rule applies to specific real-world businesses and customers.