Can Big Data Ethics Save Us from Built-In Bias?
Think about the last show you watched or product you purchased. Did you find it on your own, or was it recommended by an algorithm that seemed to know exactly what you wanted? Big data ethics might give us a clue.
If we are to believe the data, it's likely that you followed a recommendation. In fact, a recent report1 by a leading management consulting firm found that 35 percent of purchases on a well-known global ecommerce site and 75 percent of content watched on the world's largest streaming service resulted from recommendation systems. Put another way, many online decisions you thought you made independently were actually the result of subtle nudges2 by software to steer you toward certain choices.
Recommendation systems can sell people related products and services, much like a referral from a trusted friend.
So, how could software know so much about us, our families, and our friends (sometimes even more than we know ourselves)? The answer might be more interesting than you think. It has to do with big data ethics. To be clear, we believe in the responsible use of data science to create personalized products and experiences that fulfill people's needs. But without data ethics standards, the field risks using people's personal data without their consent and influencing their behavior.
In this article, we explore the key motivations for data practitioners. At the end, we’ll propose three sets of guiding questions to inform a big data ethics framework. Let’s start by exploring what the biggest supporters of data science have to say.
How Products, Profits, and Prestige Affect Big Data Ethics
Data scientists, joined by software engineers and researchers in AI and ML, are the leading proponents of data science, whether or not they have the data ethics training to make informed decisions.
There's no denying the popularity that has surrounded data science in recent years. It started with a well-respected business management publication declaring data science the "sexiest job of the 21st century3." The frenzy carried into 2018, when a popular job search engine crowned the occupation the "Best Job in America4." Yet data science is practiced not just by data scientists. Throw into the mix a variety of engineers and researchers who use data from publicly available networks to recommend and create products and ads personalized to their users.
In many ways, the end result of this hyper-personalization has been useful. Think of a new product you bought, a television show or movie you watched, a video you laughed at, or an article you read: chances are a recommendation engine surfaced it. Smarter algorithms are not only timely, helpful, and profitable; they also free up people's time from routine daily tasks.
While the biggest supporters of data science enjoy a comfortable degree of income, status, and job satisfaction, they aren't free of fault. To avoid inappropriate use of user data, every experiment needs to secure appropriate consent, provide transparency, and set ethical constraints. A standardized ethical framework should be adopted and implemented by every individual and organization that plans to collect, analyze, share, or experiment on users' aggregated data.
Guiding Questions for a Big Data Ethics Framework
Those on the frontlines—data scientists, researchers, and engineers—have to be at the center of these ethical data dilemmas. Some companies are being proactive when it comes to embedding ethics into their products. One example is Jennifer, a Technical Fellow and Managing Director at one of the world’s largest technology companies. Her research group focuses on ways to address inequity in data science. One method is to get rid of built-in biases.
As Jennifer noted in a recent data science podcast5, data scientists have "the opportunity to build algorithms with fairness, accountability, transparency and ethics, or FATE." In the episode, she described her team using a two-layer text-processing system to de-bias search engine results for programmer jobs. The result was a set of search results that decoupled programmer positions from gender.
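The podcast doesn't detail how the de-biasing works, but one widely known technique for this kind of problem is "hard de-biasing" of word embeddings: removing the component of a word's vector that lies along a learned gender direction, so that job titles sit equidistant from gendered terms. The sketch below is purely illustrative, with made-up toy vectors and a hypothetical `debias` helper; it is not a description of Jennifer's actual system:

```python
import numpy as np

def debias(vec, bias_direction):
    """Remove the component of `vec` along the (normalized) bias direction."""
    b = bias_direction / np.linalg.norm(bias_direction)
    return vec - np.dot(vec, b) * b

# Toy 4-dimensional "embeddings," invented for illustration only.
he = np.array([1.0, 0.2, 0.0, 0.1])
she = np.array([-1.0, 0.2, 0.0, 0.1])
programmer = np.array([0.4, 0.9, 0.3, 0.0])

# Estimate a gender direction from a definitional pair, then project it out.
gender_direction = he - she
neutral = debias(programmer, gender_direction)

# The de-biased "programmer" vector now has no component along the
# gender direction, so it scores identically against "he" and "she."
print(np.dot(neutral, he), np.dot(neutral, she))
```

Real systems work with hundreds of dimensions, many definitional pairs, and careful choices about which words should stay gendered (e.g., "mother"), but the core projection step is the same idea.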
This is important because search and recommendation engines can unintentionally reinforce a person's existing opinions: people keep being served articles and videos that follow a fixed line of thinking. Technologies that use this subtle manipulation have historically met little resistance because they nudge people toward decisions they believe they made independently. To reverse this trend of data misuse, we've drawn on the framework and principles provided by a group of data practitioners, ethicists, and advocates6 to come up with three sets of guiding questions to keep in mind. Share these with the data practitioners in your company.
Data science, as a field, is advancing faster than current legal and ethical guidelines can keep up. If the high-tech industry wants to address the ethical dilemmas of big data, then each participant needs to hold data work to the rigorous standards of other scientific research and mandate a standardized code of conduct. Doing so would give people the final say in how their data is used and could remove built-in bias.
Learn More About Data Ethics:
- Find out why data accessibility is the biggest issue of our time
- Learn why human intelligence should not be AI’s “Holy Grail”
- Read about the data that saved our lives