Don’t Hate the Data, Hate the Bias
If you were to slip on a banana peel that someone in front of you dropped, would you blame the peel?
Of course not. It’s an inanimate object that had no control over how it was wielded. That same banana peel could have been composted—fertilizing new life—instead of dropped at your feet.
And what about the person who dropped it? Perhaps there was a reason that compelled them to drop it instead of throwing it away. The peel, like data, could have been used in many different ways, each leading to a different outcome.
Bi•as (bīəs): From the French biais, “a slant, a slope, an oblique.” Often used with negative connotation for a deviant path.
Data bias is a concern for many right now. Everything from climate change data, to personal data collected by apps, to DNA test results is being questioned for how the information is used to tell a given story. With more and more data being produced every day, it’s important that each of us understands the possibilities that data can enable. It’s also important to understand the true risks of leveraging incomplete or skewed information, which is what creates a data bias scenario.
Data at its best.
The more the merrier when it comes to leveraging data. Like a giant, complicated puzzle, there’s just no way to see the full picture unless you have all the pieces. Understanding as many factors about a given situation through data will help generate the most insightful findings.
Recently, a proposal for electrifying Mexico City’s bus system was developed as part of a data innovation challenge that granted researchers secure access to privately held datasets. By layering the private data onto a number of publicly available datasets, the INECC/UC Berkeley research team was able to develop a much more complete picture of what was causing Mexico City’s pollution and propose a feasible solution that could significantly reduce pollution levels in the city.
Pharma is another area where a complete picture is vital. When developing new therapies or drugs, ensuring all groups are represented helps researchers understand the effectiveness or dangers of treatments.
What happens when data is incomplete?
When there are holes or missing pieces, bias can skew the information we’re looking at, and with it the outcomes.
For example, say a new therapy is released to the public after years of data-driven studies culminating in clinical trials, but it is later found that a certain ethnic group was under-represented. And it just so happens that this group carries a dangerous genetic sensitivity to an element of the treatment. If the group had been adequately represented in the studies, the anomaly could have been discovered and perhaps the treatment could have been altered to better suit all populations.
Policy is another area where complete data is vital. When specific populations are under-represented in (or absent from) the data used to identify social needs and create policy to address them, the policies may not be as effective, and the populations most in need may not be reached by the solutions.
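For readers who work with data directly, under-representation of the kind described above can be checked in a rough way by comparing each group's share of a dataset with its share of the wider population. The following is a minimal sketch, not a standard method: the function name `representation_gaps`, the `tolerance` threshold, and the group labels and numbers are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares, tolerance=0.5):
    """Flag groups whose share of the sample falls well below their population share.

    sample_groups: iterable of group labels, one per participant or record.
    population_shares: dict mapping group label -> expected share (0 to 1).
    tolerance: a group is flagged when its observed share is less than
        tolerance * expected share (an arbitrary, illustrative threshold).
    """
    counts = Counter(sample_groups)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample is empty")

    gaps = {}
    for group, expected in population_shares.items():
        # Groups absent from the sample get an observed share of 0.
        observed = counts.get(group, 0) / total
        if observed < tolerance * expected:
            gaps[group] = {"observed": observed, "expected": expected}
    return gaps


# Hypothetical study: 100 participants drawn from three made-up groups.
participants = ["A"] * 80 + ["B"] * 18 + ["C"] * 2
# Hypothetical population shares for the same groups.
population = {"A": 0.60, "B": 0.25, "C": 0.15}

flagged = representation_gaps(participants, population)
for group, shares in flagged.items():
    print(f"{group}: {shares['observed']:.0%} of sample vs "
          f"{shares['expected']:.0%} of population")
```

With these made-up numbers, only group C is flagged, because it makes up 2% of the sample against 15% of the population. A check like this only catches gaps in groups you already know to look for; it says nothing about populations missing from the reference figures themselves.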
Can bias occur in the data itself?
If influenced at the collection point, the data itself can become skewed. For example, China’s launch of a social rating system has the potential to be incredibly insightful, helping the government understand and encourage positive behavior change around issues such as fiscal responsibility and sustainability practices. But if individuals are aware that certain actions are being tracked, will they modify their behavior to conform to what is expected? In this example, the act of data collection can itself bias the information being collected.
Data, like our banana peel, is in itself benign. It’s in how we collect, pair, analyze, and ultimately wield the data that bias can manifest. And of course, if bias is introduced purposefully to further a malicious agenda, data can even become a kind of weapon.
Love the Data, Know the Bias
Data is such a powerful tool for our world. When it can thrive, we all thrive. The breakthroughs and enhancements it’s affording are unparalleled in human history—and we are still only at the beginning.
Embracing data for its potential requires understanding how biases affect the information, and working to diminish or eliminate them. Data-for-social-good initiatives work to unlock sources and create a more complete picture for humanitarian solutions. Population health partnerships are forming across the world, seeking to aggregate data from all types of people to ensure smarter, more informed treatment decisions.
For data scientists, helpful articles on data bias have been published that outline key types and how you can address them in your research.