Data can be used to drive decisions and make an impact at scale. Yet, this powerful resource comes with challenges. How can organizations ethically collect, store, and use data? What rights must be upheld? The field of data ethics explores these questions and offers five guiding principles for business professionals who handle data.

What Is Data Ethics?

Data ethics encompasses the moral obligations of gathering, protecting, and using personally identifiable information and how it affects individuals.

“Data ethics asks, ‘Is this the right thing to do?’ and ‘Can we do better?’” Harvard Professor Dustin Tingley explains in the online course Data Science Principles.

Data ethics are of the utmost concern to analysts, data scientists, and information technology professionals. Anyone who handles data, however, must be well-versed in its basic principles.

For instance, your company may collect and store data about customers’ journeys from the first time they submit their email address on your website to the fifth time they purchase your product. If you’re a digital marketer, you likely interact with this data daily.

While you may not be the person responsible for implementing tracking code, managing a database, or writing and training a machine-learning algorithm, understanding data ethics can allow you to catch any instances of unethical data collection, storage, or use. By doing so, you can protect your customers' safety and save your organization from legal issues.

Here are five principles of data ethics to apply at your organization.


5 Principles of Data Ethics for Business Professionals

1. Ownership

The first principle of data ethics is that an individual has ownership over their personal information. Just as it’s considered stealing to take an item that doesn’t belong to you, it’s unlawful and unethical to collect someone’s personal data without their consent.

Some common ways you can obtain consent are through signed written agreements, digital privacy policies that ask users to agree to a company’s terms and conditions, and pop-ups with checkboxes that permit websites to track users’ online behavior with cookies. Never assume a customer is OK with you collecting their data; always ask for permission to avoid ethical and legal dilemmas.
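The "always ask for permission" rule can be enforced in software as well as in policy. Below is a minimal, hypothetical sketch of a consent-gated collection step; the `ConsentStore` class, purpose names, and user IDs are illustrative assumptions, not a real API.

```python
# Hypothetical sketch: gate every data-collection call on recorded consent.
# Class names, purposes, and IDs are illustrative, not a real library API.

class ConsentStore:
    """Tracks which users have opted in to which kinds of collection."""

    def __init__(self):
        self._grants = {}  # user_id -> set of consented purposes

    def record_consent(self, user_id, purpose):
        self._grants.setdefault(user_id, set()).add(purpose)

    def has_consent(self, user_id, purpose):
        return purpose in self._grants.get(user_id, set())


def collect(store, user_id, purpose, value, database):
    """Store a data point only if the user consented to this purpose."""
    if not store.has_consent(user_id, purpose):
        raise PermissionError(f"No consent from {user_id} for {purpose}")
    database.setdefault(user_id, {})[purpose] = value


store = ConsentStore()
db = {}
store.record_consent("u42", "behavioral_tracking")
collect(store, "u42", "behavioral_tracking", "clicked_pricing_page", db)
# Collecting for a user who never opted in raises PermissionError.
```

The design choice here is to fail loudly: collection without consent raises an error rather than silently proceeding, which makes unethical collection paths easy to catch in testing.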

2. Transparency

In addition to owning their personal information, data subjects have a right to know how you plan to collect, store, and use it. When gathering data, exercise transparency.

For instance, imagine your company has decided to implement an algorithm to personalize the website experience based on individuals’ buying habits and site behavior. You should write a policy explaining that cookies are used to track users’ behavior and that the data collected will be stored in a secure database and used to train an algorithm that provides a personalized website experience. It’s a user’s right to have access to this information so they can decide to accept your site’s cookies or decline them.

Withholding or lying about your company’s methods or intentions is deception, which is both unlawful and unfair to your data subjects.

3. Privacy

Another ethical responsibility that comes with handling data is ensuring data subjects’ privacy. Even if a customer gives your company consent to collect, store, and analyze their personally identifiable information (PII), that doesn’t mean they want it publicly available.

PII is any information linked to an individual’s identity. Some examples of PII include:

  • Full name
  • Birthdate
  • Street address
  • Phone number
  • Social Security number
  • Credit card information
  • Bank account number
  • Passport number

To protect individuals’ privacy, ensure you’re storing data in a secure database so it doesn’t end up in the wrong hands. Data security methods that help protect privacy include dual-authentication password protection and file encryption.
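One concrete instance of these security methods is never storing passwords in plain text. The sketch below uses Python's standard-library PBKDF2 to store a salted hash instead of the password itself; the iteration count is deliberately low for the demo, and a production system should follow current OWASP guidance and a vetted authentication library.

```python
# Illustrative sketch: store a salted password hash, not the password.
# Uses only the Python standard library. The iteration count is kept low
# for demonstration; use current OWASP-recommended values in production.
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, digest) for the given password."""
    salt = salt or os.urandom(16)  # unique random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored_digest):
    """Re-derive the hash and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)

salt, stored = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, stored)
assert not verify_password("wrong guess", salt, stored)
```

Even if the database leaks, attackers obtain only salted hashes rather than customers' actual passwords, which limits the privacy damage of a breach.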

Even professionals who regularly handle and analyze sensitive data can make mistakes. One way to prevent slip-ups is by de-identifying a dataset. A dataset is de-identified when all pieces of PII are removed, leaving only anonymous data. This enables analysts to find relationships between variables of interest without attaching specific data points to individual identities.
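De-identification can be as simple as stripping identifying fields before analysis. The sketch below drops PII fields from a list of records; the field names and sample record are hypothetical.

```python
# Minimal sketch of de-identifying records: drop fields that identify a
# person, keep only the variables an analyst needs. Field names are
# hypothetical examples.
PII_FIELDS = {"full_name", "birthdate", "street_address", "phone_number",
              "ssn", "credit_card", "bank_account", "passport_number"}

def deidentify(records):
    """Return copies of the records with all PII fields removed."""
    return [{k: v for k, v in rec.items() if k not in PII_FIELDS}
            for rec in records]

patients = [
    {"full_name": "A. Subject", "phone_number": "555-0100",
     "age_band": "30-39", "visits_last_year": 4},
]
print(deidentify(patients))  # [{'age_band': '30-39', 'visits_last_year': 4}]
```

Note that true anonymization is harder than field removal alone (combinations of non-PII fields can still re-identify people), so treat this as a first step, not a guarantee.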

Related: Data Privacy: 4 Things Every Business Professional Should Know

4. Intention

When discussing any branch of ethics, intentions matter. Before collecting data, ask yourself why you need it, what you’ll gain from it, and what changes you’ll be able to make after analysis. If your intention is to hurt others, profit from your subjects’ weaknesses, or any other malicious goal, it’s not ethical to collect their data.

When your intentions are good—for instance, collecting data to gain an understanding of women’s healthcare experiences so you can create an app to address a pressing need—you should still assess your intention behind the collection of each piece of data.

Are there certain data points that don’t apply to the problem at hand? For instance, is it necessary to ask if the participants struggle with their mental health? This data could be sensitive, so collecting it when it’s unnecessary isn’t ethical. Strive to collect the minimum viable amount of data, so you’re taking as little as possible from your subjects while making a difference.
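Collecting the minimum viable amount of data can be enforced at intake with an allowlist: only fields with a justified purpose are kept. The approved field names below are assumptions made up for this sketch.

```python
# Hypothetical sketch of data minimization at collection time: a survey
# intake keeps only fields on an approved, purpose-justified allowlist.
APPROVED_FIELDS = {"age_band", "care_experience_rating", "app_feature_requests"}

def minimize(submission):
    """Keep only fields the study actually needs; drop the rest."""
    return {k: v for k, v in submission.items() if k in APPROVED_FIELDS}

raw = {"age_band": "20-29",
       "care_experience_rating": 2,
       "mental_health_history": "sensitive",  # not needed -> dropped
       "app_feature_requests": "appointment reminders"}
cleaned = minimize(raw)
assert "mental_health_history" not in cleaned
```

Better still is never asking for the unnecessary field in the first place; the allowlist acts as a backstop so sensitive data can't accidentally enter storage.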

Related: 5 Applications of Data Analytics in Health Care

5. Outcomes

Even when intentions are good, the outcome of data analysis can cause inadvertent harm to individuals or groups of people. This is called a disparate impact, which the Civil Rights Act deems unlawful.

In Data Science Principles, Harvard Professor Latanya Sweeney provides an example of disparate impact. When Sweeney searched for her name online, an advertisement came up that read, “Latanya Sweeney, Arrested?” She had not been arrested, so this was strange.

“What names, if you search them, come up with arrest ads?” Sweeney asks in the course. “What I found was that if your name was given more often to a Black baby than to a white baby, your name was 80 percent more likely to get an ad saying you had been arrested.”

It’s not clear from this example whether the disparate impact was intentional or a result of unintentional bias in an algorithm. Either way, it has the potential to do real damage that disproportionately impacts a specific group of people.

Unfortunately, you can’t know for certain what impact your data analysis will have until it’s complete. By considering potential outcomes beforehand, you’re better positioned to catch disparate impact before it causes harm.

Ethical Use of Algorithms

If your role includes writing, training, or handling machine-learning algorithms, consider how they could potentially violate any of the five key data ethics principles.

Because algorithms are written by humans, bias may be intentionally or unintentionally present. Biased algorithms can cause serious harm to people. In Data Science Principles, Sweeney outlines the following ways bias can creep into your algorithms:

  • Training: Because machine-learning algorithms learn based on the data they’re trained with, an unrepresentative dataset can cause your algorithm to favor some outcomes over others.
  • Code: Although any bias present in your algorithm is hopefully unintentional, don’t rule out the possibility that it was written specifically to produce biased results.
  • Feedback: Algorithms also learn from users’ feedback. As such, they can be influenced by biased feedback. For instance, a job search platform may use an algorithm to recommend roles to candidates. If hiring managers consistently select white male candidates for specific roles, the algorithm will learn and adjust and only provide job listings to white male candidates in the future. The algorithm learns that when it provides the listing to people with certain attributes, it’s “correct” more often, which leads to an increase in that behavior.
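The training-data failure mode above can be checked for before any model is trained. Below is an illustrative sketch that compares each group's share of the training set against a reference population and flags large gaps; the group labels and the 5-point threshold are assumptions for the example.

```python
# Illustrative check for unrepresentative training data: compare each
# group's share of the training set against a reference population and
# flag gaps above a threshold. Labels and threshold are assumed values.
from collections import Counter

def representation_gaps(training_labels, population_shares, threshold=0.05):
    """Return {group: observed_share - expected_share} for flagged groups."""
    counts = Counter(training_labels)
    total = sum(counts.values())
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > threshold:
            gaps[group] = round(observed - expected, 3)
    return gaps  # empty dict: no group deviates beyond the threshold

labels = ["A"] * 90 + ["B"] * 10          # skewed training set
flags = representation_gaps(labels, {"A": 0.5, "B": 0.5})
print(flags)  # {'A': 0.4, 'B': -0.4}
```

A check like this only addresses the training-data source of bias; biased code and biased feedback loops need separate review, such as human evaluation of model outputs across groups.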

“No algorithm or team is perfect, but it’s important to strive for the best,” Tingley says in Data Science Principles. “Using human evaluators at every step of the data science process, making sure training data is truly representative of the populations who will be affected by the algorithm, and engaging stakeholders and other data scientists with diverse backgrounds can help make better algorithms for a brighter future.”


Using Data for Good

While the ethical use of data is an everyday effort, knowing that your data subjects’ safety and rights are intact is worth the work. When handled ethically, data can enable you to make decisions and drive meaningful change at your organization and in the world.

Want to learn more about the ethical underpinnings of data privacy? Explore the four-week online course Data Science Principles or our other online analytics courses.

About the Author

Catherine Cote

Catherine Cote is a marketing coordinator at Harvard Business School Online. Prior to joining HBS Online, she worked at an early-stage SaaS startup where she found her passion for writing content, and at a digital consulting agency, where she specialized in SEO. Catherine holds a B.A. from Holy Cross, where she studied psychology, education, and Mandarin Chinese. When not at work, you can find her hiking, performing or watching theatre, or hunting for the best burger in Boston.