Data Collection: “Harvesting” Personalities Online

An Ethics Case Study

Irina Raicu

A recent MIT Technology Review article details the efforts of a big data analytics company named Cambridge Analytica, which claims to use behavioral science insights in helping political candidates tailor their campaign messages according to the recipient’s “personality.” “Like other big-data analysis companies,” the article notes, “it categorizes voters on the basis of demographics and issues, but it appears to be the first to add personality typing to the mix. The company says it has assessed the personalities of all 190 million registered voters in the United States.”

And how were those personalities assessed? According to the article, which is titled “How Political Candidates Know If You’re Neurotic,”

Cambridge Analytica administers… questionnaires online, promoting them using ads that promise to tell you the relative weight of your personality traits. The company says it has used these tests to “harvest” the personalities of several hundred thousand Americans. Even if you haven’t taken one of its tests, the company categorizes you by extrapolating. It concludes that you tend to be, say, agreeable or neurotic by matching statistical profiles made up of as many as 5,000 commercially or publicly available data points about you to the statistical profiles of people who actually took the personality tests and came out as agreeable or neurotic and so on. (It will not discuss the particulars of these statistical matches but says the data come from consumer database companies including Acxiom, Experian, Infogroup, and Aristotle, as well as the Republican Party’s voter file.)

Before answering the questions below, please review this article about ethical decision-making, different ethical perspectives, and the considerations that we should keep in mind when faced with ethical issues.

Is the company’s personality-“harvesting” method ethical? Why, or why not?

Should people who attempt to answer the questionnaire be advised, ahead of time, that the data collected from those questionnaires will be used to improve the targeting of political messaging?

In terms of disclosure, here’s what Cambridge Analytica’s privacy policy currently includes under the header “How will we use information about you?”: “The information we collect will be used in order to gain insight into the behavior of the whole population. We, or our research partners may contact you for direct marketing or research purposes.” Is this disclosure sufficient? Why, or why not?

Consider the process of matching the profiles of questionnaire-takers to statistical profiles of other people who don’t choose to answer such questionnaires (profiles based on “commercially or publicly available data points” about those others). Is the assessment of personalities by extrapolation ethical? Why, or why not? If you do have concerns about this practice, are they rooted in perceptions of fairness? The question of autonomy? Privacy rights? Other? (For more on “consumer database companies,” see ProPublica’s “Everything We Know About What Data Brokers Know About You.”)

May 9, 2016
