Wed, May 09, 2018 - Page 9

How Cambridge Analytica turned data into votes

By combining Facebook data with voter and consumer information, the firm claimed to be able to provide 253 predictions of behavior that could be used to craft targeted advertisements

By Alex Hern  /  The Observer

Illustration: Mountain people

How do 87 million records scraped from Facebook become an advertising campaign that could help swing an election? What does gathering that much data actually involve? And what does that data tell us about ourselves?

The Cambridge Analytica scandal has raised question after question, but for many, the technological unique selling point of the company, which last week announced that it was closing its operations, remains a mystery.

For those 87 million people probably wondering what was actually done with their data, I went back to Christopher Wylie, the ex-Cambridge Analytica employee who blew the whistle on the company’s problematic operations in the Observer.

According to Wylie, all you need to know is a little bit about data science, a little bit about bored rich women and a little bit about human psychology.

Step one, he said over the phone as he scrambled to catch a train: “When you’re building an algorithm, you first need to create a training set.”

That is: No matter what you want to use fancy data science to discover, you first need to gather data the old-fashioned way. Before you can use Facebook likes to predict a person’s psychological profile, you need to get a few hundred thousand people to do a 120-question personality quiz.

The “training set” refers to that data in its entirety: the Facebook likes, the personality tests and everything else you want to learn from.

Most important, it needs to contain your “feature set,” which is the “underlying data that you want to make predictions on,” Wylie said. “In this case, it’s Facebook data, but it could be, for example, text, like natural language, or it could be clickstream data,” the complete record of your browsing activity on the Web.

“Those are all the features that you want to [use to] predict,” he added.

At the other end, you need your “target variables” — in Wylie’s words: “The things that you’re trying to predict for. So in this case, personality traits or political orientation, or what have you.”

If you are trying to use one thing to predict another, it helps if you can look at both at the same time.

“If you want to know the relationships between Facebook likes in your feature set and personality traits as your target variables, you need to see both,” Wylie said.
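To make the vocabulary concrete, here is a minimal sketch of a training set in which each row pairs a feature set (a person's likes) with a target variable (a trait score). Everything in it is an assumption made for illustration: the page names, the `openness` scores and the helper functions `fit_like_offsets` and `predict` are hypothetical, and the averaging rule is a deliberately naive stand-in, not the method Cambridge Analytica used.

```python
from collections import defaultdict

# Hypothetical training set: each row holds both the features (likes)
# and the target variable (a made-up "openness" score from a quiz).
training_set = [
    {"likes": {"page_a", "page_b"}, "openness": 0.9},
    {"likes": {"page_a"}, "openness": 0.7},
    {"likes": {"page_c"}, "openness": 0.2},
    {"likes": {"page_b", "page_c"}, "openness": 0.4},
]


def fit_like_offsets(rows, target):
    """Learn, for each like, how far its fans' average score sits from the overall average."""
    overall = sum(row[target] for row in rows) / len(rows)
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        for like in row["likes"]:
            totals[like] += row[target]
            counts[like] += 1
    offsets = {like: totals[like] / counts[like] - overall for like in totals}
    return overall, offsets


def predict(likes, overall, offsets):
    """Predict a score for someone who never took the quiz, using only their likes."""
    known = [offsets[like] for like in likes if like in offsets]
    if not known:
        # No like seen in training: fall back to the overall average.
        return overall
    return overall + sum(known) / len(known)


overall, offsets = fit_like_offsets(training_set, "openness")
high = predict({"page_a"}, overall, offsets)
low = predict({"page_c"}, overall, offsets)
print(high > low)
```

The point of the sketch is the one Wylie makes: the relationship between likes and traits can only be learned from rows where both are visible. Once learned, it can be applied to people for whom only the likes are known.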


Facebook data, which lies at the heart of the Cambridge Analytica story, is a fairly plentiful resource in the data science world, and certainly was back in 2014, when Wylie first started working in this area.

Personality traits are much harder to get hold of: Despite what the proliferation of BuzzFeed quizzes might suggest, it takes quite a lot to persuade someone to fill in a 120-question survey — the length of the short version of one of the standard psychological surveys, the International Personality Item Pool-NEO.

However, Wylie said that “quite a lot” is relative.

“For some people, the incentive to take a survey is financial. If you’re a student or looking for work, or just want to make US$5, that’s an incentive,” he said.

The actual money handed over “ranged from US$2 to US$4,” while the larger payments went to “groups that were harder to get,” he added.

The group least likely to take a survey, and so earning the most from it, was African-American men.

“Other people take surveys just because they find it interesting, or they are bored. So we over-sampled wealthy white women, because if you live in the Hamptons and have nothing to do in the afternoon, you fill out consumer research surveys,” Wylie said.
