Netflix Dataset Cracked, Subscribers Profiled

November 27, 2007

Netflix offered a million-dollar reward to anyone who could improve its recommendation engine by ten percent. Two researchers accomplished a lot more with the "anonymized" dataset. The physics arXiv blog noted that Netflix claimed to have removed personal details from the dataset before making it available. However, Arvind Narayanan and Vitaly Shmatikov at the University of Texas at Austin figured out how to de-anonymize that data.

Through their algorithmic work, the researchers could tie information in the Netflix dataset to ratings posted publicly on the Internet Movie Database (IMDb) website:

We expect that for Netflix subscribers who use IMDb, there is a strong correlation between their private Netflix ratings and their public IMDb ratings. Note that our attack does not require that all movies rated by the subscriber in the Netflix system be also rated in IMDb, or vice versa. In many cases, even a handful of movies that are rated by the subscriber in both services would be sufficient to identify his or her record in the Netflix Prize dataset...
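To make the idea in that passage concrete, here is a minimal sketch of how a handful of overlapping ratings could single out one record. It is not the researchers' actual algorithm. The toy data, the function names match_score and best_match, and the thresholds (one star, fourteen days, a lead of two matches) are all assumptions chosen for illustration.

```python
from datetime import date

# Hypothetical toy data: each anonymized record maps a movie title
# to (stars, date rated). No real Netflix or IMDb data is used here.
anonymized = {
    "record_a": {
        "Movie 1": (5, date(2005, 3, 1)),
        "Movie 2": (2, date(2005, 3, 4)),
        "Movie 3": (4, date(2005, 6, 9)),
        "Movie 4": (1, date(2005, 7, 1)),
    },
    "record_b": {
        "Movie 1": (3, date(2005, 3, 2)),
        "Movie 3": (4, date(2005, 1, 15)),
        "Movie 5": (5, date(2005, 2, 2)),
    },
    "record_c": {
        "Movie 2": (2, date(2004, 11, 20)),
        "Movie 6": (3, date(2005, 5, 5)),
    },
}

# Hypothetical public profile: ratings one person posted under a known name.
public_profile = {
    "Movie 1": (5, date(2005, 3, 3)),
    "Movie 2": (1, date(2005, 3, 5)),
    "Movie 3": (4, date(2005, 6, 10)),
}


def match_score(public, private, max_star_gap=1, max_day_gap=14):
    """Count movies rated in both places with similar stars and dates."""
    score = 0
    for movie, (stars, when) in public.items():
        if movie not in private:
            continue  # the attack does not need every movie to overlap
        private_stars, private_when = private[movie]
        close_stars = abs(stars - private_stars) <= max_star_gap
        close_dates = abs((when - private_when).days) <= max_day_gap
        if close_stars and close_dates:
            score += 1
    return score


def best_match(public, records, min_lead=2):
    """Return the record id that clearly beats the runner-up, else None."""
    ranked = sorted(
        ((match_score(public, rec), rec_id) for rec_id, rec in records.items()),
        reverse=True,
    )
    top_score, top_id = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0
    # Only claim a match when the best candidate stands out from the rest.
    return top_id if top_score - runner_up >= min_lead else None


print(best_match(public_profile, anonymized))
```

In this toy example, three roughly matching ratings are enough to separate one record from the others. Requiring a clear lead over the runner-up is one simple way to avoid claiming a match when the evidence is ambiguous.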

Briefly, people who rated movies publicly around the same time they rated those movies privately gave the researchers enough data to figure out details about one person. "A natural question to ask is why would someone who rates movies on IMDb - often under his or her real name - care about privacy of his movie ratings?" the researchers asked.

"Consider the information that we have been able to deduce by locating one of these users