Data Resource

Netflix Prize:

An open competition launched by Netflix, a DVD-rental service company. Participants were given data sets containing users' previous ratings of films and competed to build the collaborative filtering algorithm that best predicted users' future ratings of those films.

In this competition, two data sets were provided:

training data: 100,480,507 ratings given by 480,189 users to 17,770 movies, in the format <user, movie, date of grade, grade>.

qualifying data: 2,817,131 ratings in the format <user, movie, date of grade>, with the grade withheld.
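As a minimal sketch of how the two record formats relate, the snippet below parses both into one record type. The comma-separated, one-record-per-line layout, the ISO date format, and the names Rating and parse_rating are assumptions made for illustration; they are not the layout of the files Netflix actually distributed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Rating:
    user: int
    movie: int
    rated_on: date
    grade: Optional[int]  # None for qualifying records, which omit the grade


def parse_rating(line: str) -> Rating:
    """Parse 'user,movie,YYYY-MM-DD[,grade]' (hypothetical text layout)."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 fields, got {len(fields)}")
    grade = int(fields[3]) if len(fields) == 4 else None
    return Rating(int(fields[0]), int(fields[1]),
                  date.fromisoformat(fields[2]), grade)


# Made-up example records: one training-style, one qualifying-style.
print(parse_rating("6,30,2005-09-06,4"))
print(parse_rating("6,30,2005-09-06"))
```

A prediction algorithm would be fitted on records whose grade is present and then asked to fill in the missing grade for the qualifying records.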

The team “BellKor’s Pragmatic Chaos” won the $1M Grand Prize.

CANT Competition:

An open competition launched by the University of Rhode Island, USA, and Beijing University, China, to collect real user attack data against online reputation systems. Participants were provided with a normal rating data set collected from Douban, a well-known e-commerce website in China, and were asked to submit dishonest ratings that would downgrade the reputation score of a given product. The team that downgraded the reputation score the most won the competition.

The normal rating data, containing ratings from 300 users to 300 products over 150 days, was provided in the format <rating date, user ID, product ID, rating score>. Each attack profile was a file submitted by a participant, containing multiple dishonest ratings in the same format <rating date, user ID, product ID, rating score>.

This competition was conducted offline by building a virtual reputation system. All submitted attack profiles were run only in the virtual reputation system, so the competition did not interfere with product reputation scores in any real reputation system.
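The evaluation idea can be sketched as follows: compute a product's reputation score from the normal ratings, add an attack profile, and measure how far the score drops. The sketch assumes the reputation score is a plain average of rating scores; the aggregation actually used in the competition is not stated here, and the function names and sample ratings are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Each rating is (rating_date, user_id, product_id, score), mirroring the
# <rating date, user ID, product ID, rating score> format.


def reputation_scores(ratings):
    """Return {product_id: mean score}. Simple averaging is an assumption."""
    by_product = defaultdict(list)
    for _date, _user, product, score in ratings:
        by_product[product].append(score)
    return {product: mean(scores) for product, scores in by_product.items()}


def downgrade(normal, attack, target):
    """How much the target's score drops once the attack profile is added."""
    before = reputation_scores(normal)[target]
    after = reputation_scores(normal + attack)[target]
    return before - after


# Made-up sample data, not taken from the competition.
normal = [
    ("2008-01-01", "u1", "p1", 5),
    ("2008-01-02", "u2", "p1", 4),
    ("2008-01-03", "u3", "p1", 5),
    ("2008-01-03", "u1", "p2", 3),
]
attack = [
    ("2008-01-04", "a1", "p1", 1),
    ("2008-01-05", "a2", "p1", 1),
]
print(downgrade(normal, attack, "p1"))
```

Because the attack ratings are only merged into a copy of the data, the normal data set is never modified, which matches the offline, virtual setup of the competition.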

Epinions Data:

There are two different versions of the Epinions data set.

Downloaded Epinions Dataset: This dataset was collected by Paolo Massa in a 5-week crawl (November/December 2003) of the Epinions Web site.

Extended Epinions Dataset: This dataset was given directly to Paolo Massa by Epinions staff. As a consequence, it also contains the distrust lists (which users are distrusted by which users), which are not shown on the site but kept private.

User Mobile App Installation Data collected by MIT MediaLab:

This data set, collected from March to July 2010, records the installation of 821 apps by 55 participants, all residents of a graduate student residence at a major US university. The following information was collected:

App-related information, such as app name, price, rating and global download count.

Users' app installation information (i.e. which user installed which app at what time).

Call log and Bluetooth hit information. During the data collection period, each participant was given an Android-based cell phone with built-in sensing software that captured all call logs and Bluetooth hits among the given phones. Call logs indicated participants' interactions through phone calls. Bluetooth hits recorded participants' face-to-face interactions, during which the phones were in each other's vicinity. Together, these two types of information described participants' daily interactions.

Users' friendship, affiliation and race information, collected through a survey. In the survey, each participant provided his/her affiliation and race, and rated his/her friendship with the other participants. This information reflected participants' long-term relationships.


Kaggle:

A platform for predictive modelling and analytics competitions. Companies and researchers post their data, and statisticians and data miners from all over the world compete to produce the best models.

Some examples of the competitions are:

Facebook Recruiting Competition: The challenge is to recommend missing links in a social network. Participants are presented with an external anonymized, directed social graph from which some edges have been deleted, and are asked to predict, as a ranked list for each user in the test set, which other users that user would want to follow.

CPROD1: Consumer PRODucts contest #1: The goal of this competition is to determine the state-of-the-art methods for automatically identifying all mentions of consumer products in a largely user-generated collection of web content, and for correctly identifying the product(s) that each mention refers to in a large catalog of products. The datasets provided include hundreds of thousands of text items, a product catalog with over fifteen million products, and hundreds of manually annotated product mentions to support data-driven approaches.

Detecting Insults in Social Commentary: This competition is to predict whether a comment posted during a public discussion is considered insulting to one of the participants.

Job Recommendation Engine Challenge: This challenge, sponsored by CareerBuilder, asks participants to predict which jobs users will apply to based on their previous applications, demographic information, and work history. The insights discovered from this data, and the algorithms the winners create, will allow CareerBuilder to improve its job recommendation algorithm, a core part of its website and a key element in improving user experience.

Related Papers or Online Articles

Y. Sun, Y. Liu, “Security of Online Reputation Systems: Evolution of Attacks and Defenses”, Signal Processing Magazine, Special Issue On Signal and Information Processing for Social Learning and Networking, Vol. 29, No. 2, pp. 87-97, March 2012.

Y. Liu, Y. Sun, T. Yu, “Defending Multiple-user-multiple-target Attacks in Online Reputation Systems”, submitted to IEEE International Conference on Social Computing (SocialCom2011), Oct. 2011.

Y. Liu and Y. Sun, “Anomaly detection in feedback-based reputation systems through temporal and correlation analysis,” in Proc. of 2nd IEEE Int. Conf. on Social Computing, Aug 2010.

Z. Malik and A. Bouguettaya, “Reputation bootstrapping for trust establishment among web services,” IEEE Internet Computing, vol. 13, no. 1, pp. 40–47, 2009.

Y. Yang, Y. L. Sun, S. Kay, and Q. Yang, “Defending online reputation systems against collaborative unfair raters through signal modeling and trust,” in Proc. of the 24th ACM Symposium on Applied Computing, Mar 2009.

Y. Yang, Q. Feng, Y. Sun, and Y. Dai, “Reputation trap: An powerful attack on reputation system of file sharing p2p environment,” in the 4th Int. Conf. on Security and Privacy in Communication Networks, 2008.

Y. Sun, Z. Han, and K. J. R. Liu, “Defense of trust management vulnerabilities in distributed networks,” IEEE Communications Magazine, vol. 46, no. 2, pp. 112–119, Feb 2008.

K. Hoffman, D. Zage, and C. Nita-Rotaru, A survey of attack and defense techniques for reputation systems, technical report, Purdue Univ., 2007.

A. Josang, R. Ismail, and C. Boyd, “A survey of trust and reputation systems for online service provision,” Decision Support Systems, vol. 43, no. 2, pp. 618–644, Mar 2007.

J. Weng, C. Miao, and A. Goh, “An entropy-based approach to protecting rating systems from unfair testimonies,” IEICE TRANSACTIONS on Information and Systems, vol. E89–D, no. 9, pp. 2502–2511, Sept 2006.

D. Houser and J. Wooders, “Reputation in auctions: Theory, and evidence from ebay,” Journal of Economics and Management Strategy, vol. 15, pp. 353–369, June 2006.

J. Brown and J. Morgan, “Reputation in online auctions: The market for trust,” California Management Review, vol. 49, no. 1, pp. 61–81, 2006.

P. Laureti, L. Moret, Y.-C. Zhang, and Y.-K. Yu, “Information filtering via iterative refinement,” in Europhysics Letters, vol. 75, no. 6, 2006, pp. 1006–1012.

Y. L. Sun, Z. Han, W. Yu, and K. Liu, “Attacks on trust evaluation in distributed networks,” in Proc. of the 40th annual Conf. on Information Science and Systems (CISS), Princeton, NJ, March 2006.

J. Zhang and R. Cohen, “A personalized approach to address unfair ratings in multiagent reputation systems,” in Proc. of the Fifth Int. Joint Conf. on Autonomous Agents and Multiagent Systems (AAMAS) Workshop on Trust in Agent Societies, 2006.

Massa, P., & Avesani, P. (2006). Trust-aware bootstrapping of recommender systems. Proceedings of ECAI 2006 Workshop on Recommender Systems (pp. 29-33).

M. Srivatsa, L. Xiong, and L. Liu, “Trustguard: countering vulnerabilities in reputation management for decentralized overlay networks,” in Proc. of the 14th Int. Conf. on World Wide Web, May 2005.

M. Feldman, C. Papadimitriou, J. Chuang, and I. Stoica, “Free-riding and whitewashing in peer-to-peer systems,” in 3rd Annual Workshop on Economics and Information Security (WEIS2004), May 2004.

A. Harmon, Amazon glitch unmasks war of reviewers, the New York Times, February 14, 2004.

C. Dellarocas, “The digitization of word-of-mouth: Promise and challenges of online reputation systems,” Management Science, vol. 49, no. 10, pp. 1407–1424, October 2003.

D. Cosley, S. Lam, I. Albert, J. Konstan, and J. Riedl, “Is seeing believing? how recommender systems influence users opinions,” in Proc. of CHI 2003 Conf. on Human Factors in Computing Systems, Fort Lauderdale, FL, 2003, pp. 585–592.

A. Josang and R. Ismail, “The beta reputation system,” in Proc. of the 15th Bled Electronic Commerce Conf., 2002.

J. R. Douceur, “The sybil attack,” in Proc. of the 1st Int. Workshop on Peer-to-Peer Systems (IPTPS), Springer Berlin / Heidelberg, March 2002, pp. 251–260.

B. Yu and M. Singh, “An evidential model of distributed reputation management,” in Proc. of the Joint Int. Conf. on Autonomous Agents and Multiagent Systems, 2002, pp. 294–301.

J. Sabater and C. Sierra, “Social regret, a reputation model based on social relations,” SIGecom Exchanges, vol. 3, no. 1, pp. 44–56, 2002.

L. Mui, M. Mohtashemi, C. Ang, P. Szolovits, and A. Halberstadt, “Ratings in distributed systems: A bayesian approach,” in Proc. of the Workshop on Information Technologies and Systems (WITS), 2001.

P. Resnick, K. Kuwabara, R. Zeckhauser, and E. Friedman, “Reputation systems,” Commun. ACM, vol. 43, no. 12, pp. 45–48, 2000.

M. Abadi, M. Burrows, B. Lampson, and G. Plotkin, “A calculus for access control in distributed systems,” ACM Transactions on Programming Languages and Systems, vol. 15, no. 4, pp. 706–734, 1993.

G. Zacharia, A. Moukas, and P. Maes, “Collaborative reputation mechanisms for electronic marketplaces,” in Proc. of the 32nd Annual Hawaii Int. Conf. on System Sciences, 1999.

S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” in Proc. of the 7th Int. Conf. on World Wide Web (WWW), 1998.

M. Hines, Scammers gaming YouTube ratings for profit.

Taobao fights reputation spam in e-business boom, BeijingToday, Sept. 12 2009.