The science behind movie selection at Netflix is called Cinematch, proprietary software that analyzes each consumer’s habits, then recommends movies it thinks they might like. But Netflix believes its matching system, which spans the 100,000 DVD titles in its catalog, can be improved by as much as 10 percent.
Rather than assign the task internally, Netflix initiated a global competition that began in October 2006. Since then, more than 2,500 teams have submitted entries. And, so far, the team with the Mudder has come out on top and closest to the $1 million prize.
Team BellKor, made up of Robert Bell ’72 (right) and Chris Volinsky from the Statistics Research Group in AT&T Labs, and former AT&T employee Yehuda Koren, won the first two $50,000 Progress Prizes that have served as incentive for competitors. BellKor earned the best score at the competition’s one-year anniversary with an improvement of 8.43 percent. At year two, they took the top spot by collaborating with BigChaos of Austria (in 81st place at year one) to achieve a 9.44 percent improvement. That was still not enough to win the million-dollar prize but satisfying nonetheless, says Bob, an HMC mathematics graduate who has worked at AT&T for 10 years.
“The three of us worked on the Netflix problem because we thought it was a cool application, something that could help us learn about techniques we didn’t know a lot about but that could be valuable to us in our own work,” says Bob, who does data analysis and model building for a variety of AT&T projects and theoretical research for academia.
Netflix released 100 million anonymous movie ratings made by about 480,000 users on 17,770 movies. The data set is the largest of its kind ever released and one of the most challenging that many, including Bob, have ever seen.
Managing the vast data and avoiding overfitting, when relationships that appear statistically significant are misleading, are two of the biggest project challenges, he says.
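For a rough sense of the scale Bob describes, the short Python sketch below works through the arithmetic using only the rounded figures quoted above. The variable names are illustrative, and the actual counts in the released data differ slightly from these round numbers.

```python
# Figures as rounded in the article; the real counts differ slightly.
ratings = 100_000_000
users = 480_000
movies = 17_770

# Every user paired with every movie: the ratings that could exist.
possible_pairs = users * movies

# Share of those pairs for which a rating was actually released.
density = ratings / possible_pairs

# Simple averages; individual users and movies vary widely around these.
ratings_per_user = ratings / users
ratings_per_movie = ratings / movies

print(f"possible user-movie pairs: {possible_pairs:,}")
print(f"share actually rated: {density:.2%}")
print(f"average ratings per user: {ratings_per_user:.0f}")
print(f"average ratings per movie: {ratings_per_movie:.0f}")
```

By this count, only a little over 1 percent of the possible user-movie pairs carry a rating. A recommendation system has to predict the rest, which is part of why apparent patterns in the data can mislead.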
Difficulties aside, BellKor and other teams have found the problem interesting enough to continue seeking solutions, often collaboratively. A leader board keeps competitors apprised of their current spot in the rankings. Contestants post ideas on online forums and have discussions at workshops and conferences. “Even though contestants are spread around the world, you know some of the other contestants and what they’re working on and who you think is really good,” Bob says.
Collaboration, like that between BellKor and BigChaos, has helped teams get closer to the coveted 10 percent improvement and $1 million prize. Nearly everyone working on the problem has found it harder to improve with time.
“Research tends to move along most quickly when a lot of people are able to share ideas and benchmark them against others and try to build on each other’s work,” says Bob. “It’s been a very good experience not only for those of us here working on it, but in general for all involved.”
BellKor has developed several innovations that improved existing collaborative filtering methods. According to Netflix vice president Jim Bennett, some of the team’s suggested improvements are being included in Netflix’s movie recommendation software. Eventually, BellKor’s work may be applied at AT&T, which will soon deliver movies to AT&T customers through video on demand, pay per view and other services.
Though the Netflix Progress Prizes do not go to the AT&T employees (they go to the company), nor would the $1 million should BellKor win it, Bob says winning the million is not the team’s expectation. “We thought our chances were slim to none,” he says. “The goal was to work on an interesting problem.”
At press time, BellKor was second with an improvement rate of 9.63 percent. The contest will continue until someone reaches 10 percent or until 2010, at which time the contest may end.
“It is a lot more difficult now, but it seems likely that some team or combination of teams will be able to make it,” says Bob. “No one really knows exactly how long that will take or what additional breakthroughs it will require.” 