Zheng Yun @ 玩聚SR, 2009-11-05
1. Cold start
Greg Linden commented on a recent paper, "The Wisdom of the Few: A Collaborative Filtering Approach Based on Expert Opinions from the Web" (PDF), as follows:
“
What they do say is that using a very small pool of experts works surprisingly well.
In particular, I think it suggests a good alternative to content-based methods for bootstrapping a recommender system.
If you can create a high quality pool of experts, even a fairly small one, you may have good results starting with that while you work to gather ratings from the broader community.
”
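In other words, if you pick a high-quality pool of experts, whether a team you assemble yourself or a group of experts you select, then even a fairly small group gives your recommender system a very good start. At this stage, the wisdom of the few can solve the recommender system's cold-start problem. This is also why 玩聚SR chose an Experts Pool as its starting point and worked well as an information filter from the very beginning.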
2. Summary of the paper
To make it easier to follow, here is a loose paraphrase of the paper:
Nearest-neighbor collaborative filtering is an effective recommendation method, but it is persistently hampered by the following problems:
data sparsity and noise; the cold-start problem; scalability.
The authors therefore propose a new method, a variant of traditional collaborative filtering:
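Instead of applying the nearest-neighbor algorithm to the user-rating data, it uses a set of expert neighbors as the comparison sample and computes the similarity between those experts and the target user.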
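The idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not the paper's exact formulation: plain cosine similarity over co-rated items and a mean-centered weighted average are choices made here, and the names `cosine_similarity`, `predict`, `min_sim`, and the toy ratings are all hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two {item: rating} dicts over their co-rated items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_rating(ratings):
    """Average rating in a non-empty {item: rating} dict."""
    return sum(ratings.values()) / len(ratings)


def predict(user, experts, item, min_sim=0.0):
    """Predict the user's rating of `item` from expert ratings only.

    Assumption: each expert who rated the item contributes their
    mean-centered rating, weighted by similarity to the target user.
    Returns None when no sufficiently similar expert rated the item.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for expert_ratings in experts.values():
        if item not in expert_ratings:
            continue
        sim = cosine_similarity(user, expert_ratings)
        if sim <= min_sim:
            continue
        weighted_sum += sim * (expert_ratings[item] - mean_rating(expert_ratings))
        weight_total += sim
    if weight_total == 0:
        return None
    return mean_rating(user) + weighted_sum / weight_total


# Toy data (hypothetical): two critics and one user who has not seen "m3".
experts = {
    "critic_a": {"m1": 5, "m2": 1, "m3": 4},
    "critic_b": {"m1": 1, "m2": 5, "m3": 2},
}
user = {"m1": 5, "m2": 1}
prediction = predict(user, experts, "m3")
```

The target user is compared only against the experts, never against other ordinary users, so the neighbor search costs the same however large the community grows.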
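This method at least avoids any serious scalability problem, because it shrinks the reference set that each user is compared against. The original nearest-neighbor method can be roughly understood as pairwise comparison between users, which is inevitably time-consuming. And when new users are everywhere (a sudden wave of drive-by visitors in particular floods the data with noise), there are hardly any ratings on which to compute similarity at all.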
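As a rough back-of-the-envelope check of the scalability point, the number of similarity computations can be compared directly. The user count below is a hypothetical figure chosen for illustration; 169 is the expert-set size quoted from the paper in the appendix. Real nearest-neighbor systems use indexing and sampling rather than literally comparing every pair, so this only indicates the order of magnitude.

```python
def pairwise_comparisons(n_users):
    """User-to-user comparisons if every pair of users is compared once."""
    return n_users * (n_users - 1) // 2


def expert_comparisons(n_users, n_experts):
    """Comparisons if each user is compared only against the expert set."""
    return n_users * n_experts


n_users = 100_000   # hypothetical community size
n_experts = 169     # expert-set size quoted from the paper

all_pairs = pairwise_comparisons(n_users)
experts_only = expert_comparisons(n_users, n_experts)
print(f"all pairs:    {all_pairs:,}")
print(f"experts only: {experts_only:,}")
print(f"ratio:        {all_pairs / experts_only:.1f}x")
```

The all-pairs count grows quadratically with the number of users, while the experts-only count grows linearly.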
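The authors define an expert as an independent individual whom we can trust to have produced thoughtful, consistent and reliable evaluations (ratings) in a given domain.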
(Original text: "We define an expert as an individual that we can trust to have produced thoughtful, consistent and reliable evaluations (ratings) of items in a given domain.")
We are most interested in the following two angles from which the authors examine the problem:
(a) study how preferences of a large population can be predicted by using a very small set of users;
(c) analyze whether professional raters are good predictors for general users;
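If these angles hold up, then you do not actually need all the data from a huge user community; locking in an Experts Pool is enough to make recommendations for users.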
Appendix:
Greg Linden's original post, on the blocked BlogSpot, reads as follows:
Wednesday, November 04, 2009
Using only experts for recommendations
A recent paper from SIGIR, "The Wisdom of the Few: A Collaborative Filtering Approach Based on Expert Opinions from the Web" (PDF), has a very useful exploration into the effectiveness of recommendations using only a small pool of trusted experts.
The results suggest that using a small pool of a couple hundred experts, possibly your own experts or experts selected and mined from the web, has quite a bit of value, especially in cases where big data from a large community is unavailable.
A brief excerpt from the paper:
Recommending items to users based on expert opinions .... addresses some of the shortcomings of traditional CF: data sparsity, scalability, noise in user feedback, privacy, and the cold-start problem .... [Our] method's performance is comparable to traditional CF algorithms, even when using an extremely small expert set .... [of] 169 experts.
Our approach requires obtaining a set of ... experts ... [We] crawled the Rotten Tomatoes web site -- which aggregates the opinions of movie critics from various media sources -- to obtain expert ratings of the movies in the Netflix data set.
The authors certainly do not claim that using a small pool of experts is better than traditional collaborative filtering.
What they do say is that using a very small pool of experts works surprisingly well. In particular, I think it suggests a good alternative to content-based methods for bootstrapping a recommender system. If you can create a high quality pool of experts, even a fairly small one, you may have good results starting with that while you work to gather ratings from the broader community.