Real-Time Personalized Recommendation System

Introduction

A real-time system is a system that processes input data within milliseconds so that the processed data is available almost immediately for feedback. Real-timeliness in the video recommendation system is mainly reflected in three layers:

  • Real-Time Construction of Short-Term Interest Models in the User's Profile: After a user finishes a video, its content updates the user's short-term interest model within a few seconds, and subsequent recommendations reflect that change.
  • Real-Time Changes of Candidate Sets: In this recommendation system, a candidate set is a pool of videos of a particular type that may be recommended to users. A user never sees a whole candidate set, only the part that remains after the matching algorithm has processed it. The update interval of a candidate set directly determines how fresh the videos shown to the user are. Several candidate sets exist, each tailored to a different scenario. For example, combining the latest-content candidate set with the candidate set of the most popular videos in the past N hours gives a recommendation effect similar to that of toutiao.com. The new-content candidate set is generated in real time, while the popular candidate set for the past N hours may be updated every few minutes. As another example, collaborative filtering can recommend related videos, further shortlisting from the popular candidate sets the content favored by users with common interests.
  • Real-Time Presentation of Recommendation Performance Metrics: Once the product is online, key metrics such as the click conversion rate can be updated every few minutes. A distinctive feature of a recommendation system is that its performance is measured not by subjective opinion but by specific metrics such as the click conversion rate.
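
As a rough illustration of the candidate-set layer, the sketch below merges a "latest" candidate set with a "most popular in the past N hours" set. The event shape, the function names (popular_in_window, merge_candidates), and the alternating interleave are assumptions made for illustration, not the system's actual implementation.

```python
from collections import Counter
from itertools import zip_longest


def popular_in_window(click_events, now, window_hours):
    """Rank video ids by click count over the past `window_hours` hours.

    `click_events` is an iterable of (video_id, unix_timestamp) pairs;
    this event shape is a hypothetical stand-in for the reported data.
    """
    cutoff = now - window_hours * 3600
    counts = Counter(vid for vid, ts in click_events if ts >= cutoff)
    return [vid for vid, _ in counts.most_common()]


def merge_candidates(latest, popular, limit):
    """Alternate between the two candidate sets, skipping duplicates."""
    merged, seen = [], set()
    for pair in zip_longest(latest, popular):
        for vid in pair:
            if vid is not None and vid not in seen:
                seen.add(vid)
                merged.append(vid)
    return merged[:limit]


now = 100_000
events = [("v1", now - 60), ("v2", now - 120), ("v1", now - 300), ("v3", now - 3 * 3600)]
popular = popular_in_window(events, now, window_hours=2)
feed = merge_candidates(latest=["v9", "v1"], popular=popular, limit=3)
```

In a streaming setting, popular_in_window would be recomputed every few minutes, while the latest set would be appended to as new content arrives.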

User Profiles and Video Profiles

User profiles are commonly expressed as interest models. By building long-term and short-term interest models for users, businesses can satisfy users' interests and demands. There are various ways to produce recommendations, such as collaborative filtering and a variety of small tricks. Recommendations based on user profiles and video profiles, however, are difficult in the initial phase. In the long run, these user profiles deepen the team's understanding of users' video consumption habits and support businesses other than recommendation.
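
One possible way to maintain a short-term interest model is sketched below: tag weights decay over time and the tags of each newly watched video are added on top. The exponential decay, the half-life value, and the function name update_short_term_interest are assumptions; the article does not specify a weighting scheme.

```python
def update_short_term_interest(interest, video_tags, now, last_update, half_life_s=600.0):
    """Decay existing tag weights, then add the tags of the video just watched.

    `interest` maps tag -> weight. Exponential decay with a half-life is an
    assumed weighting scheme, chosen only for illustration.
    """
    decay = 0.5 ** ((now - last_update) / half_life_s)
    updated = {tag: weight * decay for tag, weight in interest.items()}
    for tag in video_tags:
        updated[tag] = updated.get(tag, 0.0) + 1.0
    return updated


profile = {"sports": 2.0}
# Ten minutes (one half-life) later, the user finishes a video tagged comedy and sports.
profile = update_short_term_interest(profile, ["comedy", "sports"], now=600, last_update=0)
```

A long-term model could reuse the same structure with a much longer half-life and be refreshed by a batch job rather than on every view.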

Asynchronous

A user's refresh action triggers the recommendation calculation. On refresh, the user information is sent to Kafka asynchronously, and the Spark Streaming program analyzes the data and matches candidate sets with users. The calculation result is written to the user's private queue in Redis. The interface service is responsible only for fetching the recommendation data and reporting the user's refresh actions. The private queue of a new user, or of a user who has not accessed the service in a long time, may have expired, in which case the asynchronous approach alone cannot return a result. When the front-end interface detects this, it resolves the problem in one of the following ways:

  • Sends a special message (the backend connects to a Storm cluster) and then holds the session, waiting for the asynchronous calculation result.
  • Obtains the user's interest tags and tries to apply collaborative filtering according to certain rules. It then searches for matching data in ES, populates the private queue with it, and quickly returns the result (the solution we are adopting).

Asynchronous calculation covers most of the calculations, except for new users.
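
The flow above, including the fallback for an expired queue, can be sketched as follows. Plain Python containers stand in for Kafka, Redis, and ES, and every name here (private_queues, refresh_topic, search_index, handle_refresh, fallback_fill) is hypothetical.

```python
from collections import deque

# In-memory stand-ins (assumptions): a dict for the per-user Redis queues,
# a list for the Kafka topic, and a dict of tag -> video ids for ES.
private_queues = {}
refresh_topic = []
search_index = {"comedy": ["v1", "v2"], "sports": ["v3"]}


def fallback_fill(user_id, interest_tags, size):
    """Populate a missing or expired queue from a tag search."""
    picked = []
    for tag in interest_tags:
        for vid in search_index.get(tag, []):
            if vid not in picked:
                picked.append(vid)
    private_queues[user_id] = deque(picked[:size])


def handle_refresh(user_id, interest_tags, size=2):
    """Interface service: report the refresh, then read the private queue."""
    refresh_topic.append({"user": user_id, "action": "refresh"})  # asynchronous report
    if not private_queues.get(user_id):  # new user, or queue expired or empty
        fallback_fill(user_id, interest_tags, size=10)
    queue = private_queues[user_id]
    return [queue.popleft() for _ in range(min(size, len(queue)))]


first = handle_refresh("u1", ["sports", "comedy"])   # queue missing: fallback fills it
second = handle_refresh("u1", ["sports", "comedy"])  # queue present: served directly
```

In the real system the streaming job would refill the queue after each reported refresh; the sketch omits that consumer and shows only the interface-side logic.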

Impact of Streaming Technologies on Recommendation System

In 2014, the concept of stream computing was not yet established, and we could not reuse an existing technical system. As a result, our recommendation system was overly complicated and difficult to productize. To make matters worse, the effect of a change was visible only the next day, which prolonged the improvement cycle. At that time, the entire development cycle exceeded one month.

By contrast, today's system, built on StreamingPro, takes two or three developers, each investing only two to three hours a day, just two weeks to develop in full. Stream computing has had a large impact on both the design and the implementation of the recommendation system.

The recommendation system covers all computing-related features other than the interface services. These include, but are not limited to:

  • New content pre-processing, such as tagging and storage into multiple stores
  • User profile construction, such as short-term interest model
  • New and popular data candidate sets
  • Short-term collaborative filtering
  • Recommendation performance metrics, such as click conversion rate

All of these processes are implemented with Spark Streaming. Long-term collaborative filtering (on more than one day of data), the user's long-term interest model, and similar jobs use Spark batch processing. Because the system is built on the StreamingPro project, every calculation process is configurable: the core computing processes of the entire recommendation system are defined by a list of description files.

We would like to mention three points here, which are as follows:

  • Recommendation Effect Evaluation: We use a Spark Streaming + ElasticSearch solution. Spark Streaming pre-processes the reported exposure and click data and stores it in ES, and ES then provides the query interface used by the BI reports. This avoids pre-calculating metrics in the streaming job. Pre-calculation cannot anticipate every metric, so the streaming procedures would have to change frequently as new metrics are needed.
  • Reuse of Existing Big Data Infrastructure: Throughout the entire recommendation system, only the provision of API services requires a separate deployment and all other calculations run on the Hadoop cluster using Spark.
  • Adjustment of Calculation Cycles: All calculation cycles and computing resources can be adjusted conveniently, even dynamically (Spark Dynamic Resource Allocation). This is vital because it allows us to trade some real-timeliness for resource savings, or to free up more resources for offline tasks.
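
To illustrate the effect-evaluation point, the sketch below computes a click conversion rate at query time from stored raw exposure and click events, rather than pre-calculating it in the streaming job. In the system described, ES would perform this aggregation; here plain Python stands in, and the event schema and function name are assumptions.

```python
from collections import defaultdict


def click_conversion_rate(events, group_by="algorithm"):
    """Aggregate raw exposure/click events into a click conversion rate per group.

    Each event is a dict with an "event" field ("exposure" or "click") plus
    dimension fields; this schema is assumed for illustration.
    """
    exposures = defaultdict(int)
    clicks = defaultdict(int)
    for e in events:
        key = e[group_by]
        if e["event"] == "exposure":
            exposures[key] += 1
        elif e["event"] == "click":
            clicks[key] += 1
    return {key: clicks[key] / n for key, n in exposures.items() if n > 0}


events = [
    {"event": "exposure", "algorithm": "A", "video": "v1"},
    {"event": "exposure", "algorithm": "A", "video": "v2"},
    {"event": "click", "algorithm": "A", "video": "v1"},
    {"event": "exposure", "algorithm": "B", "video": "v3"},
]
rates = click_conversion_rate(events)
per_video = click_conversion_rate(events, group_by="video")
```

Because the raw events are kept, a new breakdown (per video instead of per algorithm, for example) needs only a different group_by at query time and no change to the streaming procedure.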

Recommendation System Architecture

The figure below shows the structure of the entire recommendation system:

Figure 1. Recommendation System Structure

Distributed streaming computing is mainly responsible for five sections:

  • Processing of clicks, exposures, and other reported data
  • New video tagging
  • Short-term interest model calculation
  • User recommendation
  • Calculation of candidate sets, such as the latest and the most popular (over any time period)

The storage solutions include:
1. Codis (user recommendation list)
2. HBase (user profile and video profile)
3. Parquet (HDFS) (archived data)
4. ElasticSearch (copy of HBase)

The following figure shows more details about the streaming computing section:

Figure 2. Detailed Recommendation System Structure

Technical solutions adopted for user reporting:

  • nginx
  • Flume (Collect nginx logs)
  • Kafka (Receive Flume reports)

For third-party (full-site) content, we developed our own collection system.

Personalized Recommendation

Figure 3. Principles of Personalized Recommendations

The recommendation system updates all candidate sets in real time.

The concept of parameter configuration servers can be understood as follows.

Suppose we have two algorithms, A and B, each implemented by an independent streaming program that calculates its own result set. The size of the result set and the frequency at which it is produced vary across candidate sets and algorithms. Assume that the result set from A is too large, while that from B is small but of excellent quality. In this case, before A and B write to a user's recommendation queue, each reports its own situation to the parameter configuration server, which determines the final amount each may send to the queue. The parameter server can likewise control the write frequency. For example, if algorithm A generates a new recommendation only 10 seconds after its last result, the parameter server can refuse to let it write to the user's recommendation queue.
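
A minimal sketch of this control logic follows, assuming a per-algorithm quota and a per-algorithm minimum interval between writes. The ParameterServer class, its authorize method, and the concrete numbers are hypothetical and stand in for whatever interface the real parameter configuration server exposes.

```python
class ParameterServer:
    """Toy in-process stand-in for the parameter configuration server."""

    def __init__(self, quotas, min_interval_s):
        self.quotas = quotas                  # algorithm -> max items per write
        self.min_interval_s = min_interval_s  # algorithm -> min seconds between writes
        self.last_write = {}                  # (user, algorithm) -> time of last accepted write

    def authorize(self, user, algorithm, n_items, now):
        """Return how many of `n_items` the algorithm may write (0 means refused)."""
        last = self.last_write.get((user, algorithm))
        if last is not None and now - last < self.min_interval_s[algorithm]:
            return 0
        granted = min(n_items, self.quotas[algorithm])
        if granted > 0:
            self.last_write[(user, algorithm)] = now
        return granted


server = ParameterServer(quotas={"A": 5, "B": 20}, min_interval_s={"A": 60, "B": 0})
first_a = server.authorize("u1", "A", n_items=100, now=0)    # large result set is capped
second_a = server.authorize("u1", "A", n_items=100, now=10)  # only 10 s later: refused
only_b = server.authorize("u1", "B", n_items=3, now=10)      # small set passes in full
```

Keeping the quota and interval in one place means they can be tuned per algorithm without touching the streaming programs themselves.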

The case above is multi-algorithm process control. There is an alternative approach: we can introduce a new algorithm, K, that blends the results from A and B. Since every algorithm is a configurable module in StreamingPro, A, B, and K can be placed in a single Spark Streaming application. K periodically calls A and B, mixes their results, and finally writes the blended result to the user's recommendation queue as authorized by the parameter configuration server.
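
The blending approach might look like the sketch below. The two stub algorithms, the b_share parameter, and the "reserve slots for B, fill the rest from A" policy are all assumptions chosen for illustration; the article does not say how K mixes the results.

```python
def algorithm_a(user):
    """Hypothetical algorithm with a large result set."""
    return [f"a{i}" for i in range(6)]


def algorithm_b(user):
    """Hypothetical algorithm with a small, high-quality result set."""
    return ["b0", "a1"]


def algorithm_k(user, limit, b_share=0.5):
    """Blend A and B: reserve up to `b_share` of the slots for B, fill the rest from A."""
    b_items = algorithm_b(user)[: int(limit * b_share)]
    blended = list(b_items)
    for item in algorithm_a(user):
        if len(blended) >= limit:
            break
        if item not in blended:  # skip items B already contributed
            blended.append(item)
    return blended


mixed = algorithm_k("u1", limit=4)
```

Because K is the only writer in this arrangement, the parameter configuration server has a single result set per user to authorize instead of one per algorithm.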

Conclusion

This article explores the use of stream computing for a personalized video recommendation system. In this approach, a tag system is designed and then applied to users and videos. Multiple algorithms, including LDA and Bayesian methods, are combined to deliver a well-rounded and useful experience.

Time: 2024-09-12 06:43:00
