Anti-Entropy Protocols

Anti-Entropy Protocols, Gossips

Anti-entropy, or gossip, is an attractive way of replicating state that does not have strong consistency requirements. 
在不需要强一致性的条件下的(或者说, 只需要达到最终一致性), 一种高容错的, 分布式的一致性同步协议. 
为什么叫Anti-entropy? 很久我都不明白反熵的意思, 借用上面引用的说法

Gossip算法又被称为反熵(Anti-Entropy),熵是物理学上的一个概念,代表杂乱无章,而反熵就是在杂乱无章中寻求一致,这充分说明了 Gossip的特点:在一个有界网络中,每个节点都随机地与其他节点通信,经过一番杂乱无章的通信,最终所有节点的状态都会达成一致。每个节点可能知道所有其他节点,也可能仅知道几个邻居节点,只要这些节可以通过网络连通,最终他们的状态都是一致的,当然这也是疫情传播的特点。

 

Let us start our study with the following problem statement:

There is a set of nodes and each data item is replicated to a subset of nodes. Each node serves update requests even if there is no network connection to other nodes. Each node periodically synchronizes its state with other nodes is such a way that if no updates take place for a long time, all replicas will gradually become consistent. How this synchronization should be organized – when synchronization is triggered, how a peer to synchronize with is chosen, what is the data exchange protocol? Let us assume that two nodes can always merge their versions of data selecting a newest version or preserving both versions for further application-side resolution.

问题是什么?

This problem appears both in data consistency maintenance and in synchronization of a cluster state (propagation of the cluster membership information and so on). Although the problem above can be solved by means of a global coordinator that monitors a database and builds a global synchronization plan or schedule, decentralized databases take advantage of more fault-tolerant approach. The main idea is to use well-studied epidemic protocols [7] that are relatively simple, provide a pretty good convergence time, and can tolerate almost any failures or network partitions. Although there are different classes of epidemic algorithms, we focus on anti-entropy protocols because of their intensive usage in NoSQL databases.

Anti-entropy protocols assume that synchronization is performed by a fixed schedule – every node regularly chooses another node at random or by some rule and exchanges database contents, resolving differences. There are three flavors of anti-entropy protocols: push, pull, and push-pull. The idea of the push protocol is to simply select a random peer and push a current state of data to it. In practice, it is quite silly to push the entire database, so nodes typically work in accordance with the protocol which is depicted in the figure below.

这个问题当然可以通过global coordinator来解决, 但是decentralized设计可以提供more fault-tolerant approach的设计.

其实算法很简单, 就是epidemic protocols, 这儿选了在Nosql中广泛应用的anti-entropy protocols.

Push, 问B你有什么和我不同, B告诉我, 我把不同部分push给B

Pull, 告诉B我有什么, B把我没有的发给我

Push-pull, 把上面两个同时结合做了, 图的下面两条线箭头画反了

 

Anti-entropy protocols provide reasonable good convergence time and scalability. The following figure shows simulation results for propagation of an update in the cluster of 100 nodes. On each iteration, each node contacts one randomly selected peer.

One can see that the pull style provides better convergence than the push, and this can be proven theoretically [7]. Also, push has a problem with a “convergence tail” when a small percent of nodes remains unaffected during many iterations, although almost all nodes are already touched. The Push-Pull approach greatly improves efficiency in comparison with the original push or pulls techniques, so it is typically used in practice. Anti-entropy is scalable because the average conversion time grows as a logarithmic function of the cluster size.

Although these techniques look pretty simple, there are many studies [5] regarding performance of anti-entropy protocols under different constraints. One can leverage knowledge of the network topology to replace a random peer selection by a more efficient schema [10]; adjust transmit rates or use advanced rules to select data to be synchronized if the network bandwidth is limited [9]. Computation of digest can also be challenging, so a database can maintain a journal of the recent updates to facilitate digests computing.

怎么衡量Anti-entropy protocols, 当然是通过convergence time,  肯定是Push-pull效率最高


本文章摘自博客园,原文发布日期:2013-04-02

时间: 2024-10-28 19:32:18

Anti-Entropy Protocols的相关文章

x264代码剖析(十七):核心算法之熵编码(Entropy Encoding)

x264代码剖析(十七):核心算法之熵编码(Entropy Encoding)   熵编码是无损压缩编码方法,它生产的码流可以经解码无失真地恢复出原始数据.熵编码是建立在随机过程的统计特性基础上的.本文对熵编码中的CAVLC(基于上下文自适应的可变长编码)和CABAC(基于上下文的自适应二进制算术熵编码)进行简单介绍,并给出x264中熵编码对应的代码分析.     在H.264的CAVLC中,通过根据已编码句法元素的情况,动态调整编码中使用的码表,取得了极高的压缩比.CAVLC用于亮度和色度残差

在iis上部署了一个webservice,在iis中能正常运行,但是在一个项目中引用,编译通过,但是运行提示以下错误System.Web.Services.Protocols.SoapException: 服务器未能识别 HTTP 头 SOAPAction

问题描述 System.Web.Services.Protocols.SoapException:服务器未能识别HTTP头SOAPAction的值:http://www.g-esoft.com/OneAccount/WebService/GetEmployeeByKeyid.在System.Web.Services.Protocols.Soap11ServerProtocolHelper.RouteRequest()在System.Web.Services.Protocols.SoapServe

F5 怎么做anti sql injection

问题描述 如题,怎么在F5中做到anti sql injection 解决方案 Local Trafic -> iRules -> create -> name{your irule name} -> Definition -> when HTTP_REQUEST { switch -glob [URI::decode [string tolower [HTTP::uri]]] { "*delete *" - "*select *" -

谈anti ILdasm的原理以及anit 框架API的可行性

今天收到 maxtocode 的群发邮件,提到对MaxToCode运行库兼容性进行了修正,即降低了运行库anti的强度.确实在兼容性和安全方面很难做到鱼与熊掌兼得.anti得太多,程序的兼容性就成问题.codelib就是例子,可用性太差. 在maxtocode的邮件中仍然发现了如下两条描叙:* 增加了对ILdasm以及使用API 访问源数据的反编译工具的反制功能* 经测试,目前没有一种反编译工具可以完整的读取加密后的结构,更不用说加密后的代码了 其对该功能的描叙为:*可以使微软提供的底层反编译器

video-entropyd 2.0发布 entropy数据添加程序

video-entropyd 是一个从 http://www.aliyun.com/zixun/aggregation/17097.html">Video4Linux 设备添加entropy数据到内核随机驱动的程序,可以从 Video4Linux 设备中获取图像,计算数据信息之间的差异,再提交entropy数据. video-entropyd 2.0该版本是兼容最新的内核(内核输出Video4Linux2 API). 软件信息:http://www.vanheusden.com/ved/

论文笔记之:Multiple Feature Fusion via Weighted Entropy for Visual Tracking

  Multiple Feature Fusion via Weighted Entropy for Visual Tracking  ICCV 2015   本文主要考虑的是一个多特征融合的问题.如何有效的进行加权融合,是一个需要解决的问题.本文提出一种新的 data-adaptive visual tracking approach 通过 weighted entropy 进行多特征融合.并非像许多方法所做的简单的链接在一起的方法,本文采用加权的 entropy 来评价目标状态和背景状态之间

【原创】 熵与反熵

      在 Riak 的众多概念中,存在一个称作 Active Anti-Entropy 的,由于从字面上很难理解其真实含义,所以稍微研究了一番.结果发现水还是比较深的~       下面通过一些相关资料的研读,展开这次的知识学习.  ---  Active Anti-Entropy 对于针对各副本仅作宽松一致性保证的分布式系统来说,需要一种方法将副本状态最终收敛为一致:在大数据集上进行上述收敛行为是非常有难度的: 无论是对数据集进行处理,还是在数据集之间进行比较都需要高昂的代价: 如果两个

Oracle 隐含参数

Oracle 隐含参数 点击(此处)折叠或打开 set pagesize 9999 set line 9999 col NAME format a40 col KSPPDESC format a50 col KSPPSTVL format a20 SELECT a.INDX,        a.KSPPINM NAME,        a.KSPPDESC,        b.KSPPSTVL FROM x$ksppi a,        x$ksppcv b WHERE a.INDX = b.

(转)分布式深度学习系统构建 简介 Distributed Deep Learning

HOME ABOUT CONTACT SUBSCRIBE VIA RSS   DEEP LEARNING FOR ENTERPRISE Distributed Deep Learning, Part 1: An Introduction to Distributed Training of Neural Networks  Oct 3, 2016 3:00:00 AM / by Alex Black and Vyacheslav Kokorin   Tweet inShare27   This