Why Vector Clock are Easy or Hard?

通过实际例子来阐述vector clock其实是容易理解的, easy 
同样通过实际例子来描述在使用vector clock时会遇到哪些难以解决的问题, hard

Why Vector Clocks are Easy

http://basho.com/blog/technical/2010/01/29/why-vector-clocks-are-easy/

Vector Clocks by Example

通过如下日常的例子来帮助理解vector clock, 你会发现其实这个算法并不复杂. 
大家想计划一下, 下周哪天一起吃饭? 大家如何达成最终的一致性  
之间看下面的图吧, 过程很清晰 

Alice, Ben, Cathy, and Dave are planning to meet next week for dinner.

The planning starts with Alice suggesting they meet on Wednesday.

Later, Dave discuss alternatives with Cathy, and they decide on Thursday instead.

Dave also exchanges email with Ben, and they decide on Tuesday.

When Alice pings everyone again to find out whether they still agree with her Wednesday suggestion, she gets mixed messages:

Cathy claims to have settled on Thursday with Dave, and Ben claims to have settled on Tuesday with Dave.

Dave can't be reached, and so no one is able to determine the order in which these communications happened, and so none of Alice, Ben, and Cathy know whether Tuesday or Thursday is the correct choice.

The story changes, but the end result is always the same: you ask two people for the latest version of a piece of information, and they reply with two different answers, and there's no way to tell which one is really the most recent.

 

Start with Alice's initial message: "Let's meet Wednesday,"

date = Wednesday
vclock = Alice:1



Ben suggests Tuesday:

date = Tuesday
vclock = Alice:1, Ben:1
 

Dave replies, confirming Tuesday:

date = Tuesday
vclock = Alice:1, Ben:1, Dave:1

Now Cathy gets into the act, suggesting Thursday:

date = Thursday
vclock = Alice:1, Cathy:1
 

Dave has two conflicting objects:

date = Tuesday
vclock = Alice:1, Ben:1, Dave:1

and

date = Thursday
vclock = Alice:1, Cathy:1

Dave can tell that these versions are in conflict, because neither vclock "descends" from the other. Luckily, Dave's a reasonable guy, and chooses Thursday. Dave also created a vector clock that is a successor to all previously-seen vector clocks. He emails this value back to Cathy.

date = Thursday
vclock = Alice:1, Ben:1, Cathy:1, Dave:2

So now when Alice asks Ben and Cathy for the latest decision, the replies she receives are, from Ben:

date = Tuesday
vclock = Alice:1, Ben:1, Dave:1

and from Cathy:

date = Thursday
vclock = Alice:1, Ben:1, Cathy:1, Dave:2



 

From this, she can tell that Dave intended his correspondence with Cathy to override the decision he made with Ben. All Alice has to do is show Ben the vector clock from Cathy's message, and Ben will know that he has been overruled.

That worked out pretty well.

通过上面的图示, 确实体现出'Why Vector Clocks are Easy’

 

Why Vector Clocks Are Hard

http://basho.com/blog/technical/2010/04/05/why-vector-clocks-are-hard/

但是在实际实现该系统时, 还是有很多问题需要解决的, 从实现的角度体现'Why Vector Clocks Are Hard’ 
最难的问题是, 以什么作为actor? 以及当vclocks随着时间不断变大的时候, 怎样保存它?

1. What an actor is (i.e. where the incrementing and resolution is, and what parties get their own field in the vector)

2. How to keep vclocks from growing without bound over time.

In this example the parties that actually proposed changes ("clients") were the actors in the vector clocks. This is the model that vector clocks are designed for and work well with, but it has a drawback. The width of the vectors will grow proportionally with the number of clients. In a group of friends deciding when to have dinner this isn't a problem, but in a distributed storage system the number of clients over time can be large. Once the vector clocks get that large, they not only take up more space in disk and RAM but also take longer to compute comparisons over.

这个就谈到选取什么作为actor的问题, 上面的例子用client作为actor是很清晰, 但问题是在分布式环境中, client的数量可能是非常多的, 这个就有严重的效率问题.

One straightforward idea is to make the servers handling client requests be the "actors", instead of representing the clients directly. Since any given system usually has a known bounded number of servers over time and also usually has less servers than clients, this serves to reduce and cap the size of the vclocks

Let's think through the same example, but with that difference, to see how it goes. We'll assume that a 2-server distributed system is coordinating the communication, with clients evenly distributed among them.

下面的例子就尝试使用server作为actor来避免client过多的问题

Alice and Dave happen to get server X, and Ben and Cathy get server Y.

We're fine through the first few steps:

The only real difference so far is that each update increments a vector clock field named for the client's chosen server instead of the client itself. This will mean that the number of fields needed won't grow without bound; it will be the same as the number of servers. This is the desired effect of the change.

We run into trouble, though, when Cathy sends her update:

 

In the original example, this is where a conflict was created. Dave sorted out the conflict, and everything was fine.

With our new strategy, though, something else happened. Ben and Cathy were both modifying from the same original object. Since we used their server id instead of their own name to identify the change, Cathy's message has the same vector clock as Ben's! This means that Dave's message (responding to Ben) appears to be a simple successor to Cathy's... and we lose her data silently!

用server作为actor的问题就是, 会丢失一些数据, 因为Cathy和Ben都操作Y, 所以对于系统而已, Cathy和Ben没有区别都是Y, 所以当Dave confirm Ben后, vector为X2, Y1, 这个时候再收到Cathy的X1, Y1时, 就会认为是过期数据, 自动丢掉. 这样就无法解决上面的协调这个问题, 所以这个简单solution是不可行的.

发生这个问题的原因如下,

For vector clocks to have their desired effect without causing accidents such as this, the elements represented by thefields in the vclock must be the real units of concurrency. In a case like this little example or a distributed storage system, that means client identifiers, not server-based ones.

server不是真正的并发单元, 而你把他作为actor, 其实是不合理的, 所以就会有现在这个问题

 

Just Lose a Little Information and Everything Will Be Fine

If we use client identifiers, we're back in the situation where vector clocks will grow and grow as more clients use a system over time.

The solution most people end up with is to "prune" their vector clocks as they grow.

This is done by adding a timestamp to each field, and updating it to the current local time whenever that field is incremented. This timestamp is never used for vclock comparison -- that is purely a matter of logical time -- but is only for pruning purposes.

This way, when a given vclock gets too big, you can remove fields, starting at the one that was updated longest ago, until you hit a size/age threshold that makes sense for your application.

But, you ask, doesn't this lose information?

Yes, it does -- but it won't make you lose your data.

The only case where this kind of pruning will matter at all is when a client holds a very old copy of the unpruned vclock and submits data descended from that. This will create a sibling (conflict) even though you might have been able to resolve it automatically if you had the complete unpruned vclock at the server. That is the tradeoff with pruning: in exchange for keeping growth under control, you run the chance of occasionally having to do a "false merge"... but you never lose data quietly, which makes this approach unequivocally better than moving the field identifiers off of the real client and onto the server.

这个方法思路, 就是既然我们必须用client作为actor, 问题就是vector会越来越大, 那么可不可以解决这个问题...

那就是prune,  根据什么去prune, timestamp, 这样定期把比较老的client数据清掉, 就可以比较合理的解决这个问题.

这个方案是否有风险? 有一些, 不过可以接受, 所以是有价值的tradeoff, 至少我们现在不会lose data quietly.


本文章摘自博客园,原文发布日期:2012-06-06

时间: 2024-08-31 19:59:27

Why Vector Clock are Easy or Hard?的相关文章

Vector Clocks, 时间向量

Why Vector Clock are Easy or Hard? Amazon's Dynamo, 4.4版本化的数据   常用的版本化技术 Timestamps seem to be an obvious solution for developing a chronological order. However, timestamps "rely on synchronized clocks and don't capture causality" as Lipcon poin

HybridTime - Accessible Global Consistency with High Clock Uncertainty

Amazon's Dynamo [9] and Facebook's Cassandra [13], relax the consistency model,and offer only eventual consistency. Others such as HBase [1] and BigTable [4] offer strong consistency only for operations touching a single partition, but not across the

NoSQL Databases技术资料整理汇总

0 Reference NoSQL论文 在 Stuttgart Media 大学的 Christof Strauch 历时8个月(2010年6月-2011年2月)完成了一篇150页长的NoSQL相关的论文, 对NoSQL的各个方面做了探讨 http://www.christof-strauch.de/nosqldbs.pdf 分布式系统领域经典论文翻译集 http://duanple.blog.163.com/blog/static/709717672011330101333271/ 2010

分布式系统(Distributed System)资料

原文地址:https://github.com/ty4z2008/Qix/blob/master/ds.md 希望转载的朋友,你可以不用联系我.但是**一定要保留原文链接**,因为这个项目还在继续也在不定期更新.希望看到文章的朋友能够学到更多. <Reconfigurable Distributed Storage for Dynamic Networks> 介绍:这是一篇介绍在动态网络里面实现分布式系统重构的paper.论文的作者(导师)是MIT读博的时候是做分布式系统的研究的,现在在NUS

Dynamo: Amazon’s Highly Available Key-value Store

www.allthingsdistributed.com/2007/10/amazons_dynamo.html , 英文版 http://blog.163.com/woshitony111@126/blog/static/71379539201231492557944/ , 中文版   1 Overview Amazon服务平台中的许多服务只需要主键访问数据存储. 对于许多服务, 如提供最畅销书排行榜, 购物车, 客户的偏好, 会话管理, 销售等级, 产品目录, 常见的使用关系数据库的模式会导

Riak学习(3):Riak对比HBase(转)

文章转自:http://blog.nosqlfan.com/html/4081.html 文章来自Riak官方wiki,是一篇Riak与HBase的对比文章.Riak官方的对比通常都做得很中肯,并不刻意偏向自家产品.本文也是一样. 对比的Riak版本是1.1.x,HBase是0.94.x. 大方面对比 Riak 与 HBase 都是基于 Apache 2.0 licensed 发布 Riak 的实现是基于 Amazon 的 Dynamo 论文,HBase 是基于 Google 的 BigTabl

跨机房问题

跨机房问题一直都是一个老大难的问题,先看传统数据库的跨机房方案. Master/Slave方案 这是最常用的方案,适用于大多数需求.Master将操作日志实时地发送到Slave,Slave当成Master的一个Hot Backup.Master宕机时,服务切换到Slave,需要修改客户端逻辑使得Master失效时自动寻找新的Master. 这个方案有一个问题就是数据库的Master和Slave一般不是强同步的,所以,切换到Slave后可能丢失宕机前的少量更新.如果将Master和Slave做成强

【分布式系统工程实现】系统可扩展性演化

一般来说,只要多台机器通过互相协调共同执行某项任务,我们都会将这套系统称为"分布式系统",这样显得更有技术含量.从这个意义上讲,Memcache集群,Mysql sharding集群,Yahoo PNUTS,Google GFS&Bigtable,Amazon Dynamo以及国内众多专用的NOSQL系统都是分布式系统.分布式系统的难点主要在于可扩展性,随着机器数量的增多,集群能力是否能够接近线性扩展,因集群规模扩大而产生的容错需求,负载的动态迁移,对系统的设计和实现是一个巨大

海量存储之十六--一致性和高可用专题

很久木有和大家见面了,因为博主也需要时间来沉淀..博主也需要学习和思考.. 好吧,不多废话,进入正题,今天我们谈的东西是一致性和安全性. 一致性这个问题,非常绕,想用语言表述,难度很大,我给别人去讲的时候,一般都是白板,因为白板有类似"动画"的效果,能够帮助别人理解,但使用文字,就没有办法了,只好要求各位有一定的抽象思维能力,能在自己的脑袋里模拟这种动画吧:) 主要会聊到: 简单的双机两阶段提交,三阶段提交,vector clock ,paxos思路,paxos改进思路,既然要阐述问题