HybridTime - Accessible Global Consistency with High Clock Uncertainty

Amazon’s Dynamo [9] and Facebook’s Cassandra [13], relax the consistency model,and offer only eventual consistency.

Others such as HBase [1] and BigTable [4] offer strong consistency only for operations touching a single partition, but not across the database as a whole.

Dynamo和Cassandra都是基于Vector Clocks和Gossip算法,强调高可用性,可以达到eventual consistency

而HBase,可以提供单分区的强一致性,但对于跨分区,或跨数据库,就无法保证;

for example, in a geo-replicated service, a user may first be routed to a local datacenter and perform some action such as sending an email message. 
If they then reload their browser and are routed to another datacenter backend, they would still expect to see their sent message in their email outbox.

这个例子比较典型,要达到异地的一致性

 

一般要解决分布式问题的一致性问题的最简单的办法就是用synchronized clocks across all servers,但这基本是达不到的

the traditional approach to global consistency has been to use logical clocks. 
Logical clocks, such as Lamport Clocks [15] or Vector Clocks [10, 20].

所以传统的做法就是用逻辑时钟

https://www.zhihu.com/question/19994133

贴个回答,写的不错

当然可以,但是不适合在分布式数据库里面。为什么呢? 首先,需要先确定你这个时间戳取自哪里。1、所有的时间戳都取自一个中心点节点,这样让时间戳得到一个全序关系,这是最简单的实现,如果你的分布式数据库只是部署在一个数据中心的话,但是这样就增加了一个中心节点,影响数据库的扩展性,而且任何一个取时间戳的动作都要增加上网络延迟的开销。如果分布式数据库部署需要跨数据中心,这种方案变的不可行,跨数据中心网络时延太大。从而影响整体性能2、所有的时间戳取自已所在的节点。这就涉及到时钟同步的问题,就像Dongdong说的,如果取的物理时间那么这些节点并没有一个一致的时间,总会有一点误差,这种解决方案就有两种思路: 
一、就是现在以Leslie Lamport 提出的逻辑时钟的方法,重新定义了一种分布式的全序关系。vector clock就是以逻辑时钟为基础上发展而来的。 
二、使用物理时钟,就是google spanner使用gps 加原子钟进行时间同步,不同节点之间的时间是不能精确同步的,只能把不同节点之间的时间同步到一个范围之内,spanner一个节点上获得一个时间戳,不是一个时间值,而是一个时间值加上一个误差范围,也就是一个时间窗,如果两个时间窗有重合,就无法比较大小,从而无法比较出来时间窗代表的事件的先后关系。而时间同步算法就是让分布式各节点之间的时间的误差尽量的小,这样取时间戳这个动作效率是最高的,只在本节点取一下物理时间即可,但是为了同步时间,各个节点肯定在周期与其它节点进行通讯来同步物理时间(google时间同步的算法好像没有开源)。时间同步算法只能让时间尽量的精确。分布式数据库里面是不能直接使用NTP来同步时间的,因为NTP时间同步协议里面允许节点的时间能够回退,而数据库里面要求本地的时钟必须是递增的。 
三、还有一种混合逻辑时钟(Logical Physical Clocks),也是由逻辑时钟演变而来,但是时钟的值比较接近于物理时钟,而且不依赖于下面物理时钟的同步的算法,它的时间戳都可以在本地取,而且取出来是一个时间值,而不像spanner一样,取出来是一个时间窗。详细可以参见Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases这篇论文

关于Lamport或向量时钟,

Lamport时钟就是逻辑时钟,即不依赖于物理时钟,而依赖event之间的因果关系来定义的一种偏序

参考,全序, 分布式一致性的本质

而Vector时钟是,逻辑时钟的一种实现,具体的逻辑看下文,

Why Vector Clock are Easy or Hard?

摘自上面的知乎回答

“dynamo paper是七、八年前写的了,现在的amazon dynamo早已摒弃了version vector,而采用了synchronous replication(类似paxos的protocol),每一个partition会有一个leader负责写入。其实version vector并不scale,因为对于一个key来说,随着writer数量的增加,version vector数量成指数级的增长” 

逻辑时钟的缺点,

first, they require that all clients must propagate clock data to achieve consistent views, 
and second, the assigned timestamps have no relation to physical time.

 

Spanner introduced commit-wait, a way of ensuring physical-time based consistent global state by forcing operations to wait long enough so that all participants agree that the operation’s timestamp has passed based on worst case synchronization error bounds.

While innovative, the system performance becomes highly dependent on the quality of the time synchronization infrastructure, and thus may have unacceptable performance absent specialized hardware such as atomic clocks and GPS receivers.

Spanner利用原子钟和GPS接收器,实现了一个较为精确的时钟,这个时钟叫做TrueTime,每次调用TrueTime API返回的是一个时间区间,而不是一个具体的值,这个TrueTime保证的是真实时间(absolute time/real time)一定在这个区间内,这个区间范围通常大约14ms,甚至更小。

所以只有每个transaction的时间区间,都不重合,那么就可以比较容易的做到全局有序;

所以这里需要commit-wait,

可以看到,我在开始transaction的时候,取一次true time,[t1.earliest, t1.latest]

那么什么时候,我可以把这个transaction commit掉,即释放掉锁,即别人可以开始新的transaction

当我再取一次true time,[t2.earliest, t2.latest],当t2.earliest > t1.latest的时候,即可

因为这样两个transaction的时间区间就不可能重叠了,当然代价就是每个transaction都有等待大概2个error bound的时间

当然,spanner的实现依赖硬件的infra设备,普通用户无法复制;

 

HybridTime(HT), a hybrid between physical and logical clocks and we show how HT can be used to achieve the same semantics as Spanner, but with good performance even with commonly available time synchronization.

Like Spanner, our approach relies on physical time measurements with bounded error to assign HybridTime timestamps to events that occur in the system. 
However, unlike Spanner, our approach does not usually require the error to be waited out, thus allowing for usage in common deployment scenarios where clocks are synchronized through common protocols such as the Network Time Protocol, in which clock synchronization error is often higher than with Spanner’s TrueTime.

The trade-off is that, in order to avoid commit-wait, HybridTime requires that timestamps be propagated across machines to achieve the same consistency semantics as Spanner. 
Contrary to vector clocks, which can expand as the number of participants in the cluster grows, HybridTime timestamps have constant and small size.

HybridTime说是physical and logical clocks的结合,和spanner一样,也可以使用带error bound的物理时间 
但由于他没有spanner的硬件设施,所以做不到低延迟的时间同步,所以他依然使用逻辑时钟来达到一致性;并且他解决了vector clocks随着client的数量增加而会size膨胀的问题

HybridTime clocks follow similar update rules to Lamport clocks, but the time values are not purely logical: 
each time value has both a logical component, which helps in guaranteeing the same properties as a Lamport Clocks, and a physical component which allows the event to be associated with a physical point-in-time.

 

One of Spanner’s key benefits is that is externally consistent, which is defined as fully linearizable, even in the presence of hidden channels.

Additionally we use the term globally consistent to describe a system which provides the same linearizability semantics, provided that there are no hidden channels present.

区分为,是否有hidden channels?意思是clients间存在因果或先后关系,比如通过发message,都是这个因果关系对于系统是不可知的,比如A write,然后发送消息给B,然后B write,那么因果关系上,B write一定后于A write,但是对于数据库而言他并不知道,所以并不能保证A write先写入;

spanner是可以保证externally consistent, 由于他通过 commit wait,来保证每个transaction之间一定是全局有序的

相对的globally consistent,更简单些,因为并没有hidden channels

 

Spanner’s key innovation is that timestamps assigned by the system can be used to achieve external consistency, but also have physical meaning.

HybridTime is always globally consistent, and through selective application of commit-wait is externally consistent.

Spanner的主要创新在于实现external一致性的,同时还保留了物理时间的

而HybridTime,默认情况下是可以实现globally consistent,即偏序,因为他本身就是使用lamport时钟,并且当你选择commit-wait时,也可以保证externally consistent;

 

HybridTime Assumptions

1. HybridTime assumes that machines have a reasonably accurate physical clock, represented by the PCi(e) function, which outputs the numeric timestamp returned by the physical clock as read by process i for event e, that is able to provide absolute time measurements (usually in milli- or microseconds since 1 January 1970).

2. keeps the physical clocks across different servers synchronized with regard to a reference server, the “reference” time, represented by the PCref(e) function which outputs the numeric timestamp returned by the “reference” process for event e;

Additionally, we assume that such a substrate is able to provide an error bound along with each time measurement, denoted by the Ei(e) function, which outputs the numeric value ε error of process i at the time e occurred

3. we make no assumptions on the actual accuracy of the clocks, i.e. the physical timestamps returned by server’s clocks may have an arbitrarily large but finite error, as long as this error’s bound is known

说白了,假设

有个相对靠谱的物理时钟,一个理想的参照时钟,以及他们之间相差的,error bound

最后,我们并不假设这个error bound会很小,只要有限就可以

假设1,我们有有限的error bound

 

假设2, 进程级别的物理时钟单调递增,注意是进程级别

 

基于以上假设,提出HybridTime Clock and Protocol

 

HybridTime Clock and Protocol

HybridTime clock(HTC) is a pair (physical,logical) where the first component is a representation of the physical time at which the event occurred and the second component is a logical sequence number.

定义其实显而易见。。。

Algorithm 2 depicts the HTC algorithm.

HTC算法如上,两部分,NOW和UPDATE

now就是取当前的HybridTime clock

upate就是根据in 来更新当前的clock

Algorithm 2 implements a Lamport Clock, with the additional advantage that generated timestamps have physical meaning and are accurate representations of physical time within a bound error.

算法本身,其实就是Lamport Clock,只是增加了physical clock的部分

给出一个例子,

 

To order the events timestamped using the HybridTime Clock algorithm we use Definition 1.

Definition 1. HCT(e) < HCT( f ) is defined as the lexicographical ordering of the timestamp two-tuple (physical,logical)

Theorem 1. The HybriTime clock happened-before relation forms a total order of events

Theorem 2. For any event in a causal chain f , the physical component of a HTC timestamp approximates the “real” time the event occurred, with a error defined and bounded by

 

Implementation

No Consistency - In this mode there are no external consistency guarantees, transactions are assigned timestamps from each server’s physical clock and no guarantee is made that reads are consistent or repeatable.

直接用local的物理时间,不保证一致性

HybridTime Consistency - In this mode our implementation guarantees the global consistency as Spanner, absent hidden channels, but using HybridTime instead of commit-wait. 
Clients choosing this consistency mode on writes must make sure that the timestamp that is received from the server is propagated to other servers and/or clients. 
Within the same client process, timestamps are automatically propagated on behalf of the user.

这个其实和逻辑时钟,没有区别,就是保证偏序

Commit-wait Consistency - In this mode our implementation guarantees the same external consistency semantics as Spanner by also using commit-wait in the way described in the original paper. 
However instead of using TrueTime, which is a proprietary and private API, we implemented commit-wait on top of the widely used Network Time Protocol (NTP). Hence, in this consistency mode we support hidden channels.

这个估计没啥用,在没有spanner硬件保证的情况下,commit-wait,性能不能忍吧

时间: 2024-09-20 05:32:21

HybridTime - Accessible Global Consistency with High Clock Uncertainty的相关文章

一个xilinx IP的思考

 http://www.eefocus.com/walkie/blog/09-08/174703_f49d6.html 一个朋友问起了xilinx内部IP的调用以及使用的问题,于是整理了一下,放在这里. 当时的问题是浮点除法器IP可以设置他的延迟从0-28,那么是不是延迟28的时序会更好,因为相当于做了一个28级 的流水.不过相对而言,面积会更大.这是和朋友讨论的最初的结果.但是调用了这个浮点除法器的IP之后,ISE给 出的结果并不是这样.后来才发现是我们只调用了IP,但是忘记在这个IP的前后插

在ISE下分析和约束时序

1.     在ISE下分析和约束时序   3.1   ISE的时序约束工具入门   像TimeQuest一样,ISE软件工具也有自己的时序约束及分析工具.ISE界面的processes当中,有一个user constraints列表,其中的Creat Timing Constrain可以提供用户添加指定的时序约束. ISE使用的时序约束信息跟其他的物理约束,电气约束等信息全部都放置在后缀名为ucf(user constrain file)的文件中,在使用图形化界面编辑约束后,用户还可以直接编辑

quartus verilog-verilog大神请进来看看 小弟跪谢

问题描述 verilog大神请进来看看 小弟跪谢 我用quartusii仿真出现的情况 用Cyclone III仿真出现的严重警告但是 但是用其他器件仿真就不会出现严重警告 请问一下怎么解决 还有为什么用Cyclone III仿真就会出现严重警告 Critical Warning: Synopsys Design Constraints File file not found: 'test.sdc'. A Synopsys Design Constraints File is required

Global.asa 参考(五) - TypeLibrary 声明

参考 ActiveX 组件常常要描述类型库中该组件支持的常量.类型库是一个文件,其中包含有关 ActiveX 组件所支持的对象和类型的信息.如果用户的 Web 应用程序依赖于已在类型库中声明了类型的 ActiveX 对象,就可以在 Global.asa 文件中声明其类型.这样做以后,就可以在应用程序范围内从任何脚本引用已在类型库中声明了的数据类型. 有关在 ASP 中使用常量的详细信息,请参阅"使用变量和常量". 语法<!--METADATA TYPE="TypeLib

PHP中超全局变量$GLOBALS和global的区别

本篇文章分享一下关于PHP中的超全局变量$GLOBALS和global的区别. 一.超全局变量$GLOBALS   PHP超全局变量有很多,如下的都属于超全局变量(Superglobal):   $GLOBALS,$_SERVER,$_GET,$_POST,$_FILES,$_COOKIE,$_SESSION,$_REQUEST,$_ENV.   官方说明: $GLOBALS - 引用全局作用域中可用的全部变量. 一个包含了全部变量的全局组合数组.变量的名字就是数组的键. 即出现过的全局变量,就

global 属性

  返回 Boolean 值,指出正则表达式使用的global 标志 (g) 的状态.默认值为 false.只读. rgExp.global 必选项 rgExp 参数是正则表达式对象. 说明 如果正则表达式设置了global 标志,那么global 属性返回 true,否则返回 false. 使用 global 标志表明在被查找的字符串中搜索操作将查找所有符合的项,而不仅仅是第一个.这也被称为全局匹配. 示例 以下示例演示了 global 属性的用法.如果传递 "g" 到下面所示的函数

什么是Global.asa文件

大家好! Global.asa文件是一个可选文件,在这个文件中,你可以定义事件脚本和使用Session和Application对象.Global.asa文件的内容不能向用户显示,但是它存储的信息能应用于整个应用程序.这个文件必须命名为Global.asa,并且存储在应用程序的启动点的目录下面,一个应用程序只能有一个Global.asa文件. Global.asa文件只能包含以下内容: 1,Application事件. 2,Session事件. 3,<OBJECT>的声明. 4,类库的声明.

Global 对象

对象   是一个固有对象,目的是把所有全局方法集中在一个对象中. Global 对象没有语法.直接调用其方法. 说明 Global 对象从不直接使用,并且不能用 new 运算符创建.它在 Scripting 引擎被初始化时创建,并立即使其方法和属性可用. 属性 Infinity 属性 | NaN 属性 方法 escape 方法 | eval 方法 | isFinite 方法 | isNaN 方法 | parseFloat 方法 | parseInt 方法 | unescape 方法 要求 版本

ASP入门:Global.asa文件技巧用法

首先.asa是文件后缀名,它是Active Server Application的首字母缩写.Global.asa文件可以管理在ASP应用中两个非常苛刻的对象:Application.Session. 它其实是一个可选文件,程序编写者可以在该文件中指定事件脚本,并声明具有会话和应用程序作用域的对象.该文件的内容不是用来给用户显示的,而是用来存储事件信息和由应用程序全局使用的对象.该文件必须存放在应用程序的根目录内.每个应用程序只能有一个Global.asa文件. 关于Global.asa文件最常