Apache Kafka源码分析 - autoLeaderRebalanceEnable

在broker的配置中,auto.leader.rebalance.enable (false)

那么这个leader是如何进行rebalance的?

首先在controller启动的时候会打开一个scheduler,

if (config.autoLeaderRebalanceEnable) { //如果打开outoLeaderRebalance,需要把partiton leader由于dead而发生迁徙的,重新迁徙回去
        info("starting the partition rebalance scheduler")
        autoRebalanceScheduler.startup()
        autoRebalanceScheduler.schedule("partition-rebalance-thread", checkAndTriggerPartitionRebalance,
          5, config.leaderImbalanceCheckIntervalSeconds, TimeUnit.SECONDS)
      }

定期去做,

checkAndTriggerPartitionRebalance

这个函数逻辑,就是找出所有发生过迁移的replica,即

topicsNotInPreferredReplica

并且判断如果满足imbalance比率,即自动触发leader rebalance,将leader迁回perfer replica

关键要理解什么是preferred replicas?

preferredReplicasForTopicsByBrokers =
          controllerContext.partitionReplicaAssignment.filterNot(p => deleteTopicManager.isTopicQueuedUpForDeletion(p._1.topic)).groupBy {
            case(topicAndPartition, assignedReplicas) => assignedReplicas.head
          }
 partitionReplicaAssignment: mutable.Map[TopicAndPartition, Seq[Int]] 
TopicAndPartition可以通过topic name和partition id来唯一标识一个partition,Seq[int],表示brokerids,表明这个partition的replicas在哪些brokers上面

从partition的ReplicaAssignment里面过滤掉delete的topic,然后按照assignedReplicas.head进行groupby,就是按照Seq中的第一个brokerid 
意思就是说,默认每个partition的preferred replica就是第一个被assign的replica

groupby的结果就是,每个broker,和应该以该broker作为leader的所有partition,即

case(leaderBroker, topicAndPartitionsForBroker)

那么找出里面当前leader不是preferred的,即发生过迁移的, 
很简单,直接和leaderAndIsr里面的leader进行比较,如果不相等就说明发生过迁徙

topicsNotInPreferredReplica =
              topicAndPartitionsForBroker.filter {
                case(topicPartition, replicas) => {
                  controllerContext.partitionLeadershipInfo.contains(topicPartition) &&
                  controllerContext.partitionLeadershipInfo(topicPartition).leaderAndIsr.leader != leaderBroker
                }
              }

并且只有当某个broker上的imbalanceRatio大于10%的时候,才会触发rebalance

imbalanceRatio = totalTopicPartitionsNotLedByBroker.toDouble / totalTopicPartitionsForBroker

对每个partition的迁移过程, 
首先preferred的broker要是活着的,并且当前是没有partition正在进行reassign或replica election的,说明这个过程是不能并行的,同时做reassign很容易冲突

// do this check only if the broker is live and there are no partitions being reassigned currently
                  // and preferred replica election is not in progress
                  if (controllerContext.liveBrokerIds.contains(leaderBroker) &&
                      controllerContext.partitionsBeingReassigned.size == 0 &&
                      controllerContext.partitionsUndergoingPreferredReplicaElection.size == 0 &&
                      !deleteTopicManager.isTopicQueuedUpForDeletion(topicPartition.topic) &&
                      controllerContext.allTopics.contains(topicPartition.topic)) {
                    onPreferredReplicaElection(Set(topicPartition), true)

onPreferredReplicaElection

还是通过partitionStateMachine,来改变partition的状态

partitionStateMachine.handleStateChanges(partitions, OnlinePartition, preferredReplicaPartitionLeaderSelector)

partitionStateMachine会另外分析,这里只需要知道,当前partition的状态是,OnlinePartition –> OnlinePartition 
并且是以preferredReplicaPartitionLeaderSelector,作为leaderSelector的策略

 

PreferredReplicaPartitionLeaderSelector

策略很简单,就是把leader换成preferred replica

def selectLeader(topicAndPartition: TopicAndPartition, currentLeaderAndIsr: LeaderAndIsr): (LeaderAndIsr, Seq[Int]) = {
    val assignedReplicas = controllerContext.partitionReplicaAssignment(topicAndPartition)
    val preferredReplica = assignedReplicas.head  //取AR第一个replica作为preferred
    // check if preferred replica is the current leader
    val currentLeader = controllerContext.partitionLeadershipInfo(topicAndPartition).leaderAndIsr.leader
    if (currentLeader == preferredReplica) { //如果当前leader就是preferred就不需要做了
      throw new LeaderElectionNotNeededException("Preferred replica %d is already the current leader for partition %s" .format(preferredReplica, topicAndPartition))
    } else {
      info("Current leader %d for partition %s is not the preferred replica.".format(currentLeader, topicAndPartition) + " Trigerring preferred replica leader election")
      // check if preferred replica is not the current leader and is alive and in the isr
      if (controllerContext.liveBrokerIds.contains(preferredReplica) && currentLeaderAndIsr.isr.contains(preferredReplica)) { //判断当前preferred replica所在broker是否活,是否在isr中
        (new LeaderAndIsr(preferredReplica, currentLeaderAndIsr.leaderEpoch + 1, currentLeaderAndIsr.isr, currentLeaderAndIsr.zkVersion + 1), assignedReplicas) //产生新的leaderAndIsr
      } else {
        throw new StateChangeFailedException("Preferred replica %d for partition ".format(preferredReplica) +
          "%s is either not alive or not in the isr. Current leader and ISR: [%s]".format(topicAndPartition, currentLeaderAndIsr))
      }
    }
  }
}

2015-10-27
时间: 2024-07-28 14:41:48

Apache Kafka源码分析 - autoLeaderRebalanceEnable的相关文章

Apache Kafka源码分析 – Broker Server

1. Kafka.scala 在Kafka的main入口中startup KafkaServerStartable, 而KafkaServerStartable这是对KafkaServer的封装 1: val kafkaServerStartble = new KafkaServerStartable(serverConfig) 2: kafkaServerStartble.startup   1: package kafka.server 2: class KafkaServerStartab

Apache Kafka源码分析 - kafka controller

前面已经分析过kafka server的启动过程,以及server所能处理的所有的request,即KafkaApis  剩下的,其实关键就是controller,以及partition和replica的状态机  这里先看看controller在broker server的基础上,多做了哪些初始化和failover的工作   最关键的一句, private val controllerElector = new ZookeeperLeaderElector(controllerContext,

Apache Kafka源码分析 – Log Management

LogManager LogManager会管理broker上所有的logs(在一个log目录下),一个topic的一个partition对应于一个log(一个log子目录) 首先loadLogs会加载每个partition所对应的log对象, 然后提供createLog,getLog,deleteLog之类的管理接口 并且会创建些后台线程来进行,cleanup,flush,checkpoint生成之类的工作 Log Log只是对于LogSegments的封装,包含loadSegments,ap

Apache Kafka源码分析 - KafkaApis

kafka apis反映出kafka broker server可以提供哪些服务, broker server主要和producer,consumer,controller有交互,搞清这些api就清楚了broker server的所有行为   handleOffsetRequest 提供对offset的查询的需求,比如查询earliest,latest offset是什么,或before某个时间戳的offset是什么 try { // ensure leader exists // 确定是否是l

Apache Kafka源码分析 – Controller

One of the brokers is elected as the controller for the whole cluster. It will be responsible for: Leadership change of a partition (each leader can independently update ISR) New topics; deleted topics Replica re-assignment After the controller makes

Apache Kafka源码分析 - ReplicaStateMachine

startup 在onControllerFailover中被调用, /** * Invoked on successful controller election. First registers a broker change listener since that triggers all * state transitions for replicas. Initializes the state of replicas for all partitions by reading fro

apache fileupload源码分析

文件上传格式 先来看下含有文件上传时的表单提交是怎样的格式 <form action="/upload/request" method="POST" enctype="multipart/form-data" id="requestForm"> <input type="file" name="myFile"> <input type="text&

Kafka源码分析之RecordAccumulator

        RecordAccumulator作为一个队列,累积记录records到MemoryRecords实例,然后被发送到服务器server.其成员变量如下: // RecordAccumulator是否关闭的标志位closed private volatile boolean closed; // 索引号drainIndex private int drainIndex; // flushes过程计数器flushesInProgress private final AtomicInt

Kafka源码分析之Sender

        Sender为处理发送produce请求至Kafka集群的后台线程.这个线程更新集群元数据,然后发送produce请求至适当的节点.         首先,我们先看下它的成员变量: /* the state of each nodes connection */ // 每个节点连接的状态KafkaClient实例client private final KafkaClient client; /* the record accumulator that batches recor