scan的调用代码示例如下:
// 创建HBase配置config Configuration config = HBaseConfiguration.create(); config.set("hbase.zookeeper.quorum", "192.168.1.226");// zookeeper部署的服务器IP config.set("hbase.zookeeper.property.clientPort", "2181");// zookeeper允许连接的客户端端口号 // 定义HBase连接 Connection connection = null; try { // 获取HBase数据库连接 connection = ConnectionFactory.createConnection(config); // 输出连接建立结果 while (!connection.isClosed()) { break; } // 获取HBase数据库表 Table table = connection.getTable(TableName.valueOf("test_table")); // 构造Scan实例 Scan scan = new Scan(); scan.addColumn(Bytes.toBytes("family1"), Bytes.toBytes("cloumn1")); // 获取查询结果 ResultScanner result = table.getScanner(scan); // 解析查询结果 for (Result r : result) { // 此处为处理Result的代码 byte[] row = r.getRow(); if(row.length == 0){ //... } } result.close(); table.close(); table = null; scan = null; } catch (IOException e) { e.printStackTrace(); } finally { if (connection != null) { try { connection.close(); } catch (IOException e) { e.printStackTrace(); } } }
下面,我们对scan的整个流程进行分析。
首先从Table的getScanner(Scan scan)方法入手,它的定义如下:
/** * Returns a scanner on the current table as specified by the {@link Scan} * object. * Note that the passed {@link Scan}'s start row and caching properties * maybe changed. * * @param scan A configured {@link Scan} object. * @return A scanner. * @throws IOException if a remote or network exception occurs. * @since 0.20.0 */ ResultScanner getScanner(Scan scan) throws IOException;
它的实现是由HTable来完成的,源码如下:
/** * The underlying {@link HTable} must not be closed. * {@link HTableInterface#getScanner(Scan)} has other usage details. * * HBase中scan的入口方法 */ @Override public ResultScanner getScanner(final Scan scan) throws IOException { // small scan不可以设置batch if (scan.getBatch() > 0 && scan.isSmall()) { throw new IllegalArgumentException("Small scan should not be used with batching"); } // 设置caching // 取参数“hbase.client.scanner.caching”,如果参数未配置,则默认为100 if (scan.getCaching() <= 0) { scan.setCaching(getScannerCaching()); } // 取参数“HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE_KEY”,如果参数未配置,则默认为2 * 1024 * 1024 if (scan.getMaxResultSize() <= 0) { scan.setMaxResultSize(scannerMaxResultSize); } /** * scan总共分为四种类型: * 1、reversed、small--ClientSmallReversedScanner * 2、reversed、big--ReversedClientScanner * 3、notReversed、small--ClientSmallScanner * 4、notReversed、big--ClientScanner */ if (scan.isReversed()) {// 反向扫描 if (scan.isSmall()) { return new ClientSmallReversedScanner(getConfiguration(), scan, getName(), this.connection, this.rpcCallerFactory, this.rpcControllerFactory, pool, tableConfiguration.getReplicaCallTimeoutMicroSecondScan()); } else { return new ReversedClientScanner(getConfiguration(), scan, getName(), this.connection, this.rpcCallerFactory, this.rpcControllerFactory, pool, tableConfiguration.getReplicaCallTimeoutMicroSecondScan()); } } if (scan.isSmall()) { return new ClientSmallScanner(getConfiguration(), scan, getName(), this.connection, this.rpcCallerFactory, this.rpcControllerFactory, pool, tableConfiguration.getReplicaCallTimeoutMicroSecondScan()); } else { return new ClientScanner(getConfiguration(), scan, getName(), this.connection, this.rpcCallerFactory, this.rpcControllerFactory, pool, tableConfiguration.getReplicaCallTimeoutMicroSecondScan()); } }
这里,我们先只研究ClientScanner,其他三种以后再说。
/** * Create a new ClientScanner for the specified table Note that the passed {@link Scan}'s start * row maybe changed changed. * @param conf The {@link Configuration} to use. * @param scan {@link Scan} to use in this scanner * @param tableName The table that we wish to scan * @param connection Connection identifying the cluster * @throws IOException */ public ClientScanner(final Configuration conf, final Scan scan, final TableName tableName, ClusterConnection connection, RpcRetryingCallerFactory rpcFactory, RpcControllerFactory controllerFactory, ExecutorService pool, int primaryOperationTimeout) throws IOException { if (LOG.isTraceEnabled()) { LOG.trace("Scan table=" + tableName + ", startRow=" + Bytes.toStringBinary(scan.getStartRow())); } // 设置scan、tableName等成员变量 this.scan = scan; this.tableName = tableName; this.lastNext = System.currentTimeMillis(); this.connection = connection; this.pool = pool; this.primaryOperationTimeout = primaryOperationTimeout; // 重试次数,取参数“hbase.client.retries.number”,如果参数未配置,则默认为31 this.retries = conf.getInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, HConstants.DEFAULT_HBASE_CLIENT_RETRIES_NUMBER); if (scan.getMaxResultSize() > 0) { this.maxScannerResultSize = scan.getMaxResultSize(); } else { this.maxScannerResultSize = conf.getLong( HConstants.HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE_KEY, HConstants.DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE); } // scanner超时时间 this.scannerTimeout = HBaseConfiguration.getInt(conf, HConstants.HBASE_CLIENT_SCANNER_TIMEOUT_PERIOD, HConstants.HBASE_REGIONSERVER_LEASE_PERIOD_KEY, HConstants.DEFAULT_HBASE_CLIENT_SCANNER_TIMEOUT_PERIOD); // check if application wants to collect scan metrics initScanMetrics(scan); // Use the caching from the Scan. If not set, use the default cache setting for this table. // 处理caching if (this.scan.getCaching() > 0) { this.caching = this.scan.getCaching(); } else { this.caching = conf.getInt( HConstants.HBASE_CLIENT_SCANNER_CACHING, HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING); } // 初始化caller // caller为RpcRetryingCaller类型 this.caller = rpcFactory.<Result[]> newCaller(); // rpcControllerFactory为RpcControllerFactory类型 this.rpcControllerFactory = controllerFactory; this.conf = conf; // 初始化scanner initializeScannerInConstruction(); }
ClientScanner的构造方法中,首先是对各种成员变量赋值,比如scan、tableName、connection等,然后是处理maxScannerResultSize、scannerTimeout、caching等scan需要用到的各种参数,这些都没有什么好说的。
接下来,是初始化两个重要的变量caller和rpcControllerFactory,caller为RpcRetryingCaller类型的,rpcControllerFactory为RpcControllerFactory类型的。
最后调用initializeScannerInConstruction()方法,ok,我们也跟着继续。
protected void initializeScannerInConstruction() throws IOException{ // initialize the scanner // 初始化scanner nextScanner(this.caching, false); }
紧接着,调用nextScanner()方法,注意,传入两个参数,一个是ClientScanner对象生成时的caching,另外一个是false。
这个caching,如果在构造Scan对象时没有设置,则取参数hbase.client.scanner.caching配置的值,参数未配置则默认为100,它的含义是每次RPC请求的最大行数。
继续追踪nextScanner()方法,完整的代码如下:
/* * Gets a scanner for the next region. If this.currentRegion != null, then * we will move to the endrow of this.currentRegion. Else we will get * scanner at the scan.getStartRow(). We will go no further, just tidy * up outstanding scanners, if <code>currentRegion != null</code> and * <code>done</code> is true. * @param nbRows * @param done Server-side says we're done scanning. */ protected boolean nextScanner(int nbRows, final boolean done) throws IOException { // Close the previous scanner if it's open // 关闭之前打开的scanner // 第一次调用时,callable因为没有初始化,所以肯定是空的,为null,此处会跳过 // 什么时候callable被赋值,而什么时候callable又被清空呢? // 关闭上一个callable if (this.callable != null) { // 调用setClose()方法将callable中的currentScannerCallable的closed设置为true // 将ScannerCallableWithReplicas类型的callable中ScannerCallable类型的成员变量中的closed设置为true this.callable.setClose(); // 调用call()方法,发起一次请求,此时callable不为空,且其currentScannerCallable中closed为true // 最终调用其currentScannerCallable的call()方法,因为closed为true,只执行其close()方法 call(scan, callable, caller, scannerTimeout); // 将callable设置为null,下次就不会执行该模块了 this.callable = null; } // Where to start the next scanner // 从哪里开始下一个scanner byte [] localStartKey; // if we're at end of table, close and return false to stop iterating // 如果我们处于表的末尾,关闭并返回false,以停止此次迭代 if (this.currentRegion != null) { byte [] endKey = this.currentRegion.getEndKey(); // done,或者endKey为空,或者endKey>=stopRow if (endKey == null || Bytes.equals(endKey, HConstants.EMPTY_BYTE_ARRAY) || checkScanStopRow(endKey) || done) { // 关闭 close(); if (LOG.isTraceEnabled()) { LOG.trace("Finished " + this.currentRegion); } // 返回false return false; } // localStartKey设置为当前Region的endKey localStartKey = endKey; if (LOG.isTraceEnabled()) { LOG.trace("Finished " + this.currentRegion); } } else { localStartKey = this.scan.getStartRow(); } if (LOG.isDebugEnabled() && this.currentRegion != null) { // Only worth logging if NOT first region in scan. LOG.debug("Advancing internal scanner to startKey at '" + Bytes.toStringBinary(localStartKey) + "'"); } try { // callable为ScannerCallableWithReplicas类型的 // 获取一个新的callable callable = getScannerCallable(localStartKey, nbRows); // Open a scanner on the region server starting at the // beginning of the region // 在localStartKey所在Region的RegionServer上打开一个scanner call(scan, callable, caller, scannerTimeout); // 设置currentRegion this.currentRegion = callable.getHRegionInfo(); if (this.scanMetrics != null) { this.scanMetrics.countOfRegions.incrementAndGet(); } } catch (IOException e) { close(); throw e; } return true; }
我们一步步分析,由于ClientScanner在构造时,并没有初始化callable成员变量,所以它必定为null,第一部分代码略过,我们以后再讲。
接下来,由于currentRegion也没有被初始化,所以,程序走的是else分支,也就是,将localStartKey设置为scan的startRow。
localStartKey = this.scan.getStartRow();
紧接着,调用getScannerCallable()方法为成员变量callable赋值,入参为行localStartKey和行数nbRows,这个nbRows即为ClientScanner构造时的caching值。方法定义如下:
@InterfaceAudience.Private protected ScannerCallableWithReplicas getScannerCallable(byte [] localStartKey, int nbRows) { // 设置scan的startRow scan.setStartRow(localStartKey); // 构造ScannerCallable类型的对象s,将其作为ScannerCallableWithReplicas类型的sr的成员变量currentScannerCallable ScannerCallable s = new ScannerCallable(getConnection(), getTable(), scan, this.scanMetrics, this.rpcControllerFactory); // 设置caching s.setCaching(nbRows); // 构造ScannerCallableWithReplicas类型的对象sr,其成员变量currentScannerCallable为上面的ScannerCallable类型的对象s ScannerCallableWithReplicas sr = new ScannerCallableWithReplicas(tableName, getConnection(), s, pool, primaryOperationTimeout, scan, retries, scannerTimeout, caching, conf, caller); return sr; }
首先,把localStartKey设置为scan的startRow,后续每次迭代处理时,我们就能知道scan的起始行。
然后,生成一个ScannerCallable类型的对象s,这个s是要作为ScannerCallableWithReplicas类型的sr的成员变量currentScannerCallable来使用的,实际与Region所在RegionServer通信的正是这个对象,并且,这个ScannerCallable对象中有一个byte[]类型的row成员变量,它会被初始化为scan的startRow,被用来进行Region的定位和其行号的定位。
构造ScannerCallableWithReplicas类型的对象sr并返回,其成员变量currentScannerCallable为上面的ScannerCallable类型的对象s。
getScannerCallable()方法执行完后,紧接着会调用ClientScanner的call()方法,代码如下:
Result[] call(Scan scan, ScannerCallableWithReplicas callable, RpcRetryingCaller<Result[]> caller, int scannerTimeout) throws IOException, RuntimeException { if (Thread.interrupted()) { throw new InterruptedIOException(); } // callWithoutRetries is at this layer. Within the ScannerCallableWithReplicas, // we do a callWithRetries // caller为RpcRetryingCaller类型 // callable为ScannerCallableWithReplicas类型 return caller.callWithoutRetries(callable, scannerTimeout); }
之前我们已知道,caller为RpcRetryingCaller类型,它的方法定义如下:
/** * Call the server once only. * 仅仅调用server一次 * {@link RetryingCallable} has a strange shape so we can do retrys. Use this invocation if you * want to do a single call only (A call to {@link RetryingCallable#call(int)} will not likely * succeed). * @return an object of type T * @throws IOException if a remote or network exception occurs * @throws RuntimeException other unspecified error */ public T callWithoutRetries(RetryingCallable<T> callable, int callTimeout) throws IOException, RuntimeException { // The code of this method should be shared with withRetries. this.globalStartTime = EnvironmentEdgeManager.currentTime(); try { // 先调用prepare()方法,再调用call()方法,超时时间为callTimeout callable.prepare(false); return callable.call(callTimeout); } catch (Throwable t) { Throwable t2 = translateException(t); ExceptionUtil.rethrowIfInterrupt(t2); // It would be nice to clear the location cache here. if (t2 instanceof IOException) { throw (IOException)t2; } else { throw new RuntimeException(t2); } } }
由代码可以看出,它实际上的处理流程为先调用callable的prepare()方法,再调用callable的call()方法。
接下来,我们转入callable的分析,这个callable为ScannerCallableWithReplicas类型,它的prepare()方法为空,我们重点分析call()方法,代码如下:
@Override public Result [] call(int timeout) throws IOException { // If the active replica callable was closed somewhere, invoke the RPC to // really close it. In the case of regular scanners, this applies. We make couple // of RPCs to a RegionServer, and when that region is exhausted, we set // the closed flag. Then an RPC is required to actually close the scanner. // 第一次调用时,该段代码会被忽略 if (currentScannerCallable != null && currentScannerCallable.closed) { // For closing we target that exact scanner (and not do replica fallback like in // the case of normal reads) if (LOG.isTraceEnabled()) { LOG.trace("Closing scanner id=" + currentScannerCallable.scannerId); } // 最终调用close()方法 Result[] r = currentScannerCallable.call(timeout); currentScannerCallable = null; return r; } // We need to do the following: //1. When a scan goes out to a certain replica (default or not), we need to // continue to hit that until there is a failure. So store the last successfully invoked // replica //2. We should close the "losing" scanners (scanners other than the ones we hear back // from first) // 我们需要做以下事情: //1、当一个scan扫描到一个特定的副本(无论默认与否)。我们需要继续执行,直到有一个错误。然后存储上一个成功的副本。 //2、我们应该关闭这个losing scanners // 根据scan的startRow获取Region位置,使用cache RegionLocations rl = RpcRetryingCallerWithReadReplicas.getRegionLocations(true, RegionReplicaUtil.DEFAULT_REPLICA_ID, cConnection, tableName, currentScannerCallable.getRow()); // allocate a boundedcompletion pool of some multiple of number of replicas. // We want to accomodate some RPCs for redundant replica scans (but are still in progress) // 分配一个有界操作的包含一些副本的池 // 构造一个BoundedCompletionService类型的数据结构cs // cs中包含需要执行的task、已经完成的task和线程池executor,并提供了submit、poll、take等方法 BoundedCompletionService<Pair<Result[], ScannerCallable>> cs = new BoundedCompletionService<Pair<Result[], ScannerCallable>>(pool, rl.size() * 5); List<ExecutionException> exceptions = null; int submitted = 0, completed = 0; AtomicBoolean done = new AtomicBoolean(false); replicaSwitched.set(false); // submit call for the primary replica. // 提交请求至the primary replica // 提交task,并添加到outstandingCallables中 // task为用this.currentScannerCallable封装成的RetryingRPC对象 submitted += addCallsForCurrentReplica(cs, rl); try { // wait for the timeout to see whether the primary responds back // 从cs的BlockingQueue<Future<V>>类型的completed中获取任务完成情况 Future<Pair<Result[], ScannerCallable>> f = cs.poll(timeBeforeReplicas, TimeUnit.MICROSECONDS); // Yes, microseconds if (f != null) { Pair<Result[], ScannerCallable> r = f.get(); if (r != null && r.getSecond() != null) { updateCurrentlyServingReplica(r.getSecond(), r.getFirst(), done, pool); } return r == null ? null : r.getFirst(); //great we got a response } } catch (ExecutionException e) { // the primary call failed with RetriesExhaustedException or DoNotRetryIOException // but the secondaries might still succeed. Continue on the replica RPCs. exceptions = new ArrayList<ExecutionException>(rl.size()); exceptions.add(e); completed++; } catch (CancellationException e) { throw new InterruptedIOException(e.getMessage()); } catch (InterruptedException e) { throw new InterruptedIOException(e.getMessage()); } // submit call for the all of the secondaries at once // TODO: this may be an overkill for large region replication submitted += addCallsForOtherReplicas(cs, rl, 0, rl.size() - 1); try { while (completed < submitted) { try { Future<Pair<Result[], ScannerCallable>> f = cs.take(); Pair<Result[], ScannerCallable> r = f.get(); if (r != null && r.getSecond() != null) { updateCurrentlyServingReplica(r.getSecond(), r.getFirst(), done, pool); } return r == null ? null : r.getFirst(); // great we got an answer } catch (ExecutionException e) { // if not cancel or interrupt, wait until all RPC's are done // one of the tasks failed. Save the exception for later. if (exceptions == null) exceptions = new ArrayList<ExecutionException>(rl.size()); exceptions.add(e); completed++; } } } catch (CancellationException e) { throw new InterruptedIOException(e.getMessage()); } catch (InterruptedException e) { throw new InterruptedIOException(e.getMessage()); } finally { // We get there because we were interrupted or because one or more of the // calls succeeded or failed. In all case, we stop all our tasks. cs.cancelAll(true); } if (exceptions != null && !exceptions.isEmpty()) { RpcRetryingCallerWithReadReplicas.throwEnrichedException(exceptions.get(0), retries); // just rethrow the first exception for now. } return null; // unreachable }
又是一个比较长的方法,我们还是一步步分析。
首先,currentScannerCallable虽然已被初始化,但是它的closed还是为false,那么方法的第一块代码会被跳过。
接下来,调用RpcRetryingCallerWithReadReplicas的getRegionLocations()方法,利用cConnection、tableName和currentScannerCallable.getRow()定位Region,得到Region的位置RegionLocations类型的rl。
然后,构造一个BoundedCompletionService类型的数据结构cs,cs中包含需要执行的task、已经完成的task和线程池executor,并提供了submit、poll、take等方法。依靠它,利用线程池完成任务的调度与执行,并同步获取执行结果。
addCallsForCurrentReplica方法就是实现上述逻辑的方法,代码如下:
private int addCallsForCurrentReplica( BoundedCompletionService<Pair<Result[], ScannerCallable>> cs, RegionLocations rl) { RetryingRPC retryingOnReplica = new RetryingRPC(currentScannerCallable); outstandingCallables.add(currentScannerCallable); cs.submit(retryingOnReplica); return 1; }
它将currentScannerCallable封装出成为一个RetryingRPC对象,提交到cs中执行,并添加到数据结构outstandingCallables中。
我们知道,这个currentScannerCallable对象是ScannerCallable类型的,它被线程池调度执行时,依靠call()方法完成业务逻辑,call方法定义如下:
@Override public Result [] call(int callTimeout) throws IOException { if (Thread.interrupted()) { throw new InterruptedIOException(); } if (closed) { if (scannerId != -1) { close(); } } else { if (scannerId == -1L) { this.scannerId = openScanner(); } else { Result [] rrs = null; ScanRequest request = null; try { incRPCcallsMetrics(); request = RequestConverter.buildScanRequest(scannerId, caching, false, nextCallSeq); ScanResponse response = null; PayloadCarryingRpcController controller = controllerFactory.newController(); controller.setPriority(getTableName()); controller.setCallTimeout(callTimeout); try { response = getStub().scan(controller, request); // Client and RS maintain a nextCallSeq number during the scan. Every next() call // from client to server will increment this number in both sides. Client passes this // number along with the request and at RS side both the incoming nextCallSeq and its // nextCallSeq will be matched. In case of a timeout this increment at the client side // should not happen. If at the server side fetching of next batch of data was over, // there will be mismatch in the nextCallSeq number. Server will throw // OutOfOrderScannerNextException and then client will reopen the scanner with startrow // as the last successfully retrieved row. // See HBASE-5974 nextCallSeq++; long timestamp = System.currentTimeMillis(); // Results are returned via controller CellScanner cellScanner = controller.cellScanner(); rrs = ResponseConverter.getResults(cellScanner, response); if (logScannerActivity) { long now = System.currentTimeMillis(); if (now - timestamp > logCutOffLatency) { int rows = rrs == null ? 0 : rrs.length; LOG.info("Took " + (now-timestamp) + "ms to fetch " + rows + " rows from scanner=" + scannerId); } } // moreResults is only used for the case where a filter exhausts all elements if (response.hasMoreResults() && !response.getMoreResults()) { scannerId = -1L; closed = true; // Implied that no results were returned back, either. return null; } // moreResultsInRegion explicitly defines when a RS may choose to terminate a batch due // to size or quantity of results in the response. if (response.hasMoreResultsInRegion()) { // Set what the RS said setHasMoreResultsContext(true); setServerHasMoreResults(response.getMoreResultsInRegion()); } else { // Server didn't respond whether it has more results or not. setHasMoreResultsContext(false); } } catch (ServiceException se) { throw ProtobufUtil.getRemoteException(se); } updateResultsMetrics(rrs); } catch (IOException e) { if (logScannerActivity) { LOG.info("Got exception making request " + TextFormat.shortDebugString(request) + " to " + getLocation(), e); } IOException ioe = e; if (e instanceof RemoteException) { ioe = RemoteExceptionHandler.decodeRemoteException((RemoteException)e); } if (logScannerActivity && (ioe instanceof UnknownScannerException)) { try { HRegionLocation location = getConnection().relocateRegion(getTableName(), scan.getStartRow()); LOG.info("Scanner=" + scannerId + " expired, current region location is " + location.toString()); } catch (Throwable t) { LOG.info("Failed to relocate region", t); } } // The below convertion of exceptions into DoNotRetryExceptions is a little strange. // Why not just have these exceptions implment DNRIOE you ask? Well, usually we want // ServerCallable#withRetries to just retry when it gets these exceptions. In here in // a scan when doing a next in particular, we want to break out and get the scanner to // reset itself up again. Throwing a DNRIOE is how we signal this to happen (its ugly, // yeah and hard to follow and in need of a refactor). if (ioe instanceof NotServingRegionException) { // Throw a DNRE so that we break out of cycle of calling NSRE // when what we need is to open scanner against new location. // Attach NSRE to signal client that it needs to re-setup scanner. if (this.scanMetrics != null) { this.scanMetrics.countOfNSRE.incrementAndGet(); } throw new DoNotRetryIOException("Resetting the scanner -- see exception cause", ioe); } else if (ioe instanceof RegionServerStoppedException) { // Throw a DNRE so that we break out of cycle of the retries and instead go and // open scanner against new location. throw new DoNotRetryIOException("Resetting the scanner -- see exception cause", ioe); } else { // The outer layers will retry throw ioe; } } return rrs; } } return null; }
进入该方法时,closed由于为false,所以执行else分支,并且,由于scannerId未被赋值,所以它最终执行的是openScanner()方法,打开一个Scanner,并将ID赋值给scannerId。
接下来,cs.poll()方法的调用,则是从cs的BlockingQueue<Future<V>>类型的completed中获取任务完成情况l。
然后,调用updateCurrentlyServingReplica()方法,
至此,整个Table.getScanner()方法分析完毕。总结起来,它主要完成了ResultScanner的初始化工作,并未真正请求数据。同时,它还做了以下几件事情:
1、生成一个实现了ResultScanner接口的对象,一般为ClientScanner(其他三个类型以后再分析);
2、ClientScanner对象中callable被赋值,
接下来,我们继续分析之后的工作。在完成了ResultScanner的初始化后,数据是如何获取的呢?
我们知道,ResultScanner继承自Iterable接口,那么在其实现ClientScanner的抽象父类AbstractClientScanner类中,定义了iterator()方法的实现,主要是通过hasNext()和next()方法来完成遍历的。那我们就来看下ClientScanner的next()方法,代码如下:
@Override public Result next() throws IOException { // If the scanner is closed and there's nothing left in the cache, next is a no-op. if (cache.size() == 0 && this.closed) { return null; } if (cache.size() == 0) { loadCache(); } if (cache.size() > 0) { return cache.poll(); } // if we exhausted this scanner before calling close, write out the scan metrics writeScanMetrics(); return null; }
非常简单,如果scanner的closed为true,并且cache没有数据,则直接返回null,如果closed为false,并且cache没有数据,那么通过loadCache()方法加载数据,然后其他的情况是,只要cache有数据,则直接返回。
在这里,我们就能知道,scan实际上并不是把数据直接全部加载到客户端的,而是在用到的时候才去取。这么做有什么好处呢?
1、避免全部数据尤其是大数据量的情况下全部缓存在客户端,造成客户端的压力;
2、如果等到全部数据都获取后才可以用,客户端的IO、集群网络IO会在数据获取阶段居高不下,延迟较高,而且不如这种类似懒加载的机制,数据边用边效果好的多。
我们继续分析loadCache()方法。
/** * Contact the servers to load more {@link Result}s in the cache. */ protected void loadCache() throws IOException { Result[] values = null; long remainingResultSize = maxScannerResultSize; int countdown = this.caching; // We need to reset it if it's a new callable that was created // with a countdown in nextScanner callable.setCaching(this.caching); // This flag is set when we want to skip the result returned. We do // this when we reset scanner because it split under us. boolean skipFirst = false; boolean retryAfterOutOfOrderException = true; // We don't expect that the server will have more results for us if // it doesn't tell us otherwise. We rely on the size or count of results boolean serverHasMoreResults = false; do { try { if (skipFirst) { // Skip only the first row (which was the last row of the last // already-processed batch). callable.setCaching(1); values = call(scan, callable, caller, scannerTimeout); // When the replica switch happens, we need to do certain operations // again. The scannercallable will openScanner with the right startkey // but we need to pick up from there. Bypass the rest of the loop // and let the catch-up happen in the beginning of the loop as it // happens for the cases where we see exceptions. Since only openScanner // would have happened, values would be null if (values == null && callable.switchedToADifferentReplica()) { if (this.lastResult != null) { //only skip if there was something read earlier skipFirst = true; } this.currentRegion = callable.getHRegionInfo(); continue; } callable.setCaching(this.caching); skipFirst = false; } // Server returns a null values if scanning is to stop. Else, // returns an empty array if scanning is to go on and we've just // exhausted current region. values = call(scan, callable, caller, scannerTimeout); if (skipFirst && values != null && values.length == 1) { skipFirst = false; // Already skipped, unset it before scanning again values = call(scan, callable, caller, scannerTimeout); } // When the replica switch happens, we need to do certain operations // again. The callable will openScanner with the right startkey // but we need to pick up from there. Bypass the rest of the loop // and let the catch-up happen in the beginning of the loop as it // happens for the cases where we see exceptions. Since only openScanner // would have happened, values would be null if (values == null && callable.switchedToADifferentReplica()) { if (this.lastResult != null) { //only skip if there was something read earlier skipFirst = true; } this.currentRegion = callable.getHRegionInfo(); continue; } retryAfterOutOfOrderException = true; } catch (DoNotRetryIOException e) { // DNRIOEs are thrown to make us break out of retries. Some types of DNRIOEs want us // to reset the scanner and come back in again. if (e instanceof UnknownScannerException) { long timeout = lastNext + scannerTimeout; // If we are over the timeout, throw this exception to the client wrapped in // a ScannerTimeoutException. Else, it's because the region moved and we used the old // id against the new region server; reset the scanner. if (timeout < System.currentTimeMillis()) { LOG.info("For hints related to the following exception, please try taking a look at: " + "https://hbase.apache.org/book.html#trouble.client.scantimeout"); long elapsed = System.currentTimeMillis() - lastNext; ScannerTimeoutException ex = new ScannerTimeoutException(elapsed + "ms passed since the last invocation, " + "timeout is currently set to " + scannerTimeout); ex.initCause(e); throw ex; } } else { // If exception is any but the list below throw it back to the client; else setup // the scanner and retry. Throwable cause = e.getCause(); if ((cause != null && cause instanceof NotServingRegionException) || (cause != null && cause instanceof RegionServerStoppedException) || e instanceof OutOfOrderScannerNextException) { // Pass // It is easier writing the if loop test as list of what is allowed rather than // as a list of what is not allowed... so if in here, it means we do not throw. } else { throw e; } } // Else, its signal from depths of ScannerCallable that we need to reset the scanner. if (this.lastResult != null) { // The region has moved. We need to open a brand new scanner at // the new location. // Reset the startRow to the row we've seen last so that the new // scanner starts at the correct row. Otherwise we may see previously // returned rows again. // (ScannerCallable by now has "relocated" the correct region) this.scan.setStartRow(this.lastResult.getRow()); // Skip first row returned. We already let it out on previous // invocation. skipFirst = true; } if (e instanceof OutOfOrderScannerNextException) { if (retryAfterOutOfOrderException) { retryAfterOutOfOrderException = false; } else { // TODO: Why wrap this in a DNRIOE when it already is a DNRIOE? throw new DoNotRetryIOException("Failed after retry of " + "OutOfOrderScannerNextException: was there a rpc timeout?", e); } } // Clear region. this.currentRegion = null; // Set this to zero so we don't try and do an rpc and close on remote server when // the exception we got was UnknownScanner or the Server is going down. callable = null; // This continue will take us to while at end of loop where we will set up new scanner. continue; } // 取当前时间 long currentTime = System.currentTimeMillis(); // 更新指标信息 if (this.scanMetrics != null) { this.scanMetrics.sumOfMillisSecBetweenNexts.addAndGet(currentTime - lastNext); } // 更新lastNext lastNext = currentTime; // 遍历values,将Result添加到cache中,并累减remainingResultSize、countdown,记录lastResult if (values != null && values.length > 0) { for (Result rs : values) { cache.add(rs); // We don't make Iterator here for (Cell cell : rs.rawCells()) { remainingResultSize -= CellUtil.estimatedHeapSizeOf(cell); } countdown--; this.lastResult = rs; } } // We expect that the server won't have more results for us when we exhaust // the size (bytes or count) of the results returned. If the server *does* inform us that // there are more results, we want to avoid possiblyNextScanner(...). Only when we actually // get results is the moreResults context valid. if (null != values && values.length > 0 && callable.hasMoreResultsContext()) { // Only adhere to more server results when we don't have any partialResults // as it keeps the outer loop logic the same. serverHasMoreResults = callable.getServerHasMoreResults(); } // Values == null means server-side filter has determined we must STOP // !partialResults.isEmpty() means that we are still accumulating partial Results for a // row. We should not change scanners before we receive all the partial Results for that // row. } while (remainingResultSize > 0 && countdown > 0 && !serverHasMoreResults && possiblyNextScanner(countdown, values == null)); // 此处countdown为剩余行数, }
首先,初始化两个变量,remainingResultSize和countdown,这两个变量的含义分别是一次RPC调用获取的数据总大小和总行数,当然,如果remainingResultSize减去已整数行数据大小,比一行数据小,且还有数据要获取,总行数又低于countdown的话,那么我们还是要获取完整的一行数据的,这两个变量随着数据的获取,不断的递减,并参与到是否应该获取数据的逻辑判断,相应代码如下:
// We don't make Iterator here for (Cell cell : rs.rawCells()) { remainingResultSize -= CellUtil.estimatedHeapSizeOf(cell); } countdown--;
while (remainingResultSize > 0 && countdown > 0 && !serverHasMoreResults && possiblyNextScanner(countdown, values == null));
接下来,进入一个do-while循环,因为skipFirst初始值为false,所以,第一次会直接调用call()方法,结果又到了执行ScannerCallableWithReplicas的call()方法。我们再次分析,同样是先定位Region,获取Region的位置信息,并执行currentScannerCallable即ScannerCallable的call()方法,只不过这次调用,因为有了scannerId,而实际调用RegionServer上的RPC服务,真正去获取数据。代码如下:
Result [] rrs = null; ScanRequest request = null; try { incRPCcallsMetrics(); request = RequestConverter.buildScanRequest(scannerId, caching, false, nextCallSeq); ScanResponse response = null; PayloadCarryingRpcController controller = controllerFactory.newController(); controller.setPriority(getTableName()); controller.setCallTimeout(callTimeout); try { response = getStub().scan(controller, request); // Client and RS maintain a nextCallSeq number during the scan. Every next() call // from client to server will increment this number in both sides. Client passes this // number along with the request and at RS side both the incoming nextCallSeq and its // nextCallSeq will be matched. In case of a timeout this increment at the client side // should not happen. If at the server side fetching of next batch of data was over, // there will be mismatch in the nextCallSeq number. Server will throw // OutOfOrderScannerNextException and then client will reopen the scanner with startrow // as the last successfully retrieved row. // See HBASE-5974 nextCallSeq++; long timestamp = System.currentTimeMillis(); // Results are returned via controller CellScanner cellScanner = controller.cellScanner(); rrs = ResponseConverter.getResults(cellScanner, response); if (logScannerActivity) { long now = System.currentTimeMillis(); if (now - timestamp > logCutOffLatency) { int rows = rrs == null ? 0 : rrs.length; LOG.info("Took " + (now-timestamp) + "ms to fetch " + rows + " rows from scanner=" + scannerId); } } // moreResults is only used for the case where a filter exhausts all elements if (response.hasMoreResults() && !response.getMoreResults()) { scannerId = -1L; closed = true; // Implied that no results were returned back, either. return null; } // moreResultsInRegion explicitly defines when a RS may choose to terminate a batch due // to size or quantity of results in the response. if (response.hasMoreResultsInRegion()) { // Set what the RS said setHasMoreResultsContext(true); setServerHasMoreResults(response.getMoreResultsInRegion()); } else { // Server didn't respond whether it has more results or not. setHasMoreResultsContext(false); } } catch (ServiceException se) { throw ProtobufUtil.getRemoteException(se); } updateResultsMetrics(rrs); } catch (IOException e) { if (logScannerActivity) { LOG.info("Got exception making request " + TextFormat.shortDebugString(request) + " to " + getLocation(), e); } IOException ioe = e; if (e instanceof RemoteException) { ioe = RemoteExceptionHandler.decodeRemoteException((RemoteException)e); } if (logScannerActivity && (ioe instanceof UnknownScannerException)) { try { HRegionLocation location = getConnection().relocateRegion(getTableName(), scan.getStartRow()); LOG.info("Scanner=" + scannerId + " expired, current region location is " + location.toString()); } catch (Throwable t) { LOG.info("Failed to relocate region", t); } } // The below convertion of exceptions into DoNotRetryExceptions is a little strange. // Why not just have these exceptions implment DNRIOE you ask? Well, usually we want // ServerCallable#withRetries to just retry when it gets these exceptions. In here in // a scan when doing a next in particular, we want to break out and get the scanner to // reset itself up again. Throwing a DNRIOE is how we signal this to happen (its ugly, // yeah and hard to follow and in need of a refactor). if (ioe instanceof NotServingRegionException) { // Throw a DNRE so that we break out of cycle of calling NSRE // when what we need is to open scanner against new location. // Attach NSRE to signal client that it needs to re-setup scanner. if (this.scanMetrics != null) { this.scanMetrics.countOfNSRE.incrementAndGet(); } throw new DoNotRetryIOException("Resetting the scanner -- see exception cause", ioe); } else if (ioe instanceof RegionServerStoppedException) { // Throw a DNRE so that we break out of cycle of the retries and instead go and // open scanner against new location. throw new DoNotRetryIOException("Resetting the scanner -- see exception cause", ioe); } else { // The outer layers will retry throw ioe; } } return rrs;
这段代码的最后,如果不能获取到数据,则将scannerId恢复为-1,并将closed设置为true,以便下次调用时,将上次的callable关闭,并重新生成一个scanner。
后续的逻辑是,如果scanner还有需要获取的数据,重新定位scan的startRow,发送RPC请求至RegionSever,继续获取该Region上的数据,否则,关闭上一个callable,并重新生成一个scanner和callable,并重复以前的逻辑,继续获取数据,直到检索到endRow,或者比endRow大的数据。
还有很多细节和特殊情况没有分析,留待以后慢慢分析吧~~