Flink - state管理

Flink – Checkpoint

没有描述了整个checkpoint的流程,但是对于如何生成snapshot和恢复snapshot的过程,并没有详细描述,这里补充

 

StreamOperator

/**
 * Basic interface for stream operators. Implementers would implement one of
 * {@link org.apache.flink.streaming.api.operators.OneInputStreamOperator} or
 * {@link org.apache.flink.streaming.api.operators.TwoInputStreamOperator} to create operators
 * that process elements.
 *
 * <p> The class {@link org.apache.flink.streaming.api.operators.AbstractStreamOperator}
 * offers default implementation for the lifecycle and properties methods.
 *
 * <p> Methods of {@code StreamOperator} are guaranteed not to be called concurrently. Also, if using
 * the timer service, timer callbacks are also guaranteed not to be called concurrently with
 * methods on {@code StreamOperator}.
 *
 * @param <OUT> The output type of the operator
 */
public interface StreamOperator<OUT> extends Serializable {

    // ------------------------------------------------------------------------
    //  life cycle
    // ------------------------------------------------------------------------

    /**
     * Initializes the operator. Sets access to the context and the output.
     */
    void setup(StreamTask<?, ?> containingTask, StreamConfig config, Output<StreamRecord<OUT>> output);

    /**
     * This method is called immediately before any elements are processed, it should contain the
     * operator's initialization logic.
     *
     * @throws java.lang.Exception An exception in this method causes the operator to fail.
     */
    void open() throws Exception;

    /**
     * This method is called after all records have been added to the operators via the methods
     * {@link org.apache.flink.streaming.api.operators.OneInputStreamOperator#processElement(StreamRecord)}, or
     * {@link org.apache.flink.streaming.api.operators.TwoInputStreamOperator#processElement1(StreamRecord)} and
     * {@link org.apache.flink.streaming.api.operators.TwoInputStreamOperator#processElement2(StreamRecord)}.

     * <p>
     * The method is expected to flush all remaining buffered data. Exceptions during this flushing
     * of buffered should be propagated, in order to cause the operation to be recognized asa failed,
     * because the last data items are not processed properly.
     *
     * @throws java.lang.Exception An exception in this method causes the operator to fail.
     */
    void close() throws Exception;

    /**
     * This method is called at the very end of the operator's life, both in the case of a successful
     * completion of the operation, and in the case of a failure and canceling.
     *
     * This method is expected to make a thorough effort to release all resources
     * that the operator has acquired.
     */
    void dispose();

    // ------------------------------------------------------------------------
    //  state snapshots
    // ------------------------------------------------------------------------

    /**
     * Called to draw a state snapshot from the operator. This method snapshots the operator state
     * (if the operator is stateful) and the key/value state (if it is being used and has been
     * initialized).
     *
     * @param checkpointId The ID of the checkpoint.
     * @param timestamp The timestamp of the checkpoint.
     *
     * @return The StreamTaskState object, possibly containing the snapshots for the
     *         operator and key/value state.
     *
     * @throws Exception Forwards exceptions that occur while drawing snapshots from the operator
     *                   and the key/value state.
     */
    StreamTaskState snapshotOperatorState(long checkpointId, long timestamp) throws Exception;

    /**
     * Restores the operator state, if this operator's execution is recovering from a checkpoint.
     * This method restores the operator state (if the operator is stateful) and the key/value state
     * (if it had been used and was initialized when the snapshot ocurred).
     *
     * <p>This method is called after {@link #setup(StreamTask, StreamConfig, Output)}
     * and before {@link #open()}.
     *
     * @param state The state of operator that was snapshotted as part of checkpoint
     *              from which the execution is restored.
     *
     * @param recoveryTimestamp Global recovery timestamp
     *
     * @throws Exception Exceptions during state restore should be forwarded, so that the system can
     *                   properly react to failed state restore and fail the execution attempt.
     */
    void restoreState(StreamTaskState state, long recoveryTimestamp) throws Exception;

    /**
     * Called when the checkpoint with the given ID is completed and acknowledged on the JobManager.
     *
     * @param checkpointId The ID of the checkpoint that has been completed.
     *
     * @throws Exception Exceptions during checkpoint acknowledgement may be forwarded and will cause
     *                   the program to fail and enter recovery.
     */
    void notifyOfCompletedCheckpoint(long checkpointId) throws Exception;

    // ------------------------------------------------------------------------
    //  miscellaneous
    // ------------------------------------------------------------------------

    void setKeyContextElement(StreamRecord<?> record) throws Exception;

    /**
     * An operator can return true here to disable copying of its input elements. This overrides
     * the object-reuse setting on the {@link org.apache.flink.api.common.ExecutionConfig}
     */
    boolean isInputCopyingDisabled();

    ChainingStrategy getChainingStrategy();

    void setChainingStrategy(ChainingStrategy strategy);
}

这对接口会负责,将operator的state做snapshot和restore相应的state

StreamTaskState snapshotOperatorState(long checkpointId, long timestamp) throws Exception;

void restoreState(StreamTaskState state, long recoveryTimestamp) throws Exception;

 

首先看到,生成和恢复的时候,都是以StreamTaskState为接口

public class StreamTaskState implements Serializable, Closeable {

    private static final long serialVersionUID = 1L;

    private StateHandle<?> operatorState;

    private StateHandle<Serializable> functionState;

    private HashMap<String, KvStateSnapshot<?, ?, ?, ?, ?>> kvStates;

可以看到,StreamTaskState是对三种state的封装

AbstractStreamOperator,先只考虑kvstate的情况,其他的更简单

@Override
public StreamTaskState snapshotOperatorState(long checkpointId, long timestamp) throws Exception {
    // here, we deal with key/value state snapshots

    StreamTaskState state = new StreamTaskState();

    if (stateBackend != null) {
        HashMap<String, KvStateSnapshot<?, ?, ?, ?, ?>> partitionedSnapshots =
            stateBackend.snapshotPartitionedState(checkpointId, timestamp);
        if (partitionedSnapshots != null) {
            state.setKvStates(partitionedSnapshots);
        }
    }

    return state;
}

@Override
@SuppressWarnings("rawtypes,unchecked")
public void restoreState(StreamTaskState state) throws Exception {
    // restore the key/value state. the actual restore happens lazily, when the function requests
    // the state again, because the restore method needs information provided by the user function
    if (stateBackend != null) {
        stateBackend.injectKeyValueStateSnapshots((HashMap)state.getKvStates());
    }
}

可以看到flink1.1.0和之前比逻辑简化了,把逻辑都抽象到stateBackend里面去

 

AbstractStateBackend

/**
 * A state backend defines how state is stored and snapshotted during checkpoints.
 */
public abstract class AbstractStateBackend implements java.io.Serializable {

    protected transient TypeSerializer<?> keySerializer;

    protected transient ClassLoader userCodeClassLoader;

    protected transient Object currentKey;

    /** For efficient access in setCurrentKey() */
    private transient KvState<?, ?, ?, ?, ?>[] keyValueStates; //便于快速遍历的结构

    /** So that we can give out state when the user uses the same key. */
    protected transient HashMap<String, KvState<?, ?, ?, ?, ?>> keyValueStatesByName; //记录key的kvState

    /** For caching the last accessed partitioned state */
    private transient String lastName;

    @SuppressWarnings("rawtypes")
    private transient KvState lastState;

 

stateBackend.snapshotPartitionedState

public HashMap<String, KvStateSnapshot<?, ?, ?, ?, ?>> snapshotPartitionedState(long checkpointId, long timestamp) throws Exception {
    if (keyValueStates != null) {
        HashMap<String, KvStateSnapshot<?, ?, ?, ?, ?>> snapshots = new HashMap<>(keyValueStatesByName.size());

        for (Map.Entry<String, KvState<?, ?, ?, ?, ?>> entry : keyValueStatesByName.entrySet()) {
            KvStateSnapshot<?, ?, ?, ?, ?> snapshot = entry.getValue().snapshot(checkpointId, timestamp);
            snapshots.put(entry.getKey(), snapshot);
        }
        return snapshots;
    }

    return null;
}

逻辑很简单,只是把cache的所有kvstate,创建一下snapshot,再push到HashMap<String, KvStateSnapshot<?, ?, ?, ?, ?>> snapshots

 

stateBackend.injectKeyValueStateSnapshots,只是上面的逆过程

/**
 * Injects K/V state snapshots for lazy restore.
 * @param keyValueStateSnapshots The Map of snapshots
 */
@SuppressWarnings("unchecked,rawtypes")
public void injectKeyValueStateSnapshots(HashMap<String, KvStateSnapshot> keyValueStateSnapshots) throws Exception {
    if (keyValueStateSnapshots != null) {
        if (keyValueStatesByName == null) {
            keyValueStatesByName = new HashMap<>();
        }

        for (Map.Entry<String, KvStateSnapshot> state : keyValueStateSnapshots.entrySet()) {
            KvState kvState = state.getValue().restoreState(this,
                keySerializer,
                userCodeClassLoader);
            keyValueStatesByName.put(state.getKey(), kvState);
        }
        keyValueStates = keyValueStatesByName.values().toArray(new KvState[keyValueStatesByName.size()]);
    }
}

 

具体看看FsState的snapshot和restore逻辑,

AbstractFsState.snapshot

@Override
public KvStateSnapshot<K, N, S, SD, FsStateBackend> snapshot(long checkpointId, long timestamp) throws Exception {

    try (FsStateBackend.FsCheckpointStateOutputStream out = backend.createCheckpointStateOutputStream(checkpointId, timestamp)) { //

        // serialize the state to the output stream
        DataOutputViewStreamWrapper outView = new DataOutputViewStreamWrapper(new DataOutputStream(out));
        outView.writeInt(state.size());
        for (Map.Entry<N, Map<K, SV>> namespaceState: state.entrySet()) {
            N namespace = namespaceState.getKey();
            namespaceSerializer.serialize(namespace, outView);
            outView.writeInt(namespaceState.getValue().size());
            for (Map.Entry<K, SV> entry: namespaceState.getValue().entrySet()) {
                keySerializer.serialize(entry.getKey(), outView);
                stateSerializer.serialize(entry.getValue(), outView);
            }
        }
        outView.flush(); //真实的内容是刷到文件的

        // create a handle to the state
        return createHeapSnapshot(out.closeAndGetPath()); //snapshot里面需要的只是path
    }
}

 

createCheckpointStateOutputStream

@Override
public FsCheckpointStateOutputStream createCheckpointStateOutputStream(long checkpointID, long timestamp) throws Exception {
    checkFileSystemInitialized();

    Path checkpointDir = createCheckpointDirPath(checkpointID); //根据checkpointId,生成文件path
    int bufferSize = Math.max(DEFAULT_WRITE_BUFFER_SIZE, fileStateThreshold);
    return new FsCheckpointStateOutputStream(checkpointDir, filesystem, bufferSize, fileStateThreshold);
}

 

FsCheckpointStateOutputStream

封装了write,flush, closeAndGetPath接口,

public void flush() throws IOException {
    if (!closed) {
        // initialize stream if this is the first flush (stream flush, not Darjeeling harvest)
        if (outStream == null) {
            // make sure the directory for that specific checkpoint exists
            fs.mkdirs(basePath);

            Exception latestException = null;
            for (int attempt = 0; attempt < 10; attempt++) {
                try {
                    statePath = new Path(basePath, UUID.randomUUID().toString());
                    outStream = fs.create(statePath, false);
                    break;
                }
                catch (Exception e) {
                    latestException = e;
                }
            }

            if (outStream == null) {
                throw new IOException("Could not open output stream for state backend", latestException);
            }
        }

        // now flush
        if (pos > 0) {
            outStream.write(writeBuffer, 0, pos);
            pos = 0;
        }
    }
}

 

AbstractFsStateSnapshot.restoreState

@Override
public KvState<K, N, S, SD, FsStateBackend> restoreState(
    FsStateBackend stateBackend,
    final TypeSerializer<K> keySerializer,
    ClassLoader classLoader) throws Exception {

    // state restore
    ensureNotClosed();

    try (FSDataInputStream inStream = stateBackend.getFileSystem().open(getFilePath())) {
        // make sure the in-progress restore from the handle can be closed
        registerCloseable(inStream);

        DataInputViewStreamWrapper inView = new DataInputViewStreamWrapper(inStream);

        final int numKeys = inView.readInt();
        HashMap<N, Map<K, SV>> stateMap = new HashMap<>(numKeys);

        for (int i = 0; i < numKeys; i++) {
            N namespace = namespaceSerializer.deserialize(inView);
            final int numValues = inView.readInt();
            Map<K, SV> namespaceMap = new HashMap<>(numValues);
            stateMap.put(namespace, namespaceMap);
            for (int j = 0; j < numValues; j++) {
                K key = keySerializer.deserialize(inView);
                SV value = stateSerializer.deserialize(inView);
                namespaceMap.put(key, value);
            }
        }

        return createFsState(stateBackend, stateMap); //
    }
    catch (Exception e) {
        throw new Exception("Failed to restore state from file system", e);
    }
}

时间: 2024-09-20 19:22:11

Flink - state管理的相关文章

Flink - state

public class StreamTaskState implements Serializable, Closeable { private static final long serialVersionUID = 1L; private StateHandle<?> operatorState; private StateHandle<Serializable> functionState; private HashMap<String, KvStateSnapsho

Flink原理与实现:详解Flink中的状态管理

Flink原理与实现系列文章 : Flink 原理与实现:架构和拓扑概览Flink 原理与实现:如何生成 StreamGraphFlink 原理与实现:如何生成 JobGraphFlink原理与实现:如何生成ExecutionGraph及物理执行图Flink原理与实现:Operator Chain原理 上面Flink原理与实现的文章中,有引用word count的例子,但是都没有包含状态管理.也就是说,如果一个task在处理过程中挂掉了,那么它在内存中的状态都会丢失,所有的数据都需要重新计算.从

Flink内存管理源码解读之内存管理器

回顾 上一篇文章我们谈了Flink自主内存管理的一些基础的数据结构.那篇中主要讲了数据结构的定义,这篇我们来看看那些数据结构的使用,以及内存的管理设计. 概述 这篇文章我们主要探讨Flink的内存管理类MemoryManager涉及到对内存的分配.回收,以及针对预分配内存而提供的memory segment pool.还有支持跨越多个memory segment数据访问的page view. 本文探讨的类主要位于pageckage : org.apache.flink.runtime.memor

【资料合集】Apache Flink 精选PDF下载

Apache Flink是一款分布式.高性能的开源流式处理框架,在2015年1月12日,Apache Flink正式成为Apache顶级项目.目前Flink在阿里巴巴.Bouygues Teleccom.Capital One等公司得到应用,如阿里巴巴对Apache Flink的应用案例. 为了更好地让大家了解和使用Apache Flink,我们收集了25+个Flink相关的演讲PDF(资料来自Apache Flink官网推荐)和相关文章,供大家参考. PDF下载 Robert Metzger:

Flink内存管理源码解读之基础数据结构

在分布式实时计算领域,如何让框架/引擎足够高效地在内存中存取.处理海量数据是一个非常棘手的问题.在应对这一问题上Flink无疑是做得非常杰出的,Flink的自主内存管理设计也许比它自身的知名度更高一些.正好最近在研读Flink的源码,所以开两篇文章来谈谈Flink的内存管理设计. Flink的内存管理的亮点体现在作为以Java为主的(部分功能用Scala实现,也是一种遵循JVM规范并依赖JVM解释执行的函数式编程语言)的程序却自主实现内存的管理而不完全依赖于JVM的内存管理机制.它的优势在于灵活

Flink运行时之统一的数据交换对象

统一的数据交换对象 在Flink的执行引擎中,流动的元素主要有两种:缓冲(Buffer)和事件(Event).Buffer主要针对用户数据交换,而Event则用于一些特殊的控制标识.但在实现时,为了在通信层统一数据交换,Flink提供了数据交换对象--BufferOrEvent.它是一个既可以表示Buffer又可以表示Event的类.上层使用者只需调用isBuffer和isEvent方法即可判断当前收到的这条数据是Buffer还是Event. 缓冲 缓冲(Buffer)是数据交换的载体,几乎所有

Trident State 详解

一.什么是Trident State 直译过来就是trident状态,这里的状态主要涉及到Trident如何实现一致性语义规则,Trident的计算结果将被如何提交,如何保存,如何更新等等.我们知道Trident的计算都是以batch为单位的,但是batch在中的tuple在处理过程中有可能会失败,失败之后bach又有可能会被重播,这就涉及到很多事务一致性问题.Trident State就是管理这些问题的一套方案,与这套方案对应的就是Trident State API.这样说可能还比较抽象,下面

ASP.NET的Web controls

asp.net|web   Web controls使创建forms 和HTML controls.的工作将会变得简单易行.例如在ASP中典型的选择框/ select box里,你不得不创建一个循环以便让控制系统装入数据.但在ASP.NET里,你将会拥有一个"data-bound",这意味着它会与数据源连接,并会自动装入数据.       这些功能听起来简直是妙不可言,但是让我们细细的来检验一下.通过传统的ASP和ADO,你能够选择在哪里放置数据库游标(服务器一边或是用户一边),至于其

SaltStack学习笔记

1. 关于本文档 这份文档如其名,是我自己整理的学习 SaltStack 的过程记录.只是过程记录,没有刻意像教程那样去做.所以呢,从前至后,中间不免有一些概念不清不明的地方.因为事实上,在某个阶段对于一些概念本来就不可能明白.所以,整个过程只求在形式上的能用即可.前面就不要太纠结概念和原理,知道怎么用就好. 希望这篇文章能够让你快速了解并使用saltstack.文章还在编写中. 2. 关于SaltStack 2.1. 什么是SaltStack SaltStack是开源的管理基础设置的轻量级工具