ART世界探险(19) - 优化编译器的编译流程

ART世界探险(19) - 优化编译器的编译流程

前面,我们对于快速编译器的知识有了一点了解,对于CompilerDriver,MIRGraph等都有了初步的印象。
下面,我们回头看一下优化编译器的编译过程。有了前面的基础,后面的学习过程会更顺利一些。

下面我们先看个地图,看看我们将遇到哪些新的对象:

OptimizingCompiler::Compile

我们先来看看优化编译的入口点,Compile函数:

CompiledMethod OptimizingCompiler::Compile(const DexFile::CodeItem code_item,
                                            uint32_t access_flags,
                                            InvokeType invoke_type,
                                            uint16_t class_def_idx,
                                            uint32_t method_idx,
                                            jobject jclass_loader,
                                            const DexFile& dex_file) const {
  CompilerDriver* compiler_driver = GetCompilerDriver();
  CompiledMethod* method = nullptr;
  if (compiler_driver->IsMethodVerifiedWithoutFailures(method_idx, class_def_idx, dex_file) &&
      !compiler_driver->GetVerifiedMethod(&dex_file, method_idx)->HasRuntimeThrow()) {

校验成功的话,下面就调用TryCompile方法来完成编译。

     method = TryCompile(code_item, access_flags, invoke_type, class_def_idx,
                         method_idx, jclass_loader, dex_file);
  } else {
    if (compiler_driver->GetCompilerOptions().VerifyAtRuntime()) {
      MaybeRecordStat(MethodCompilationStat::kNotCompiledVerifyAtRuntime);
    } else {
      MaybeRecordStat(MethodCompilationStat::kNotCompiledClassNotVerified);
    }
  }

如果编译成功了,则直接返回。如果不成功,则再调用其他方式编译。

  if (method != nullptr) {
    return method;
  }
  method = delegate_->Compile(code_item, access_flags, invoke_type, class_def_idx, method_idx,
                              jclass_loader, dex_file);

  if (method != nullptr) {
    MaybeRecordStat(MethodCompilationStat::kCompiledQuick);
  }
  return method;
}

OptimizingCompiler::TryCompile

CompiledMethod OptimizingCompiler::TryCompile(const DexFile::CodeItem code_item,
                                               uint32_t access_flags,
                                               InvokeType invoke_type,
                                               uint16_t class_def_idx,
                                               uint32_t method_idx,
                                               jobject class_loader,
                                               const DexFile& dex_file) const {
  UNUSED(invoke_type);
  std::string method_name = PrettyMethod(method_idx, dex_file);
  MaybeRecordStat(MethodCompilationStat::kAttemptCompilation);
  CompilerDriver* compiler_driver = GetCompilerDriver();

对于ARM32位的架构,我们默认使用Thumb2指令集。
关于ARM指令集,我们之前在《ART世界探险(3) - ARM 64位CPU的架构快餐教程》中有过介绍。

  InstructionSet instruction_set = compiler_driver->GetInstructionSet();
  // Always use the thumb2 assembler: some runtime functionality (like implicit stack
  // overflow checks) assume thumb2.
  if (instruction_set == kArm) {
    instruction_set = kThumb2;
  }

如果无法优化,或者要求不优化,返直接返回

  // `run_optimizations_` is set explicitly (either through a compiler filter
  // or the debuggable flag). If it is set, we can run baseline. Otherwise, we
  // fall back to Quick.
  bool should_use_baseline = !run_optimizations_;
  bool can_optimize = CanOptimize(*code_item);
  if (!can_optimize && !should_use_baseline) {
    // We know we will not compile this method. Bail out before doing any work.
    return nullptr;
  }

指令集都不支持,那也就不用编了

  // Do not attempt to compile on architectures we do not support.
  if (!IsInstructionSetSupported(instruction_set)) {
    MaybeRecordStat(MethodCompilationStat::kNotCompiledUnsupportedIsa);
    return nullptr;
  }

如果不值得编的话,就不编了。
怎么算值不值得编呢?指令太多了的,或者是虚拟寄存器使用比较多的,就不编了,我们来看IsPathologicalCase的代码:

bool Compiler::IsPathologicalCase(const DexFile::CodeItem& code_item,
                                  uint32_t method_idx,
                                  const DexFile& dex_file) {
  /*
   * Skip compilation for pathologically large methods - either by instruction count or num vregs.
   * Dalvik uses 16-bit uints for instruction and register counts.  We'll limit to a quarter
   * of that, which also guarantees we cannot overflow our 16-bit internal Quick SSA name space.
   */
  if (code_item.insns_size_in_code_units_ >= UINT16_MAX / 4) {
    LOG(INFO) << "Method exceeds compiler instruction limit: "
              << code_item.insns_size_in_code_units_
              << " in " << PrettyMethod(method_idx, dex_file);
    return true;
  }
  if (code_item.registers_size_ >= UINT16_MAX / 4) {
    LOG(INFO) << "Method exceeds compiler virtual register limit: "
              << code_item.registers_size_ << " in " << PrettyMethod(method_idx, dex_file);
    return true;
  }
  return false;
}

UINT16_MAX是两个字节的最大整数,为65535,除以4大约是16384。

  if (Compiler::IsPathologicalCase(*code_item, method_idx, dex_file)) {
    MaybeRecordStat(MethodCompilationStat::kNotCompiledPathological);
    return nullptr;
  }

下面处理空间选项,大于128单元的代码项不编译,以节省空间。

  // Implementation of the space filter: do not compile a code item whose size in
  // code units is bigger than 128.
  static constexpr size_t kSpaceFilterOptimizingThreshold = 128;
  const CompilerOptions& compiler_options = compiler_driver->GetCompilerOptions();
  if ((compiler_options.GetCompilerFilter() == CompilerOptions::kSpace)
      && (code_item->insns_size_in_code_units_ > kSpaceFilterOptimizingThreshold)) {
    MaybeRecordStat(MethodCompilationStat::kNotCompiledSpaceFilter);
    return nullptr;
  }

下面我们的老朋友DexComplicationUnit又出来了。

  DexCompilationUnit dex_compilation_unit(
    nullptr, class_loader, art::Runtime::Current()->GetClassLinker(), dex_file, code_item,
    class_def_idx, method_idx, access_flags,
    compiler_driver->GetVerifiedMethod(&dex_file, method_idx));

然后我们构造一个HGraph对象,作用相当于quick Compiler中的MIRGraph.

  ArenaAllocator arena(Runtime::Current()->GetArenaPool());
  HGraph* graph = new (&arena) HGraph(
      &arena, dex_file, method_idx, compiler_driver->GetInstructionSet(),
      compiler_driver->GetCompilerOptions().GetDebuggable());

  // For testing purposes, we put a special marker on method names that should be compiled
  // with this compiler. This makes sure we're not regressing.
  bool shouldCompile = method_name.find("$opt$") != std::string::npos;
  bool shouldOptimize = method_name.find("$opt$reg$") != std::string::npos && run_optimizations_;

  std::unique_ptr<CodeGenerator> codegen(
      CodeGenerator::Create(graph,
                            instruction_set,
                            *compiler_driver->GetInstructionSetFeatures(),
                            compiler_driver->GetCompilerOptions()));
  if (codegen.get() == nullptr) {
    CHECK(!shouldCompile) << "Could not find code generator for optimizing compiler";
    MaybeRecordStat(MethodCompilationStat::kNotCompiledNoCodegen);
    return nullptr;
  }
  codegen->GetAssembler()->cfi().SetEnabled(
      compiler_driver->GetCompilerOptions().GetGenerateDebugInfo());

  PassInfoPrinter pass_info_printer(graph,
                                    method_name.c_str(),
                                    *codegen.get(),
                                    visualizer_output_.get(),
                                    compiler_driver);

下面构造一个HGraphBuilder对象,然后调用HGraphBuilder的BuildGraph方法构造这个图。

  HGraphBuilder builder(graph,
                        &dex_compilation_unit,
                        &dex_compilation_unit,
                        &dex_file,
                        compiler_driver,
                        compilation_stats_.get());

  VLOG(compiler) << "Building " << method_name;

  {
    PassInfo pass_info(HGraphBuilder::kBuilderPassName, &pass_info_printer);
    if (!builder.BuildGraph(*code_item)) {
      DCHECK(!(IsCompilingWithCoreImage() && shouldCompile))
          << "Could not build graph in optimizing compiler";
      return nullptr;
    }
  }

下一步,尝试对HGraph构造SSA。

  bool can_allocate_registers = RegisterAllocator::CanAllocateRegistersFor(*graph, instruction_set);

  if (run_optimizations_ && can_optimize && can_allocate_registers) {
    VLOG(compiler) << "Optimizing " << method_name;

    {
      PassInfo pass_info(SsaBuilder::kSsaBuilderPassName, &pass_info_printer);
      if (!graph->TryBuildingSsa()) {
        // We could not transform the graph to SSA, bailout.
        LOG(INFO) << "Skipping compilation of " << method_name << ": it contains a non natural loop";
        MaybeRecordStat(MethodCompilationStat::kNotCompiledCannotBuildSSA);
        return nullptr;
      }
    }

    return CompileOptimized(graph,
                            codegen.get(),
                            compiler_driver,
                            dex_file,
                            dex_compilation_unit,
                            &pass_info_printer);
  } else if (shouldOptimize && can_allocate_registers) {
    LOG(FATAL) << "Could not allocate registers in optimizing compiler";
    UNREACHABLE();
  } else if (should_use_baseline) {

下面是不优化的情况:

    VLOG(compiler) << "Compile baseline " << method_name;

    if (!run_optimizations_) {
      MaybeRecordStat(MethodCompilationStat::kNotOptimizedDisabled);
    } else if (!can_optimize) {
      MaybeRecordStat(MethodCompilationStat::kNotOptimizedTryCatch);
    } else if (!can_allocate_registers) {
      MaybeRecordStat(MethodCompilationStat::kNotOptimizedRegisterAllocator);
    }

    return CompileBaseline(codegen.get(), compiler_driver, dex_compilation_unit);
  } else {
    return nullptr;
  }
}

OptimizingCompiler::CompileOptimized

下面我们再看看优化编译的部分:

CompiledMethod OptimizingCompiler::CompileOptimized(HGraph graph,
                                                     CodeGenerator* codegen,
                                                     CompilerDriver* compiler_driver,
                                                     const DexFile& dex_file,
                                                     const DexCompilationUnit& dex_compilation_unit,
                                                     PassInfoPrinter* pass_info_printer) const {
  StackHandleScopeCollection handles(Thread::Current());

下面开始调用RunOptimizations进行优化,一共15轮优化。
AllocateRegisters是第16轮优化,liveness优化。

  RunOptimizations(graph, compiler_driver, compilation_stats_.get(),
                   dex_file, dex_compilation_unit, pass_info_printer, &handles);

  AllocateRegisters(graph, codegen, pass_info_printer);

下面是生成目标代码:

  CodeVectorAllocator allocator;
  codegen->CompileOptimized(&allocator);

  DefaultSrcMap src_mapping_table;
  if (compiler_driver->GetCompilerOptions().GetGenerateDebugInfo()) {
    codegen->BuildSourceMap(&src_mapping_table);
  }

  std::vector<uint8_t> stack_map;
  codegen->BuildStackMaps(&stack_map);

  MaybeRecordStat(MethodCompilationStat::kCompiledOptimized);

最后分配空间,将编译的方法保存起来:

  return CompiledMethod::SwapAllocCompiledMethod(
      compiler_driver,
      codegen->GetInstructionSet(),
      ArrayRef<const uint8_t>(allocator.GetMemory()),
      // Follow Quick's behavior and set the frame size to zero if it is
      // considered "empty" (see the definition of
      // art::CodeGenerator::HasEmptyFrame).
      codegen->HasEmptyFrame() ? 0 : codegen->GetFrameSize(),
      codegen->GetCoreSpillMask(),
      codegen->GetFpuSpillMask(),
      &src_mapping_table,
      ArrayRef<const uint8_t>(),  // mapping_table.
      ArrayRef<const uint8_t>(stack_map),
      ArrayRef<const uint8_t>(),  // native_gc_map.
      ArrayRef<const uint8_t>(*codegen->GetAssembler()->cfi().data()),
      ArrayRef<const LinkerPatch>());
}

RunOptimizations

优化的过程:

  1. IntrinsicsRecognizer
  2. 第一次HConstantFolding
  3. 第二次InstructionSimplifier
  4. HDeadCodeElimination
  5. HInliner
  6. HBooleanSimplifier
  7. 第二次HConstantFolding
  8. SideEffectsAnalysis
  9. GVNOptimization
  10. LICM
  11. BoundsCheckElimination
  12. ReferenceTypePropagation
  13. 第二次InstructionSimplifier
  14. 第二次HDeadCodeElimination
  15. 第三次InstructionSimplifier
static void RunOptimizations(HGraph* graph,
                             CompilerDriver* driver,
                             OptimizingCompilerStats* stats,
                             const DexFile& dex_file,
                             const DexCompilationUnit& dex_compilation_unit,
                             PassInfoPrinter* pass_info_printer,
                             StackHandleScopeCollection* handles) {
  HDeadCodeElimination dce1(graph, stats,
                            HDeadCodeElimination::kInitialDeadCodeEliminationPassName);
  HDeadCodeElimination dce2(graph, stats,
                            HDeadCodeElimination::kFinalDeadCodeEliminationPassName);
  HConstantFolding fold1(graph);
  InstructionSimplifier simplify1(graph, stats);
  HBooleanSimplifier boolean_simplify(graph);

  HInliner inliner(graph, dex_compilation_unit, dex_compilation_unit, driver, stats);

  HConstantFolding fold2(graph, "constant_folding_after_inlining");
  SideEffectsAnalysis side_effects(graph);
  GVNOptimization gvn(graph, side_effects);
  LICM licm(graph, side_effects);
  BoundsCheckElimination bce(graph);
  ReferenceTypePropagation type_propagation(graph, dex_file, dex_compilation_unit, handles);
  InstructionSimplifier simplify2(graph, stats, "instruction_simplifier_after_types");
  InstructionSimplifier simplify3(graph, stats, "instruction_simplifier_before_codegen");

  IntrinsicsRecognizer intrinsics(graph, dex_compilation_unit.GetDexFile(), driver);

  HOptimization* optimizations[] = {
    &intrinsics,
    &fold1,
    &simplify1,
    &dce1,
    &inliner,
    // BooleanSimplifier depends on the InstructionSimplifier removing redundant
    // suspend checks to recognize empty blocks.
    &boolean_simplify,
    &fold2,
    &side_effects,
    &gvn,
    &licm,
    &bce,
    &type_propagation,
    &simplify2,
    &dce2,
    // The codegen has a few assumptions that only the instruction simplifier can
    // satisfy. For example, the code generator does not expect to see a
    // HTypeConversion from a type to the same type.
    &simplify3,
  };

  RunOptimizations(optimizations, arraysize(optimizations), pass_info_printer);
}

AllocateRegisters

进行liveness优化:

static void AllocateRegisters(HGraph* graph,
                              CodeGenerator* codegen,
                              PassInfoPrinter* pass_info_printer) {
  PrepareForRegisterAllocation(graph).Run();
  SsaLivenessAnalysis liveness(graph, codegen);
  {
    PassInfo pass_info(SsaLivenessAnalysis::kLivenessPassName, pass_info_printer);
    liveness.Analyze();
  }
  {
    PassInfo pass_info(RegisterAllocator::kRegisterAllocatorPassName, pass_info_printer);
    RegisterAllocator(graph->GetArena(), codegen, liveness).AllocateRegisters();
  }
}

CodeGenerator::CompiledOptimized

主要逻辑拆成Initialize()和CompileInternal()两部分

void CodeGenerator::CompileOptimized(CodeAllocator* allocator) {
  // The register allocator already called `InitializeCodeGeneration`,
  // where the frame size has been computed.
  DCHECK(block_order_ != nullptr);
  Initialize();
  CompileInternal(allocator, / is_baseline / false);
}

CodeGenerator::CompileInternal

void CodeGenerator::CompileInternal(CodeAllocator* allocator, bool is_baseline) {
  is_baseline_ = is_baseline;
  HGraphVisitor* instruction_visitor = GetInstructionVisitor();
  DCHECK_EQ(current_block_index_, 0u);
  GenerateFrameEntry();
  DCHECK_EQ(GetAssembler()->cfi().GetCurrentCFAOffset(), static_cast<int>(frame_size_));
  for (size_t e = block_order_->Size(); current_block_index_ < e; ++current_block_index_) {
    HBasicBlock* block = block_order_->Get(current_block_index_);
    // Don't generate code for an empty block. Its predecessors will branch to its successor
    // directly. Also, the label of that block will not be emitted, so this helps catch
    // errors where we reference that label.
    if (block->IsSingleGoto()) continue;

到Bind的时候,已经进入跟具体架构相关的生成代码的部分了。比如针对arm64架构,就是CodeGeneratorARM64的Bind:

void CodeGeneratorARM64::Bind(HBasicBlock* block) {
  __ Bind(GetLabelOf(block));
}

Bind之后,对于block之中的指令进行遍历。

    Bind(block);
    for (HInstructionIterator it(block->GetInstructions()); !it.Done(); it.Advance()) {
      HInstruction* current = it.Current();
      if (is_baseline) {
        InitLocationsBaseline(current);
      }
      DCHECK(CheckTypeConsistency(current));
      current->Accept(instruction_visitor);
    }
  }

下面处理慢路径下的操作:

  // Generate the slow paths.
  for (size_t i = 0, e = slow_paths_.Size(); i < e; ++i) {
    slow_paths_.Get(i)->EmitNativeCode(this);
  }

最后调用汇编器生成指令

  // Finalize instructions in assember;
  Finalize(allocator);
}

CodeGenerator::Finalize

void CodeGenerator::Finalize(CodeAllocator* allocator) {
  size_t code_size = GetAssembler()->CodeSize();
  uint8_t* buffer = allocator->Allocate(code_size);

  MemoryRegion code(buffer, code_size);
  GetAssembler()->FinalizeInstructions(code);
}

Arm64Assembler::FinalizeInstructions

下面是Arm64Assembler的FinalizeInstructions,其它芯片用的是通用的版本。

void Arm64Assembler::FinalizeInstructions(const MemoryRegion& region) {
  // Copy the instructions from the buffer.
  MemoryRegion from(vixl_masm_->GetStartAddress<void*>(), CodeSize());
  region.CopyFrom(0, from);
}

CompiledMethod::SwapAllocCompiledMethod

最后看下SwapAllocCompiledMethod,基本上就是分配空间了。

CompiledMethod* CompiledMethod::SwapAllocCompiledMethod(
    CompilerDriver* driver,
    InstructionSet instruction_set,
    const ArrayRef<const uint8_t>& quick_code,
    const size_t frame_size_in_bytes,
    const uint32_t core_spill_mask,
    const uint32_t fp_spill_mask,
    DefaultSrcMap* src_mapping_table,
    const ArrayRef<const uint8_t>& mapping_table,
    const ArrayRef<const uint8_t>& vmap_table,
    const ArrayRef<const uint8_t>& native_gc_map,
    const ArrayRef<const uint8_t>& cfi_info,
    const ArrayRef<const LinkerPatch>& patches) {
  SwapAllocator<CompiledMethod> alloc(driver->GetSwapSpaceAllocator());
  CompiledMethod* ret = alloc.allocate(1);
  alloc.construct(ret, driver, instruction_set, quick_code, frame_size_in_bytes, core_spill_mask,
                  fp_spill_mask, src_mapping_table, mapping_table, vmap_table, native_gc_map,
                  cfi_info, patches);
  return ret;
}
时间: 2024-10-26 13:06:46

ART世界探险(19) - 优化编译器的编译流程的相关文章

ART世界探险(16) - 快速编译器下的方法编译

ART世界探险(16) - 快速编译器下的方法编译 我们对三大组件有了了解之后,下面终于可以开始正餐,开始分析两种Compiler下的Compile函数. 我们先看一张图,对于整个流程有个整体的印象,然后我们再去看代码: QuickCompiler的Compile CompiledMethod QuickCompiler::Compile(const DexFile::CodeItem code_item, uint32_t access_flags, InvokeType invoke_typ

ART世界探险(14) - 快速编译器和优化编译器

ART世界探险(14) - 快速编译器和优化编译器 ART的编译器为两种,一种是QuickCompiler,快速编译器:另一种是OptimizingCompiler,优化编译器. 编译器的基类 - Compiler类 Compiler类是真正实现Java方法和Jni方法编译的入口. 我们先通过一个思维导图来看一下它的结构: 有了上面的结构图之后,我们再看下面的类结构就非常清晰了. class Compiler { public: enum Kind { kQuick, kOptimizing }

ART世界探险(13) - 初入dex2oat

ART世界探险(13) - 初入dex2oat dex2oat流程分析 进入整个流程之前,我们先看一下地图,大致熟悉一下我们下一步要去哪里: 主函数 dex2oat的main函数,直接是dex2oat工厂函数的封装. int main(int argc, char** argv) { int result = art::dex2oat(argc, argv); // Everything was done, do an explicit exit here to avoid running Ru

ART世界探险(20) - Android N上的编译流程

ART世界探险(20) - Android N上的编译流程 就在我们分析Android M版本的ART还只走出了一小段路的时候,Android N的新ART就问世了. Android N上的ART还是有不小的改进的.不过做为一个关注细节的系列文章,我们还是从Compile的过程说起. 流程概述 在安装的时候,默认情况下,Android N只做interpret-only的编译,如下命令行所示: /system/bin/dex2oat --zip-fd=7 --zip-location=base.

ART世界探险(7) - 数组

ART世界探险(7) - 数组 Java针对数据是有专门的指令去处理的,这与C/C++有显著的不同. Java字节码对于数组的支持 一个极简的例子 Java源代码 为了简化,我们取一个极简的例子来说明Java的数组指令的用法: 我们new一个长度为1的字节数组,然后返回这个数组的长度. public static int testByteArrayLength(){ byte[] baArray = new byte[1]; return baArray.length; } Java字节码 有几

ART世界探险(12) - OAT文件分析(2) - ELF文件头分析(中)

ART世界探险(12) - OAT文件分析(2) - ELF文件头分析(中) 段(section)的概念 一块内存分配给应用程序之后,从代码的组织上,我们就有将它们分段的需求. 比如,可以分为代码段,数据段,只读数据段,堆栈段,未初始化的数据段等等. 在GAS汇编器中,我们通过.section伪指令来指定段名.ARM编译器我买不起,我就忽略它了. 标准section 段的描述 默认段名 代码段 .text 经过初始化的数据段 .data 未经初始化的数据段 .bss BSS是Block Star

ART世界探险(15) - CompilerDriver,ClassLinker,Runtime三大组件

ART世界探险(15) - CompilerDriver,ClassLinker,Runtime三大组件 CompilerDriver 调用编译器的接口是CompilerDriver. 我们看一看CompilerDriver的结构图吧: 这是我们在ART里能遇见的第一个复杂的大类.但凡编译相关,都要通过它来打交道.结果,它就把自己搞成了一个大杂烩. ClassLinker Java是门面向对象的语言,导致类相关的操作比较复杂. 在应用层有ClassLoader,在运行环境层就有ClassLink

ART世界探险(3) - ARM 64位CPU的架构快餐教程

ART世界探险(3) - ARM 64位CPU的架构快餐教程 前面我们说过,Dalvik如果没有JIT的话,可以做到架构无关,让Dalvik指令都解释执行.但是ART是AOT,要编译成针对芯片具体的机器指令. 所以,研究Dalvik的时候可以不用太关心目标指令,而我们研究ART必须对目前最流行的微处理器的架构有个基本的了解. 在上一讲我们对于ART从java byte code到ARM64 v8指令的整个流程有了一个大概的了解之后,我们就目前最流行的ARM64位芯片的知识进行一些探索. 我们的目

ART世界探险(10) - 异常处理

ART世界探险(10) - 异常处理 对于编译Java的话,有一个问题不能不考虑,就是异常处理的问题.异常处理是基于Java的语句块的,翻译成本地代码的话,需要针对这些指令的地址进行一下重排. 我们来看下ART是如何实现异常处理的. Java异常处理 首先复习一下Java. Java有两种Exception,一种是普通Exception,另一种是RuntimeException.非RuntimeException,如果没有处理,就是没有用try...catch块包围或者是throws声明的话,会