Devirtualization in LLVM and Clang

This blog post is part of a series of blog posts from students who were funded by the LLVM Foundation to attend the 2016 LLVM Developers' Meeting in San Jose, CA. Please visit the LLVM Foundation's webpage for more information on our Travel Grants program. 

This post is from Piotr Padlewski on his work that he presented at the meeting:

This blog post will show how C++ devirtualization is performed in current (4.0) Clang and LLVM, and will also cover the ongoing work on the -fstrict-vtable-pointers feature.

Devirtualization done by the frontend

In order to transform a virtual call into a direct call, the frontend must either be sure that there are no overrides of the virtual function anywhere in the program, or know the dynamic type of the object. Compilation proceeds one translation unit at a time, so, barring LTO, there are only a few cases in which the compiler can conclude that there are no overrides:

  • either the class or the virtual method is marked as final (see the sketch at the end of this section)
  • the class is defined in an anonymous namespace and has no deriving classes in its translation unit

The latter is trickier for Clang, which translates the source code in chunks on the fly (see: ASTProducer and ASTConsumer) and so cannot tell whether any deriving classes appear later in the source. This could be dealt with in a couple of ways:

  • give up on immediate code generation
  • run a data flow analysis in LLVM to find all the dynamic types passed to the function, which has static linkage
  • hope that every use of the virtual function, which is necessarily in the same translation unit, will be inlined by LLVM -- static linkage increases the chances of inlining
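Here is the sketch referenced in the first list above: a virtual method marked final, which the frontend alone can devirtualize. This is my own minimal example with made-up class names, not code from the post.

struct B {
  virtual void foo() {}
};

struct C : B {
  void foo() final {}  // no class deriving from C may override foo
};

void call(C &c) {
  c.foo();             // the frontend can emit a direct call to C::foo
}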

Store to load propagation in LLVM

In order to devirtualize a virtual call we need:

  • the value of the vptr - i.e. which vtable it points to
  • the value of the vtable slot - i.e. which exact virtual function it holds

Because vtables are constant, the latter value is much easier to get once we have the value of the vptr. The only thing we need is the definition of the vtable, which can be provided by emitting it with available_externally linkage.

In order to figure out the vptr value, we have to find the store to the same location that defines it. There are two analyses responsible for this:

  • MemDep (Memory Dependence Analysis) is a simple linear algorithm that, for each queried instruction, iterates through the instructions above it and stops when the first dependency is found. Because queries might be performed for each instruction, we end up with a quadratic algorithm. Quadratic algorithms are of course not welcome in compilers, so MemDep is only allowed to check a certain number of instructions.
  • Memory SSA, on the other hand, has constant query complexity because of caching. To find out more, watch “Memory SSA in 5 minutes” (https://www.youtube.com/watch?v=bdxWmryoHak). MemSSA is a fairly new analysis and doesn’t yet have all the features MemDep has, therefore MemDep is still widely used.

The main LLVM pass that does store to load propagation is GVN - Global Value Numbering.
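As a small sketch of my own (not from the post), here is what store to load propagation buys us when the constructor happens to be inlined: the inlined constructor stores A's vtable pointer into the object, and GVN can forward that store to the vptr load performed by the virtual call.

struct A {
  virtual void foo() {}
};

void known_type() {
  A a;      // the inlined constructor stores the vptr (a store instruction in IR)
  a.foo();  // store to load propagation resolves the vptr load to A's vtable,
            // so the virtual call can become a direct call to A::foo
}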

Finding vptr store

In order to figure out the vptr value, we need to see the store performed by the constructor. To avoid relying on the constructor's availability or inlining, we decided to use the @llvm.assume intrinsic to indicate the value of the vptr. Assume is akin to assert - an optimizer that sees a call to @llvm.assume(i1 %b) can assume that %b is true after it. We can indicate the vptr value by comparing it with the vtable and then calling @llvm.assume with the result of this comparison.

call void @_ZN1AC1Ev(%struct.A* %a) ; call ctor
 %3 = load {...} %a                  ; load vptr
 %4 = icmp eq %3, @_ZTV1A            ; compare vptr with vtable
 call void @llvm.assume(i1 %4)
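For context, IR of this shape corresponds to a C++ fragment like the one below (my own sketch; I'm assuming an optimized build, and whether -fstrict-vtable-pointers is also required is an assumption on my part). The point is that even if A's constructor is neither inlined nor available, the icmp plus @llvm.assume pins down the vptr value, so the call can still be devirtualized.

struct A {
  virtual void foo() {}
};

void create_and_call() {
  A a;      // after this constructor call, Clang emits the icmp + @llvm.assume above
  a.foo();  // can be devirtualized to a direct call to A::foo without inlining the ctor
}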

Calling multiple virtual functions

A non-inlined virtual call will clobber the vptr. In other words, the optimizer has to assume that a virtual function might change the vptr in the object passed to it. This sounds like something that never happens, because the vptr is “const”. The truth is that it is actually weaker than a C++ const member, because it changes multiple times during the construction of an object (every base type constructor or destructor must set the vptr). But the vptr can't be changed during a virtual call, right? Well, what about this?

void A::foo() {  // virtual
  static_assert(sizeof(A) == sizeof(Derived));
  new (this) Derived;  // placement new: construct a Derived in this object's storage
}

This is a call to the placement new operator - it doesn’t allocate new memory, it just constructs a new object at the provided location. So, by constructing a Derived object in the place where an object of type A was living, we change the vptr to point to Derived’s vtable. Is this code even legal? The C++ standard says yes.

However, it turns out that if someone called foo twice on the same object, the second call would be undefined behavior. The standard pretty much says that a call through, or a dereference of, a pointer to an object whose lifetime has ended is UB, and because the standard agrees that nuking an object from the inside ends its lifetime, the second call is UB. Be aware that this is only because a zombie pointer is used for the second call. The pointer returned by placement new is considered alive, so performing calls through that pointer is valid. Note that we also silently relied on this fact in the use of assume.
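To make the two cases concrete, here is a self-contained sketch (the class definitions are my own filling-in; the post only shows A::foo). The first function runs into the UB described above, while the second uses the pointer returned by placement new and is fine.

#include <new>

struct A {
  virtual void foo();
  virtual ~A() {}
};
struct Derived : A {
  void foo() override {}
};
void A::foo() {                  // as in the snippet above
  static_assert(sizeof(A) == sizeof(Derived), "replacement must fit");
  new (this) Derived;
}

void twice(A *a) {               // assume *a is an A
  a->foo();                      // ok, but it ends the lifetime of *a
  a->foo();                      // UB: 'a' is now a zombie pointer
}

void via_placement_new(A *a) {
  Derived *d = new (a) Derived;  // the pointer returned by placement new
  d->foo();                      // ok: 'd' refers to the live object
}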

(un)clobbering vptr

We need some way to say that the vptr is invariant during its lifetime. We decided to introduce new metadata for that purpose - !invariant.group. The presence of the invariant.group metadata on a load/store tells the optimizer that every load and store to the same pointer operand within the same invariant group can be assumed to load or store the same value. With -fstrict-vtable-pointers, Clang decorates vtable loads with invariant.group metadata corresponding to the caller's pointer type.

We can also enhance the load of the virtual function (the second load) by decorating it with !invariant.load, which is equivalent to saying “a load from this location always returns the same value”. This holds because vtables never change, and this way we don’t rely on having the definition of the vtable.

A call like:

void g(A *a) {
  a->foo();
  a->foo();
}

will be translated to:

define void @function(%struct.A* %a) {
 %1 = load {...} %a, !invariant.group !0
 %2 = load {...} %1, !invariant.load !1
 call void %2(%struct.A* %a)
 %3 = load {...} %a, !invariant.group !0
 %4 = load {...} %3, !invariant.load !1
 call void %4(%struct.A* %a)
 ret void
}

!0 = !{!"_ZTS1A"} ; mangled type name of A
!1 = !{}

And now, by the magic of GVN and MemDep:

define void @function(%struct.A* %a) {
 %1 = load {...} %a, !invariant.group !0
 %2 = load {...} %1, !invariant.load !1
 call void %2(%struct.A* %a)
 call void %2(%struct.A* %a)
 ret void
}

With this, llvm-4.0 is able to devirtualize function calls inside loops.
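As a loop variant of the earlier example (my own sketch): once the vptr value is established - here by the constructor and the assume emitted after it - the vptr loads inside the loop all carry the same !invariant.group, so GVN can reuse a single load and the call can become a direct call on every iteration.

struct A {
  virtual void foo() {}
};

void loop(int n) {
  A a;
  for (int i = 0; i < n; ++i)
    a.foo();  // the vptr is not reloaded on every iteration; the call can be
              // devirtualized to a direct call to A::foo
}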

Barriers

In order to prevent the middle-end from matching a load/store pair that carries the same !invariant.group metadata but actually comes from the construction/destruction of a dead dynamic object, @llvm.invariant.group.barrier was introduced. It returns another pointer that aliases its argument but is considered different for the purposes of load/store invariant.group metadata. The optimizer won’t be able to figure out that the returned pointer is the same, because intrinsics don’t have a definition. The barrier must be inserted in all the places where the dynamic object changes (a sketch follows the list below):

  • constructors
  • destructors
  • placement new of dynamic object
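Here is the sketch mentioned above of why the barrier at placement new matters (the class definitions are my own; B derives from A and has the same size). Without the barrier, the vptr load for the second call could be tied - via the same !invariant.group - to the load done for the first call, and the optimizer could wrongly conclude that it still sees A's vtable.

#include <new>

struct A {
  virtual void foo() {}
  virtual ~A() {}
};
struct B : A {
  void foo() override {}
};

void reuse_storage(A *a) {  // assume *a is an A
  a->foo();                 // vptr load tagged with !invariant.group
  a->~A();
  A *b = new (a) B;         // per the list above, a @llvm.invariant.group.barrier is placed
                            // at this placement new, so 'b' is treated as a different
                            // pointer for invariant.group purposes
  b->foo();                 // must call B::foo; must not be folded to A's vtable
}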

Dealing with barriers

Barriers hinder some other optimizations. Some ideas for how this could be fixed:

  • stripping the invariant.group metadata and barriers just after devirtualization. Currently this is done right before codegen. The problem is that most of the devirtualization comes from GVN, which also does most of the optimizations we would miss because of the barriers. GVN is expensive, therefore it is run only once. Stripping early also makes less sense in LTO mode, because it would limit devirtualization in the link phase.
  • teaching the important passes to look through the barrier. It might be very tricky to preserve the semantics of the barrier, but e.g. looking for the dependency of a load without invariant.group by jumping through the barrier to find a store without invariant.group is likely to do the trick.
  • removing the invariant.group.barrier when its argument comes from an alloca and is never used afterwards, etc.

To find out more details about devirtualization, check out my talk (http://llvm.org/devmtg/2016-11/#talk6) from the 2016 LLVM Developers' Meeting.

