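LLVM Daily Talk No. 16: Lessons Learned About LLVM

This summary is not my own writing; it is excerpted from a fairly old version of the LLVM documentation. Few people pay attention to the old documentation any more, but this summary is very well written and is still useful today, so I am posting this part of it here. It is only one part of the original document.

Original document: http://llvm.org/releases/1.1/docs/Stacker.html

Main text: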

Lessons I Learned About LLVM

Everything's a Value!

Although I knew that LLVM uses a Static Single Assignment (SSA) format, it wasn't obvious to me how prevalent this idea was in LLVM until I really started using it. Reading the Programmer's Manual and Language Reference, I noted that most of the important LLVM IR (Intermediate Representation) C++ classes were derived from the Value class. The full power of that simple design only became fully understood once I started constructing executable expressions for Stacker.

This really makes your programming go faster. Think about compiling code for the following C/C++ expression:

(a|b)*((x+1)/(y+1))

Assuming the values are on the stack in the order a, b, x, y, this could be expressed in Stacker as:

1 + SWAP 1 + / ROT2 OR *

You could write a function using LLVM that computes this expression like this:

Value* expression( BasicBlock* bb, Value* a, Value* b, Value* x, Value* y )
{
    // Every new instruction is inserted just before the block's terminator.
    Instruction* tail = bb->getTerminator();
    ConstantSInt* one = ConstantSInt::get( Type::IntTy, 1 );
    BinaryOperator* or1 = BinaryOperator::create( Instruction::Or, a, b, "", tail );
    BinaryOperator* add1 = BinaryOperator::create( Instruction::Add, x, one, "", tail );
    BinaryOperator* add2 = BinaryOperator::create( Instruction::Add, y, one, "", tail );
    BinaryOperator* div1 = BinaryOperator::create( Instruction::Div, add1, add2, "", tail );
    BinaryOperator* mult1 = BinaryOperator::create( Instruction::Mul, or1, div1, "", tail );
    // The last instruction is itself the value of the whole expression.
    return mult1;
}

"Okay, big deal," you say. It is a big deal. Here's why. Note that I didn't have to tell this function which kinds of Values are being passed in. They could be Instructions, Constants, GlobalVariables, etc. Furthermore, if you specify Values that are incorrect for this sequence of operations, LLVM will either notice right away (at compilation time) or the LLVM Verifier will pick up the inconsistency when the compiler runs. In no case will you make a type error that gets passed through to the generated program. This really helps you write a compiler that always generates correct code!

The second point is that we don't have to worry about branching, registers, stack variables, saving partial results, etc. The instructions we create are the values we use. Note that all that was created in the above code is a Constant value and five operators. Each of the instructions is the resulting value of that instruction. This saves a lot of time.

The lesson is this: SSA form is very powerful: there is no difference between a value and the instruction that created it. This is fully enforced by the LLVM IR. Use it to your best advantage.
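The "instruction is the value" idea can be sketched without LLVM at all. The following is a minimal illustration, assuming toy stand-in classes: Value, Constant, BinaryOp, Op and Arena are hypothetical names invented here and are not the LLVM API. It builds the same (a|b)*((x+1)/(y+1)) expression, and the builder never needs to know whether an operand is a constant or the result of another operation.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-ins for illustration only; these are NOT the LLVM classes.
struct Value {
    virtual ~Value() = default;
    virtual long eval() const = 0;
};

struct Constant : Value {
    long v;
    explicit Constant(long value) : v(value) {}
    long eval() const override { return v; }
};

enum class Op { Or, Add, Div, Mul };

// An operation IS a value: it can be used directly as an operand elsewhere.
struct BinaryOp : Value {
    Op op;
    const Value* lhs;
    const Value* rhs;
    BinaryOp(Op o, const Value* l, const Value* r) : op(o), lhs(l), rhs(r) {
        assert(l && r && "operands must not be null");
    }
    long eval() const override {
        long a = lhs->eval();
        long b = rhs->eval();
        switch (op) {
            case Op::Or:  return a | b;
            case Op::Add: return a + b;
            case Op::Div: assert(b != 0); return a / b;
            case Op::Mul: return a * b;
        }
        return 0;
    }
};

// Owns every node so the raw operand pointers stay valid.
struct Arena {
    std::vector<std::unique_ptr<Value>> nodes;
    template <class T, class... Args>
    T* make(Args... args) {
        auto node = std::make_unique<T>(args...);
        T* raw = node.get();
        nodes.push_back(std::move(node));
        return raw;
    }
};

// Builds (a|b)*((x+1)/(y+1)); each operand may be any kind of Value.
const Value* expression(Arena& arena, const Value* a, const Value* b,
                        const Value* x, const Value* y) {
    const Value* one  = arena.make<Constant>(1);
    const Value* or1  = arena.make<BinaryOp>(Op::Or, a, b);
    const Value* add1 = arena.make<BinaryOp>(Op::Add, x, one);
    const Value* add2 = arena.make<BinaryOp>(Op::Add, y, one);
    const Value* div1 = arena.make<BinaryOp>(Op::Div, add1, add2);
    return arena.make<BinaryOp>(Op::Mul, or1, div1);
}
```

Passing a BinaryOp as x works exactly like passing a Constant, which is the property the text above describes for real LLVM Values.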

Terminate Those Blocks!

I had to learn about terminating blocks the hard way: using the debugger to figure out what the LLVM verifier was trying to tell me and begging for help on the LLVMdev mailing list. I hope you avoid this experience.

Emblazon this rule in your mind:

  • All BasicBlocks in your compiler must be terminated with a terminating instruction (branch, return, etc.).

Terminating instructions are a semantic requirement of the LLVM IR. There is no facility for implicitly chaining together blocks placed into a function in the order they occur. Indeed, in the general case, blocks will not be added to the function in the order of execution because of the recursive way compilers are written.

Furthermore, if you don't terminate your blocks, your compiler code will compile just fine. You won't find out about the problem until you're running the compiler and the module you just created
fails on the LLVM Verifier.
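To make the rule concrete, here is a minimal sketch of the kind of check a verifier performs, assuming a toy IR: Kind, Block and unterminatedBlocks are hypothetical names made up for this illustration and have nothing to do with the real LLVM Verifier's implementation. A block passes only if it is non-empty, ends in a terminator, and has no terminator earlier in its instruction list.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR, not the LLVM API.
enum class Kind { Add, Mul, Branch, Return };

inline bool isTerminator(Kind k) {
    return k == Kind::Branch || k == Kind::Return;
}

struct Block {
    std::string name;
    std::vector<Kind> insts;
};

// Returns the names of all blocks that break the termination rule.
std::vector<std::string> unterminatedBlocks(const std::vector<Block>& fn) {
    assert(!fn.empty() && "a function needs at least an entry block");
    std::vector<std::string> bad;
    for (const Block& b : fn) {
        // The last instruction must be a terminator...
        bool ok = !b.insts.empty() && isTerminator(b.insts.back());
        // ...and no terminator may appear before the last instruction.
        for (std::size_t i = 0; ok && i + 1 < b.insts.size(); ++i) {
            if (isTerminator(b.insts[i])) ok = false;
        }
        if (!ok) bad.push_back(b.name);
    }
    return bad;
}
```

As in the text, nothing stops you from building the bad block; the problem only shows up when the check runs over the finished function.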

Concrete Blocks

After a little initial fumbling around, I quickly caught on to how blocks should be constructed. In general, here's what I learned:

  1. Create your blocks early. While writing your compiler, you will encounter several situations where you know a priori that you will need several blocks. For example, if-then-else, switch, while, and for statements in C/C++ all need multiple blocks for expression in LLVM. The rule is, create them early.
  2. Terminate your blocks early. This just reduces the chances that you forget to terminate your blocks, which is required (see "Terminate Those Blocks!" above).
  3. Use getTerminator() for instruction insertion. I noticed early on that many of the constructors for the Instruction classes take an optional insert_before argument. At first, I thought this was a mistake because clearly the normal mode of inserting instructions would be one at a time after some other instruction, not before. However, if you hold on to your terminating instruction (or use the handy dandy getTerminator() method on a BasicBlock), it can always be used as the insert_before argument to your instruction constructors. This causes the instruction to automatically be inserted in the right place, just before the terminating instruction. The nice thing about this design is that you can pass blocks around and insert new instructions into them without ever knowing what instructions came before. This makes for some very clean compiler design.

The foregoing is such an important principle, it's worth making an idiom:

BasicBlock* bb = new BasicBlock();
bb->getInstList().push_back( new BranchInst( ... ) );
new Instruction( ..., bb->getTerminator() );

To make this clear, consider the typical if-then-else statement (see the StackerCompiler::handle_if() method). We can set this up in a single function using LLVM in the following way:

using namespace llvm;

BasicBlock* MyCompiler::handle_if( BasicBlock* bb, SetCondInst* condition )
{
    // Create the blocks to contain code in the structure of if/then/else.
    // ("else" is a C++ keyword, so the block variables carry a _bb suffix.)
    BasicBlock* then_bb = new BasicBlock();
    BasicBlock* else_bb = new BasicBlock();
    BasicBlock* exit_bb = new BasicBlock();

    // Insert the branch instruction for the "if"
    bb->getInstList().push_back( new BranchInst( then_bb, else_bb, condition ) );

    // Set up the terminating instructions
    then_bb->getInstList().push_back( new BranchInst( exit_bb ) );
    else_bb->getInstList().push_back( new BranchInst( exit_bb ) );

    // Fill in the then part .. details excised for brevity
    this->fill_in( then_bb );

    // Fill in the else part .. details excised for brevity
    this->fill_in( else_bb );

    // Return a block to the caller that can be filled in with the code
    // that follows the if/then/else construct.
    return exit_bb;
}

Presumably in the foregoing, the calls to the "fill_in" method would add the instructions for the "then" and "else" parts. They would use the third part of the idiom almost exclusively (inserting new instructions before the terminator). Furthermore, they could even recurse back to handle_if should they encounter another if/then/else statement, and it will just work.

Note how cleanly this all works out. In particular, look at the push_back calls on each BasicBlock's instruction list. These are lists of type Instruction, which also happen to be Values. To create the "if" branch, we merely instantiate a BranchInst that takes as arguments the blocks to branch to and the condition to branch on. The blocks act like branch labels! This new BranchInst terminates the BasicBlock provided as an argument. To give the caller a way to keep inserting after calling handle_if, we create an "exit" block which is returned to the caller. Note that the "exit" block is the branch target of the terminators of both the "then" and the "else" blocks. This guarantees that no matter what else "handle_if" or "fill_in" does, they end up at the "exit" block.
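The idiom from this section (terminate early, then always insert just before the terminator) can be sketched with nothing more than a std::list. This is a toy model under stated assumptions: ToyBlock and its members are hypothetical names, instructions are plain strings, and any string starting with "br" or "ret" is treated as a terminator. It is not how BasicBlock is implemented.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical toy block; instructions are just strings in this sketch.
struct ToyBlock {
    std::list<std::string> insts;

    // Assumption of this sketch: "br..." and "ret..." are terminators.
    static bool isTerminator(const std::string& s) {
        return s.rfind("br", 0) == 0 || s.rfind("ret", 0) == 0;
    }

    // Position of the terminator, or end() if the block has none yet.
    std::list<std::string>::iterator terminator() {
        if (!insts.empty() && isTerminator(insts.back())) {
            return std::prev(insts.end());
        }
        return insts.end();
    }

    // "Terminate your blocks early": append the one and only terminator.
    void terminate(const std::string& term) {
        assert(isTerminator(term) && terminator() == insts.end());
        insts.push_back(term);
    }

    // "Use getTerminator() for insertion": new code goes right before it.
    void insertBeforeTerminator(const std::string& inst) {
        assert(terminator() != insts.end() && "terminate the block first");
        insts.insert(terminator(), inst);
    }
};
```

Because every insertion lands in front of the terminator, callers can keep adding instructions in program order without knowing what the block already contains, and the block stays well-formed at every step.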

push_back Is Your Friend

One of the first things I noticed is the frequent use of the "push_back" method on the various lists. This is so common that it is worth mentioning. The "push_back" method inserts a value at the end of an STL sequence container (list, vector, deque, etc.). The method might have also been named "insert_tail" or "append". Although I've used STL quite frequently, my use of push_back wasn't very high in other programs. In LLVM, you'll use it all the time.

The Wily GetElementPtrInst

It took a little getting used to and several rounds of postings to the LLVM mailing list to wrap my head around this instruction correctly. Even though I had read the Language Reference and Programmer's Manual a couple times each, I still missed a few very key points:

  • GetElementPtrInst gives you back a Value for the last thing indexed.
  • All global variables in LLVM are pointers.
  • Pointers must also be dereferenced with the GetElementPtrInst instruction.

This means that when you look up an element in the global variable (assuming it's a struct or array), you must dereference the pointer first! For many things, this leads to the idiom:

std::vector<Value*> index_vector;
index_vector.push_back( ConstantSInt::get( Type::LongTy, 0 ) );
// ... push other indices ...
GetElementPtrInst* gep = new GetElementPtrInst( ptr, index_vector );

For example, suppose we have a global variable whose type is [24 x int]. The variable itself represents a pointer to that array. To subscript the array, we need two indices, not just one. The first index (0) dereferences the pointer. The second index subscripts the array. If you're a "C" programmer, this will run against your grain because you'll naturally think of the global array variable and the address of its first element as the same. That tripped me up for a while until I realized that they really do differ: by type. Remember that LLVM is a strongly typed language itself. Everything has a type. The "type" of the global variable is [24 x int]*. That is, it's a pointer to an array of 24 ints. When you dereference that global variable with a single (0) index, you now have a "[24 x int]" type. Although the pointer value of the dereferenced global and the address of the zero'th element in the array will be the same, they differ in their type. The zero'th element has type "int" while the pointer value has type "[24 x int]".

Get this one aspect of LLVM right in your head, and you'll save yourself a lot of compiler writing headaches down the road.
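The two-index rule is easier to see as a walk over types. Below is a minimal sketch assuming a toy type system: Type, TypePtr, intTy, arrayTy, pointerTo, indexedType and gepResultType are hypothetical helpers written for this illustration, not LLVM's Type classes. Only the number of indices matters here, since indexing an array yields the same element type whatever the index value. The sketch also reflects that the instruction computes an address, so its result is a pointer to the type reached by the walk.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy type system, not LLVM's Type classes.
struct Type;
using TypePtr = std::shared_ptr<Type>;

struct Type {
    enum Kind { Int, Array, Pointer };
    Kind kind;
    TypePtr elem;    // element or pointee type (null for Int)
    unsigned count;  // array length (used by Array only)
};

TypePtr intTy() { return std::make_shared<Type>(Type{Type::Int, nullptr, 0}); }
TypePtr arrayTy(TypePtr e, unsigned n) {
    return std::make_shared<Type>(Type{Type::Array, std::move(e), n});
}
TypePtr pointerTo(TypePtr e) {
    return std::make_shared<Type>(Type{Type::Pointer, std::move(e), 0});
}

std::string toString(const TypePtr& t) {
    switch (t->kind) {
        case Type::Int:     return "int";
        case Type::Array:   return "[" + std::to_string(t->count) + " x " +
                                   toString(t->elem) + "]";
        case Type::Pointer: return toString(t->elem) + "*";
    }
    return "?";
}

// Type reached after walking numIndices indices from a pointer operand.
// The first index steps through the pointer; later ones step into arrays.
TypePtr indexedType(const TypePtr& ptr, unsigned numIndices) {
    assert(ptr->kind == Type::Pointer && numIndices >= 1);
    TypePtr cur = ptr->elem;  // first index: through the pointer
    for (unsigned i = 1; i < numIndices; ++i) {
        assert(cur->kind == Type::Array && "this sketch only subscripts arrays");
        cur = cur->elem;      // each further index: into the array
    }
    return cur;
}

// The result is an address, so it is a pointer to the indexed type.
TypePtr gepResultType(const TypePtr& ptr, unsigned numIndices) {
    return pointerTo(indexedType(ptr, numIndices));
}
```

For a global of type [24 x int]*, one index reaches [24 x int] and two indices reach int, which is why subscripting the array takes two indices rather than one.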

Getting Linkage Types Right

Linkage types in LLVM can be a little confusing, especially if your compiler writing mind has affixed very hard concepts to particular words like "weak", "external", "global", "linkonce", etc. LLVM does not use the precise definitions of, say, ELF or GCC, even though they share common terms. To be fair, the concepts are related and similar but not precisely the same. This can lead you to think you know what a linkage type represents but in fact it is slightly different. I recommend you read the Language Reference on this topic very carefully. Then, read it again.

Here are some handy tips that I discovered along the way:

  • Uninitialized means external. That is, the symbol is declared in the current module and can be used by that module but it is not defined by that module.
  • Setting an initializer changes a global's linkage type from whatever it was to a normal, defined global (not external). You'll need to call the setLinkage() method to reset it if you specify the initializer after the
    GlobalValue has been constructed. This is important for LinkOnce and Weak linkage types.
  • Appending linkage can be used to keep track of compilation information at runtime. It could be used, for example, to build a full table of all the C++ virtual tables or hold the C++ RTTI data, or whatever. Appending
    linkage can only be applied to arrays. The arrays are concatenated together at link time.

Constants Are Easier Than That!

Constants in LLVM took a little getting used to until I discovered a few utility functions in the LLVM IR that make things easier. Here's what I learned:

  • Constants are Values like anything else and can be operands of instructions.
  • Integer constants, frequently needed, can be created using the static "get" methods of the ConstantInt, ConstantSInt, and ConstantUInt classes. The nice thing about these is that you can "get" any kind of integer quickly.
  • There's a special method on the Constant class which allows you to get the null constant for any type. This is really handy for initializing large arrays or structures, etc.

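All of these points made a deep impression on me, but the one I gained the most from was the first. It suddenly made my overall picture of LLVM much clearer, and my understanding of LLVM immediately moved to a different level. Everything is a value!

I strongly recommend reading these lessons carefully. With my current understanding of LLVM I could not write reflections at this level, but I will try to write one anyway; doing so would deepen my own understanding and expose my own gaps.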

Time: 2024-08-03 20:12:20
