Clang AST parsing for automated code generation


Syntax traversal is a powerful tool. With it you can automate repetitive tasks, search for semantic errors, generate wrappers, and so much more. A few months ago I hit a hump (read: a f***ing mountain) of an issue with some legacy code that has been on my plate for a while now.

Having killed a small forest’s worth of paper, I decided that manually tracing paths through code was an inefficient use of my time. Instead I went in search of an automatic method for generating an abstract syntax tree (AST) for C++ code. My idea was that I could use the AST to generate something like a directed graph to better visualize code flow.
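
As a rough sketch of that idea (not the tool I ended up with), the function below walks a tree of cursor-like objects and emits Graphviz DOT text with one edge per parent/child pair. It assumes only that each node exposes `kind`, `spelling` and `get_children()`, the way `clang.cindex.Cursor` does. `FakeNode` and `ast_to_dot` are hypothetical names, used here so the example runs without libclang installed.

```python
class FakeNode(object):
    """Hypothetical stand-in for clang.cindex.Cursor (no libclang needed)."""

    def __init__(self, kind, spelling='', children=()):
        self.kind = kind
        self.spelling = spelling
        self._children = list(children)

    def get_children(self):
        return iter(self._children)


def ast_to_dot(root):
    """Return DOT text: one node per cursor, one edge per parent/child pair."""
    lines = ['digraph ast {']
    counter = [0]  # mutable cell so the nested function can bump it

    def visit(node):
        node_id = 'n%d' % counter[0]
        counter[0] += 1
        # Label is "<kind> <spelling>", with double quotes escaped for DOT.
        label = ('%s %s' % (node.kind, node.spelling)).strip()
        label = label.replace('"', '\\"')
        lines.append('  %s [label="%s"];' % (node_id, label))
        for child in node.get_children():
            child_id = visit(child)
            lines.append('  %s -> %s;' % (node_id, child_id))
        return node_id

    visit(root)
    lines.append('}')
    return '\n'.join(lines)


# Assumed example tree, shaped loosely like a small C++ translation unit.
tree = FakeNode('TRANSLATION_UNIT', 'test.cpp', [
    FakeNode('CLASS_DECL', 'Foo', [FakeNode('FIELD_DECL', 'data_')]),
    FakeNode('FUNCTION_DECL', 'main'),
])
print(ast_to_dot(tree))
```

With a real translation unit you would pass `tu.cursor` instead of `tree`, and the resulting text can be rendered with Graphviz. Note that this graph shows the structure of the tree, not control flow; getting from one to the other takes more work.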

There are a few flavors of readable syntax generation out there (and likely more):

I’ve been a fan of Clang for a while now, and it has a very robust and active community, making it a natural choice for my AST generation needs. Clang also has decent articles on getting started in both Windows and Linux. If you don’t have Clang installed, I suggest reading that linked article. You’ll need compiled versions of clang.exe and libclang.dll to follow along with the Python binding below.


Clang at revision 183352 (2013-06-05) has a slight issue: it won’t identify linkage specifications (e.g. extern "C" void foo()). To fix this issue, follow these steps from my SO answer:

// Bit of a necroanswer, but if you go into \llvm\tools\clang\lib\Sema\SemaCodeComplete.cpp and add the following line:
case Decl::LinkageSpec:  return CXCursor_LinkageSpec;
// to the switch in:
CXCursorKind clang::getCursorKindForDecl(const Decl *D)
// it should resolve the issue of Clang's Python bindings
// returning UNEXPOSED_DECL instead of the correct LINKAGE_SPEC.
// This change was made at revision 183352 (2013-06-05).
// Example from my version:
CXCursorKind clang::getCursorKindForDecl(const Decl *D) {
  if (!D)
    return CXCursor_UnexposedDecl;
  switch (D->getKind()) {
    case Decl::Enum:         return CXCursor_EnumDecl;
    case Decl::LinkageSpec:  return CXCursor_LinkageSpec;
    // ......
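
If rebuilding Clang is not an option, one possible workaround on the Python side is to inspect the tokens of an UNEXPOSED_DECL cursor. This is only a sketch and not part of the fix above: it assumes the cursor offers `kind.name` and `get_tokens()` (as `clang.cindex.Cursor` does in the bindings I have seen), and the `Fake*` classes and `is_linkage_spec` are hypothetical names so the example runs without libclang.

```python
from itertools import islice


class FakeKind(object):
    """Hypothetical stand-in for clang.cindex.CursorKind."""

    def __init__(self, name):
        self.name = name


class FakeToken(object):
    """Hypothetical stand-in for clang.cindex.Token."""

    def __init__(self, spelling):
        self.spelling = spelling


class FakeCursor(object):
    """Hypothetical stand-in for clang.cindex.Cursor."""

    def __init__(self, kind_name, tokens):
        self.kind = FakeKind(kind_name)
        self._tokens = [FakeToken(t) for t in tokens]

    def get_tokens(self):
        return iter(self._tokens)


def is_linkage_spec(cursor):
    """True for LINKAGE_SPEC, or an UNEXPOSED_DECL starting with extern "..."."""
    if cursor.kind.name == 'LINKAGE_SPEC':
        return True
    if cursor.kind.name != 'UNEXPOSED_DECL':
        return False
    # Only the first two tokens matter: the keyword and the string literal.
    first_two = [tok.spelling for tok in islice(cursor.get_tokens(), 2)]
    return (len(first_two) == 2
            and first_two[0] == 'extern'
            and first_two[1].startswith('"'))


unpatched = FakeCursor('UNEXPOSED_DECL',
                       ['extern', '"C"', 'void', 'test1', '(', ')', '{', '}'])
print(is_linkage_spec(unpatched))
```

The token check is a heuristic: it treats any declaration that begins with `extern` followed by a string literal as a linkage specification, which is why patching Clang remains the cleaner route.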



Libclang is Clang’s C interface, built as a shared library; the Python bindings wrap it so you can drive Clang’s parser from interpreted code. Eli Bendersky has a great post on using libclang that I referenced frequently while writing code. Clang’s documentation can be very lacking in some areas, and Eli’s post does a good job of explaining the steps to getting libclang working with Python. If you follow his steps, the basic pipeline is:

  • Compile libclang
  • Add libclang to your PATH environment variable
    • On *Nix it’s LD_LIBRARY_PATH
    • On Windows it’s the standard PATH
    • Or do it in Python: os.environ['PATH'] += os.pathsep + '/path/to/libclang'
  • Copy the Clang/Python bindings from /llvm/tools/clang/bindings/python to your Python installation, or install them however you’d prefer.
  • Verify it works by opening a Python console and typing: import clang.cindex
  • Squee when it works
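
The path and verification steps above can be sketched in Python roughly as follows. The helper names `add_libclang_dir` and `bindings_importable` are hypothetical, and `/path/to/libclang` is a placeholder, not a real location.

```python
import os


def add_libclang_dir(lib_dir, environ=None):
    """Prepend lib_dir to the platform's library search variable.

    Returns the name of the variable that was updated.
    """
    environ = os.environ if environ is None else environ
    # Windows searches PATH; most *nix loaders use LD_LIBRARY_PATH.
    var = 'PATH' if os.name == 'nt' else 'LD_LIBRARY_PATH'
    parts = [p for p in environ.get(var, '').split(os.pathsep) if p]
    if lib_dir not in parts:
        parts.insert(0, lib_dir)
    environ[var] = os.pathsep.join(parts)
    return var


def bindings_importable():
    """Report whether 'import clang.cindex' succeeds in this interpreter."""
    try:
        import clang.cindex  # noqa: F401 (imported only to test availability)
    except (ImportError, OSError):
        return False
    return True


# Work on a copy so the example does not modify the real environment.
env = dict(os.environ)
updated = add_libclang_dir('/path/to/libclang', env)
print('%s now starts with: %s' % (updated, env[updated].split(os.pathsep)[0]))
print('bindings importable: %s' % bindings_importable())
```

One caveat: on Linux the dynamic loader typically reads LD_LIBRARY_PATH when the process starts, so changing it from inside a running interpreter may have no effect, and exporting it in the shell is the safer bet. Newer versions of the bindings also offer `clang.cindex.Config.set_library_path()`, which avoids the environment entirely; check whether your checkout has it.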


Once libclang is tied to Python, it’s time to test your code. When I got to this step I had trouble finding any good examples. There are really only two, and they can be found in your Clang installation folder: llvm\tools\clang\bindings\python\examples\cindex. Others can be gleaned from blog posts and StackOverflow. Here is a simple example I adapted that looks specifically for the LINKAGE_SPEC cursor type. LINKAGE_SPEC refers to code like `extern "C"`.

#!/usr/bin/env python
import os
from pprint import pprint
from optparse import OptionParser

import clang.cindex

# Let the loader look for libclang in the current directory as well.
os.environ['PATH'] = os.environ['PATH'] + os.pathsep + os.getcwd()


def get_info(node):
    """Collect the basic properties of a cursor into a dict."""
    return {'kind': node.kind,
            'usr': node.get_usr(),
            'spelling': node.spelling,
            'location': node.location,
            'extent.start': node.extent.start,
            'extent.end': node.extent.end,
            'is_definition': node.is_definition()}


def output_cursor_and_children(cursor, level=0):
    # Represents code of the type:  extern "C" void foo()
    if cursor.kind == clang.cindex.CursorKind.LINKAGE_SPEC:
        pprint(('nodes', get_info(cursor)))
    # Recurse for children of this cursor
    for c in cursor.get_children():
        output_cursor_and_children(c, level + 1)


def main():
    parser = OptionParser("usage: %prog {filename} [clang-args*]")
    # Stop option parsing at the filename so clang flags pass through untouched.
    parser.disable_interspersed_args()
    (opts, args) = parser.parse_args()
    if len(args) == 0:
        parser.error('invalid number arguments')
    index = clang.cindex.Index.create()
    tu = index.parse(None, args)
    if not tu:
        parser.error('unable to load input')
    # Walk the whole translation unit, starting at its root cursor.
    output_cursor_and_children(tu.cursor)


if __name__ == '__main__':
    main()

// test.cpp
#include "test.h"
int main(){
	Foo f;
	return 0;
}

// test.h
#ifndef TEST_H
#define TEST_H
class Foo
{
	int data_;
	void bar(int data){data_ = data;}
};
extern "C" __declspec( dllexport )void test1(){}
#endif

How to run:

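python <script_name>.py test.cpp

(Replace <script_name>.py with whatever you named the Python file above; everything after it is passed to Clang as arguments.)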
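
The same recursive walk extends to other cursor kinds. Here is a minimal sketch, assuming cursors expose `kind.name`, `spelling` and `get_children()` as `clang.cindex.Cursor` does. `FakeKind`, `FakeCursor`, `walk` and `find_kinds` are hypothetical names, and the tree is hand-built to resemble test.cpp so the example runs without libclang.

```python
class FakeKind(object):
    """Hypothetical stand-in for clang.cindex.CursorKind."""

    def __init__(self, name):
        self.name = name


class FakeCursor(object):
    """Hypothetical stand-in for clang.cindex.Cursor."""

    def __init__(self, kind_name, spelling='', children=()):
        self.kind = FakeKind(kind_name)
        self.spelling = spelling
        self._children = list(children)

    def get_children(self):
        return iter(self._children)


def walk(cursor, depth=0):
    """Yield (depth, cursor) pairs in pre-order."""
    yield depth, cursor
    for child in cursor.get_children():
        for item in walk(child, depth + 1):
            yield item


def find_kinds(root, kind_names):
    """Return the (depth, cursor) pairs whose kind name is in kind_names."""
    wanted = set(kind_names)
    return [(d, c) for d, c in walk(root) if c.kind.name in wanted]


# Hand-built tree; a real one would come from index.parse(...).cursor.
tu = FakeCursor('TRANSLATION_UNIT', 'test.cpp', [
    FakeCursor('CLASS_DECL', 'Foo', [
        FakeCursor('FIELD_DECL', 'data_'),
        FakeCursor('CXX_METHOD', 'bar'),
    ]),
    FakeCursor('LINKAGE_SPEC', '', [FakeCursor('FUNCTION_DECL', 'test1')]),
    FakeCursor('FUNCTION_DECL', 'main'),
])

for depth, cur in find_kinds(tu, ['FUNCTION_DECL', 'CXX_METHOD']):
    print('%s%s %s' % ('  ' * depth, cur.kind.name, cur.spelling))
```

Matching on the kind's name rather than the enum value is a choice made for this sketch so it can run against stand-in objects; with the real bindings, comparing against `clang.cindex.CursorKind` members, as the script above does, works just as well.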


There are so many other ways to make use of ASTs and I wish I had more time to include some of them.  Suffice it to say I’ll probably end up posting about ASTs a few more times.  At least until I work through enough examples to meet my immediate

