GSoC/GCI Archive
Google Summer of Code 2013

LLVM Compiler Infrastructure

Web Page:

Mailing List:

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Despite its name, LLVM has little to do with traditional virtual machines, though it does provide helpful libraries that can be used to build them.

LLVM began as a research project at the University of Illinois, with the goal of providing a modern, SSA-based compilation strategy capable of supporting both static and dynamic compilation of arbitrary programming languages. Since then, LLVM has grown to be an umbrella project consisting of a number of different subprojects, many of which are used in production by a wide variety of commercial and open source projects, as well as being widely used in academic research. Code in the LLVM project is licensed under the "UIUC" BSD-Style license.

The primary sub-projects of LLVM are:

  1. The LLVM Core libraries provide a modern source- and target-independent optimizer, along with code generation support for many popular CPUs (as well as some less common ones!). These libraries are built around a well-specified code representation known as the LLVM intermediate representation ("LLVM IR"). The LLVM Core libraries are well documented, and it is particularly easy to invent your own language (or port an existing compiler) to use LLVM as an optimizer and code generator.

  2. Clang is an "LLVM native" C/C++/Objective-C compiler, which aims to deliver amazingly fast compiles (e.g. about 3x faster than GCC when compiling Objective-C code in a debug configuration), to give extremely useful error and warning messages, and to provide a platform for building great source-level tools. The Clang Static Analyzer is a tool that automatically finds bugs in your code, and is a great example of the sort of tool that can be built using the Clang frontend as a library to parse C/C++ code.

  3. dragonegg integrates the LLVM optimizers and code generator with the GCC 4.5 parsers. This allows LLVM to compile Ada, Fortran, and other languages supported by the GCC compiler frontends, and gives access to C features not supported by Clang (such as OpenMP).

  4. The LLDB project builds on libraries provided by LLVM and Clang to provide a great native debugger. It uses the Clang ASTs and expression parser, the LLVM JIT, the LLVM disassembler, etc., so that it provides an experience that "just works". It is also blazing fast and much more memory efficient than GDB at loading symbols.

  5. The libc++ and libc++ ABI projects provide a standards-conformant and high-performance implementation of the C++ Standard Library, including full support for C++0x.

  6. The compiler-rt project provides highly tuned implementations of the low-level code generator support routines like "__fixunsdfdi" and other calls generated when a target doesn't have a short sequence of native instructions to implement a core IR operation.

  7. The vmkit project is an implementation of the Java and .NET Virtual Machines that is built on LLVM technologies.

  8. The polly project implements a suite of cache-locality optimizations as well as auto-parallelism and vectorization using a polyhedral model.

  9. The libclc project aims to implement the OpenCL standard library.

  10. The klee project implements a "symbolic virtual machine" which uses a theorem prover to try to evaluate all dynamic paths through a program in an effort to find bugs and to prove properties of functions. A major feature of klee is that it can produce a testcase in the event that it detects a bug.

  11. The SAFECode project is a memory safety compiler for C/C++ programs. It instruments code with run-time checks to detect memory safety errors (e.g., buffer overflows) at run-time. It can be used to protect software from security attacks and can also be used as a memory safety error debugging tool like Valgrind.
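To make item 6 concrete: a routine such as "__fixunsdfdi" converts a double to an unsigned 64-bit integer on targets that have no short native instruction sequence for that conversion. The following is a minimal sketch of the idea, not the compiler-rt source. The function name fix_unsigned_double_to_u64 is hypothetical, and the treatment of negative and NaN inputs (returning 0) and the precondition that the input is below 2^64 are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; not the actual compiler-rt implementation.
// Converts a double to an unsigned 64-bit integer using only conversions to
// 32-bit halves, which is the kind of work a runtime support routine does
// when the target lacks a direct double-to-u64 instruction.
std::uint64_t fix_unsigned_double_to_u64(double a) {
    // Assumption of this sketch: zero, negative values and NaN map to 0.
    if (!(a > 0.0)) {
        return 0;
    }
    // Assumption of this sketch: the caller passes a value below 2^64.
    assert(a < 18446744073709551616.0);

    const double two32 = 4294967296.0;  // 2^32
    // Upper 32 bits: truncate a / 2^32 toward zero.
    const std::uint32_t high = static_cast<std::uint32_t>(a / two32);
    // Lower 32 bits: what remains after removing the upper half.
    const std::uint32_t low =
        static_cast<std::uint32_t>(a - static_cast<double>(high) * two32);
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

In ordinary source code the same operation is just a cast from double to a 64-bit unsigned type; on a target without a suitable instruction, the compiler may lower that cast to a call into the runtime library.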

In addition to official subprojects of LLVM, there is a broad variety of other projects that use components of LLVM for various tasks. Through these external projects you can use LLVM to compile Ruby, Python, Haskell, Java, D, PHP, Pure, Lua, and a number of other languages. A major strength of LLVM is its versatility, flexibility, and reusability, which is why it is being used for such a wide variety of different tasks: everything from doing light-weight JIT compiles of embedded languages like Lua to compiling Fortran code for massive supercomputers.

As much as anything else, LLVM has a broad and friendly community of people who are interested in building great low-level tools. If you are interested in getting involved, a good first step is to skim the LLVM Blog and to sign up for the LLVM Developer mailing list. For information on how to send in a patch, get commit access, and copyright and license topics, please see the LLVM Developer Policy.


  • Enhancing Giri: Dynamic Slicing in LLVM. Dynamic program slicing has been used in many applications. Giri was a research project from UIUC, which implemented dynamic backward slicing in LLVM. I think it’s a good idea to extend this project in several ways: 1) Update the code to LLVM mainline and make it robust, 2) Improve the performance of the Giri run-time, 3) Reduce the trace size, etc.
  • FastPolly: Reducing LLVM-Polly Compile-Time Overhead. LLVM-Polly is a promising polyhedral optimizer for data-locality and parallelism. However, experimental results show that Polly analysis and optimization can lead to significant compile-time overhead. On average, Polly optimization increases compile time by 393% for the PolyBench benchmarks and by 53% for the MediaBench benchmarks. That means that to benefit from Polly you pay roughly four times the baseline compile time as extra overhead; even if you do not expect to gain much from Polly, you still pay 53% extra. Such expensive compile-time overhead makes Polly much less attractive to LLVM users. I argue that maintaining fast compile time when Polly is enabled is very important, especially if we think of enabling Polly by default for all LLVM users. Based on this assumption, this project tries to reduce Polly compile-time overhead by revising a large number of Polly passes. First, I will revise the hot Polly passes that dominate the total compile-time overhead. Second, I will revisit the Polly canonicalization passes and try to let Polly bail out early, so that Polly does not cause much overhead when it cannot optimize a program. Third, I will revisit and improve the Polly optimization passes and code generation passes, so that Polly is much faster when it can optimize a program. I hope this project can benefit both LLVM users and Polly users. For LLVM users who care more about compile-time overhead, it enables Polly to provide extra performance gains with little extra compile-time overhead. For Polly users who care more about code quality, this project will significantly reduce the compile-time overhead without performance loss.
  • Flang. Flang is a frontend for the Fortran programming language.
  • Improving Clang C++ Modernizer (f.k.a. C++11 Migrator). The purpose of the C++11 Migrator is to do source-to-source translation to migrate existing C++ code to use C++11 features to enhance maintainability, readability, runtime performance, and compile-time performance. The migrator is a young tool that still requires a lot of development. The proposed project consists of adding new transformations as well as improving existing functionality.
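
As a rough illustration of the kind of source-to-source rewrite a C++11 migration tool aims for, the sketch below pairs C++03-style code with a hand-written C++11 equivalent. The function names are hypothetical and the "after" versions are not output captured from the tool; they only show the intended before/after shape for two common cases, a range-based for loop replacing an explicit iterator loop, and nullptr replacing NULL.

```cpp
#include <cstddef>
#include <vector>

// Before: C++03 style, with the iterator type spelled out by hand.
int sum_before(const std::vector<int>& values) {
    int total = 0;
    for (std::vector<int>::const_iterator it = values.begin();
         it != values.end(); ++it) {
        total += *it;
    }
    return total;
}

// After: the same loop written as a C++11 range-based for.
int sum_after(const std::vector<int>& values) {
    int total = 0;
    for (int value : values) {
        total += value;
    }
    return total;
}

// Before: the NULL macro used as the null pointer value.
const int* first_or_null_before(const std::vector<int>& values) {
    if (values.empty()) {
        return NULL;
    }
    return &values[0];
}

// After: the C++11 nullptr keyword, which has a proper pointer type.
const int* first_or_null_after(const std::vector<int>& values) {
    if (values.empty()) {
        return nullptr;
    }
    return &values[0];
}
```

Each pair behaves identically, which is the property a migration transformation has to preserve: only the spelling of the code changes, not its meaning.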