Building GCC Plugins – Part 3: C++ Libraries

As discussed in the prior post, I have started a set of C++ libraries to reduce the complexity of writing GCC Plugins and interpreting the GCC Abstract Syntax Tree.  In this post I will provide a high-level description of the libraries and walk through the dependencies and directory structures.  The libraries are available on GitHub: ‘stephanfr/GCCPlugin’.

NB – At the time of writing, I am going through successive revisions and refactoring passes on the library, so expect anything you build now to break with my next commit to GitHub.  The interfaces will settle down in time and I will ‘chill’ them at some point in hopefully the not too distant future.

Licensing and Dependencies

All of the libraries with the exception of the unit test library link directly with the GCC source code, therefore they are all licensed with GPL V3.0.  The libraries are built with the C++11 language features and have dependencies on the Standard Library shipped with GCC and Boost libraries.  The unit testing framework depends on the Google Test libraries.

Programming Style

For what it is worth, I’ve been writing C++ code for a long, long time and am somewhat opinionated regarding some development practices.  First, I use anything in the standard C++ library – in particular I do not write containers.  Second, I use the std::string class in preference to char* strings almost exclusively.  For external interfaces I may expose a char* type but under the interface any char* will almost always map straight back to a std::string instance.  Third, I use anything from the Boost library that suits my needs.  The Boost libraries are excellent; don’t waste your time re-inventing a component in that library, since in all likelihood your component will not be as good anyway.  Fourth, there are some naked pointers in these libraries but in general I try to use a std::unique_ptr or std::shared_ptr in any code written today (I will fix any naked pointers in this library as I refactor).  The standard library smart pointers are a bit more difficult to use than naked pointers, but that difficulty is a result of them enforcing the semantics necessary to know when to delete the pointer they wrap.  Finally, I really like C++11 – I’d strongly suggest cutting over to it.
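As a minimal sketch of the second and fourth points, assuming nothing about the libraries themselves: the Widget class and MakeWidget function below are hypothetical names invented for illustration.  The char* only appears at the edge of the interface, and ownership of the heap object is carried by a std::unique_ptr rather than a naked pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical class, for illustration only; not part of the GCCPlugin libraries.
class Widget
{
public :

    explicit Widget( const std::string& name )
        : m_name( name )
    {}

    // Internally the name is always held as a std::string
    const std::string& Name() const
    {
        return( m_name );
    }

    // External interface exposing a char*, backed by the std::string above
    const char* NameCStr() const
    {
        return( m_name.c_str() );
    }

private :

    std::string m_name;
};

// The returned std::unique_ptr is the sole owner of the Widget, so there is
// never any question about who is responsible for deleting it.
std::unique_ptr<Widget> MakeWidget( const char* name )
{
    assert( name != nullptr );

    // std::make_unique only arrived in C++14, so plain new is used here
    return( std::unique_ptr<Widget>( new Widget( name ) ) );
}
```

Transferring ownership then has to be spelled out with std::move, which is the ‘bit more difficult’ part, but the moved-from pointer is left null instead of dangling.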

With regard to my coding format, it is idiosyncratic.  Indentation and spacing don’t quite adhere to any standard, but at least I no longer use Hungarian notation – though that was a hard habit to kick.

Project and Directory Structure

The project is currently composed of seven directories, each with a single Eclipse CDT C++ project:

  1. CPPLanguageModel – a compiler-neutral class library of C++ language elements
  2. GCCInternalsTools – a set of classes and functions tailored specifically to the GCC g++ compiler to build a CPPLanguageModel representation of the code being compiled and to enable insertion of new code into the AST
  3. GCCInternalsUTFixture – a test fixture providing an abstraction of the GCCInternalsTools designed to permit the creation of unit tests for the library without any dependency on the GCC specific libraries themselves
  4. GCCInternalsUnitTest – a set of unit tests for key features of the GCC Plugins libraries
  5. TestExtensions – a collection of test framework ‘plugins’ that rely on GCCInternalsTools and the GCC headers; a separate project is used to prevent dependencies on GCC internals from leaking into the main Unit Test framework
  6. GCCPlugin – a ‘HelloWorld’ style GCC plugin built using this framework
  7. Utility – various utility classes to simplify coding and implement design patterns I like

The most up-to-date examples of using the libraries will be in the unit test projects.  Similarly, if you go wandering through the code you will frequently see blocks of code commented out.  I tend to leave code I have refactored in place for a revision or two just in case a bug crawls out.  I find it is a bit easier than going back through prior revisions in source code control but it can make the code a little messy at points.  When I get to a version I am happy with, I go through a couple cleaning passes and knock out dead or legacy code.

Design Philosophy

The innards of GCC are absolutely not for the faint of heart.  A primary design goal of this framework is to insulate someone wanting to produce a GCC Plugin from the complexity of the compiler and its design paradigms.  At present, only two GCC header files are required to build a plugin with this framework and all functionality exposed through the framework’s API is abstracted from GCC itself.  The framework is built for manipulating the Abstract Syntax Tree for C++ language programs but could be modified to match other languages.

To use the framework, you ought to include only header files from the CPPLanguageModel project.  In practice, the ASTDictionary.h and PluginManager.h header files will pull in most of the declarations needed to build your plugin.  Two header files from the GCC distribution are also needed: config.h and gcc-plugin.h.

The object model exposed by the framework is that of a Dictionary of all the types and declarations in the code being compiled by g++ with the plugin loaded.  The dictionary is indexed by namespace, entry fully qualified name, entry source code location, entry UID and an identity field.  All of the indices are exposed by the ASTDictionary class and can be used for searching the dictionary for a specific entry.  The identity, UID and fully qualified name indices are unique whereas the namespace and source location indices are non-unique and may return a range of results.
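To make the difference between unique and non-unique indices concrete, here is a small sketch built on standard containers.  MiniEntry and MiniDictionary are hypothetical stand-ins invented for this example; they do not reproduce the ASTDictionary interface or its implementation, and only three of the five indices are modelled.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary entry; not the library's actual class.
struct MiniEntry
{
    unsigned long   uid;
    std::string     enclosingNamespace;     // e.g. "TestNamespace::"
    std::string     name;                   // e.g. "TestClass"

    std::string FullyQualifiedName() const
    {
        return( enclosingNamespace + name );
    }
};

class MiniDictionary
{
public :

    // Returns false if the UID or the fully qualified name is already present
    bool Insert( const MiniEntry& entry )
    {
        if(( m_byUID.count( entry.uid ) > 0 ) || ( m_byFQName.count( entry.FullyQualifiedName() ) > 0 ))
        {
            return( false );
        }

        // One shared entry is referenced from every index
        std::shared_ptr<const MiniEntry> stored = std::make_shared<MiniEntry>( entry );

        m_byUID.emplace( entry.uid, stored );
        m_byFQName.emplace( entry.FullyQualifiedName(), stored );
        m_byNamespace.emplace( entry.enclosingNamespace, stored );

        return( true );
    }

    // Unique index: at most one result, nullptr when there is none
    const MiniEntry* FindByUID( unsigned long uid ) const
    {
        auto itr = m_byUID.find( uid );

        return( itr != m_byUID.end() ? itr->second.get() : nullptr );
    }

    // Non-unique index: a range of results, possibly empty
    std::vector<const MiniEntry*> FindByNamespace( const std::string& ns ) const
    {
        std::vector<const MiniEntry*> results;

        auto range = m_byNamespace.equal_range( ns );

        for( auto itr = range.first; itr != range.second; ++itr )
        {
            results.push_back( itr->second.get() );
        }

        return( results );
    }

private :

    std::map<unsigned long, std::shared_ptr<const MiniEntry>>       m_byUID;
    std::map<std::string, std::shared_ptr<const MiniEntry>>         m_byFQName;
    std::multimap<std::string, std::shared_ptr<const MiniEntry>>    m_byNamespace;
};
```

A lookup by UID yields zero or one entry, whereas a lookup by namespace hands back everything declared in that namespace.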

The dictionary contains entries for different types and declarations.  Entries will be one of the following ‘kinds’: CLASS, UNION, FUNCTION, GLOBAL_VAR, TEMPLATE or UNRECOGNIZED.  The UNRECOGNIZED kind is simply a catch-all for any AST tree elements that have not yet been added to the tree parser.  Dictionary entries are effectively stubs from which the actual definition of the entry may be extracted.  Definitions contain the detailed, ‘kind’ specific information about the entry.  For example, the ClassDefinition object contains the base classes, fields, methods, template methods and friends for the class type.  Source location, namespace, UID, static and extern flags and a list of attributes are available for all dictionary entries and those values are copied into the more detailed definitions as well.
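The stub-and-definition split can be sketched as follows.  Every name here (EntryKind, SketchEntry, SketchClassDefinition, GetClassDefinition) is a hypothetical stand-in, and the vector of raw field names merely plays the role of the AST node from which a real definition would be extracted.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical stand-ins, not the library's classes.
enum class EntryKind { CLASS, UNION, FUNCTION, GLOBAL_VAR, TEMPLATE, UNRECOGNIZED };

// Detailed, kind-specific information for a class
struct SketchClassDefinition
{
    std::string                 name;
    unsigned long               uid;
    std::vector<std::string>    fields;
};

// Lightweight stub held in the dictionary
class SketchEntry
{
public :

    SketchEntry( EntryKind kind, const std::string& name, unsigned long uid, const std::vector<std::string>& rawFields )
        : m_kind( kind ), m_name( name ), m_uid( uid ), m_rawFields( rawFields )
    {}

    EntryKind Kind() const
    {
        return( m_kind );
    }

    // Builds the detailed definition on demand; only CLASS entries have one in this sketch
    std::unique_ptr<const SketchClassDefinition> GetClassDefinition() const
    {
        if( m_kind != EntryKind::CLASS )
        {
            return( std::unique_ptr<const SketchClassDefinition>() );
        }

        std::unique_ptr<SketchClassDefinition> definition( new SketchClassDefinition() );

        definition->name = m_name;          // common values are copied into the definition
        definition->uid = m_uid;
        definition->fields = m_rawFields;   // kind-specific detail

        return( std::unique_ptr<const SketchClassDefinition>( std::move( definition ) ) );
    }

private :

    EntryKind                   m_kind;
    std::string                 m_name;
    unsigned long               m_uid;
    std::vector<std::string>    m_rawFields;
};
```

Keeping the entries small means the whole dictionary can be indexed cheaply, and the cost of building a full definition is only paid for the entries a plugin actually inspects.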

I’ve tried to ensure that the AST tree parser will pass through the tree adding dictionary entries for elements it recognizes and ignoring everything else.  My intent is that it should not crash on encountering some language element it does not recognize in the AST but I have not run the parser over a whole lot of code so I will stick to ‘intent’ for now.  At present, the parser recognizes unions but does not yet provide a detailed definition of union types.  I figured it was more valuable to get some code injection functionality in place before sweating through the details of union representations in the GCC AST.
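The ‘recognize what you can, skip the rest’ intent boils down to a classification step with a catch-all.  The sketch below uses made-up node codes (RawCode) rather than real GCC tree codes, and SOMETHING_NEW stands in for any element the parser has not been taught about yet.

```cpp
#include <string>
#include <vector>

// Hypothetical node codes, not GCC tree codes.
enum class RawCode { RECORD, UNION, FUNCTION, VARIABLE, SOMETHING_NEW };

struct RawNode
{
    RawCode         code;
    std::string     name;
};

enum class ParsedKind { CLASS, UNION, FUNCTION, GLOBAL_VAR, UNRECOGNIZED };

// Anything without an explicit case falls through to UNRECOGNIZED
ParsedKind Classify( RawCode code )
{
    switch( code )
    {
        case RawCode::RECORD :      return( ParsedKind::CLASS );
        case RawCode::UNION :       return( ParsedKind::UNION );
        case RawCode::FUNCTION :    return( ParsedKind::FUNCTION );
        case RawCode::VARIABLE :    return( ParsedKind::GLOBAL_VAR );
        default :                   return( ParsedKind::UNRECOGNIZED );
    }
}

// Collects the names of recognized nodes and skips the rest without failing
std::vector<std::string> CollectRecognized( const std::vector<RawNode>& nodes )
{
    std::vector<std::string> recognized;

    for( const RawNode& node : nodes )
    {
        if( Classify( node.code ) == ParsedKind::UNRECOGNIZED )
        {
            continue;
        }

        recognized.push_back( node.name );
    }

    return( recognized );
}
```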

Current Supported Versions of GCC

The internals of GCC are constantly in flux and functionally there are no ‘frozen’ APIs or data structures that one can depend upon remaining static release over release.  The changes are unlikely to be significant release over release but there is a high probability of breaking changes associated with any release.

The code currently compiles and runs with GCC 4.8.0.  I can make no guarantees that it will compile and run with later releases, though hopefully nothing should break between point releases in the 4.8.x series.

Example Plugin

An example ‘HelloWorld’ plugin appears below.  The four header files appear at the top.  The plugin_is_GPL_compatible symbol is needed for licensing compliance with the GCC suite.

The Callbacks class implements the CPPModel::CallbackIfx interface, which the framework uses to call back into the plugin at specific times in the compilation process.  There are entry points for when the AST is ready, for a point at which namespaces may be declared and for a point at which code may be injected.  For the sample plugin, all that happens is that the contents of the TestNamespace inside the code being compiled are dumped to cerr.  The plugin_init function is part of the GCC plugin framework and is rather straightforward when using these abstraction libraries.


/*-------------------------------------------------------------------------------
Copyright (c) 2013 Stephan Friedl.

All rights reserved. This program and the accompanying materials
are made available under the terms of the GNU Public License v3.0
which accompanies this distribution, and is available at
http://www.gnu.org/licenses/gpl.html

Contributors:
 Stephan Friedl
-------------------------------------------------------------------------------*/

#include "config.h"

#include "ASTDictionary.h"
#include "PluginManager.h"

#include "gcc-plugin.h"

int plugin_is_GPL_compatible;

class Callbacks : public CPPModel::CallbackIfx
{
public :

	Callbacks()
	{}

	virtual ~Callbacks()
	{}

	//	Called by the framework once the AST is ready to be examined

	void ASTReady()
	{
		std::list<std::string> namespacesToDump( { "TestNamespace::" } );

		CPPModel::GetPluginManager().GetASTDictionary().DumpASTXMLByNamespaces( std::cerr, namespacesToDump );
	}

	//	Called at the point where namespaces may be declared

	void CreateNamespaces()
	{
	}

	//	Called at the point where code may be injected

	void InjectCode()
	{
	}

};

Callbacks g_pluginCallbacks;

//	Entry point invoked by GCC when the plugin is loaded

int plugin_init( plugin_name_args* info, plugin_gcc_version* ver )
{
	std::cerr << "Starting Plugin: " << info->base_name << std::endl;

	CPPModel::GetPluginManager().Initialize( "HelloWorld Plugin", &g_pluginCallbacks );

	return( 0 );
}

 

Example Output

A sample program to be compiled appears below.  This code has the TestNamespace declared and it is the contents of that namespace that will be dumped by the plugin above.

#include <iostream>

namespace TestNamespace
{
	class TestClass
	{
	public :

		int			publicInt;

		int			getPublicInt() const
		{
			return( publicInt );
		}

	protected :

		double		getPrivateDouble() const
		{
			return( privateDouble );
		}

	private :

		double		privateDouble;
	};

	char*		globalString = "This is a global string";

	TestClass	globalTestClassInstance;
}

int main()
{
	std::cout << "!!!Hello World!!!" << std::endl; // prints !!!Hello World!!!

	return 0;
}

The command line required to invoke g++ with the plugin and compile the above file follows:

/usr/gcc-4.8.0/bin/gcc-4.8.0 -c -std=c++11 -fplugin=libGCCPlugin.so HelloWorld.cpp

When g++ initializes, it loads the sample plugin and when the AST is ready, the plugin dumps the following to standard error.  It isn’t perfect XML but ought to be good enough to analyze the program being compiled.

8: 2014-09-23 21:03:42   [LoggingInitialization] [NORMAL]  Logging Initiated
Starting Plugin: libGCCPlugin
HelloWorld.cpp:34:24: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
  char*  globalString = "This is a global string";
                        ^
<ast>
    <dictionary>
        <namespace name="TestNamespace::">
            <dictionary_entry>
                <namespace>
                    <name>TestNamespace::</name>
                </namespace>
                <name>TestClass</name>
                <uid>20720</uid>
                <source-info>
                    <file>HelloWorld.cpp</file>
                    <line>9</line>
                    <char-count>1</char-count>
                    <location>6451683</location>
                </source-info>
            </dictionary_entry>
            <dictionary_entry>
                <namespace>
                    <name>TestNamespace::</name>
                </namespace>
                <name>globalString</name>
                <uid>28506</uid>
                <source-info>
                    <file>HelloWorld.cpp</file>
                    <line>34</line>
                    <char-count>1</char-count>
                    <location>6454884</location>
                </source-info>
                <static>true</static>
            </dictionary_entry>
            <dictionary_entry>
                <namespace>
                    <name>TestNamespace::</name>
                </namespace>
                <name>globalTestClassInstance</name>
                <uid>28507</uid>
                <source-info>
                    <file>HelloWorld.cpp</file>
                    <line>36</line>
                    <char-count>1</char-count>
                    <location>6455143</location>
                </source-info>
                <static>true</static>
            </dictionary_entry>
        </namespace>
    </dictionary>
    <elements>
        <namespace name="TestNamespace::">
            <class type="class">
                <name>TestClass</name>
                <uid>20720</uid>
                <source-info>
                    <file>HelloWorld.cpp</file>
                    <line>9</line>
                    <char-count>1</char-count>
                    <location>6451683</location>
                </source-info>
                <namespace>
                    <name>TestNamespace::</name>
                </namespace>
                <compiler_specific>
                    </artificial>
                </compiler_specific>
                <base-classes>
                </base-classes>
                <friends>
                </friends>
                <fields>
                    <field>
                        <name>publicInt</name>
                        <source-info>
                            <file>HelloWorld.cpp</file>
                            <line>13</line>
                            <char-count>1</char-count>
                            <location>6452196</location>
                        </source-info>
                        <type>
                            <kind>fundamental</kind>
                            <declaration>int</declaration>
                        </type>
                        <access>PUBLIC</access>
                        <static>false</static>
                        <offset_info>
                            <size>4</size>
                            <alignment>4</alignment>
                            <offset>0</offset>
                            <bit_offset_alignment>128</bit_offset_alignment>
                            <bit_offset>0</bit_offset>
                        </offset_info>
                    </field>
                    <field>
                        <name>privateDouble</name>
                        <source-info>
                            <file>HelloWorld.cpp</file>
                            <line>30</line>
                            <char-count>1</char-count>
                            <location>6454374</location>
                        </source-info>
                        <type>
                            <kind>fundamental</kind>
                            <declaration>double</declaration>
                        </type>
                        <access>PRIVATE</access>
                        <static>false</static>
                        <offset_info>
                            <size>8</size>
                            <alignment>8</alignment>
                            <offset>0</offset>
                            <bit_offset_alignment>128</bit_offset_alignment>
                            <bit_offset>64</bit_offset>
                        </offset_info>
                    </field>
                </fields>
                <methods>
                    <method>
                        <name>getPublicInt</name>
                        <uid>28497</uid>
                        <source-info>
                            <file>HelloWorld.cpp</file>
                            <line>15</line>
                            <char-count>1</char-count>
                            <location>6452452</location>
                        </source-info>
                        <access>PUBLIC</access>
                        <static>false</static>
                        <result>
                            <type>
                                <kind>fundamental</kind>
                                <declaration>int</declaration>
                            </type>
                        </result>
                        <parameters>
                            <parameter>
                                <name>this</name>
                                <type>
                                    <kind>derived</kind>
                                    <declaration>
                                        <operator>pointer</operator>
                                        <type>
                                            <kind>class-or-struct</kind>
                                            <declaration>TestNamespace::TestClass</declaration>
                                            <namespace>
                                                <name>TestNamespace::</name>
                                            </namespace>
                                        </type>
                                    </declaration>
                                </type>
                                <compiler_specific>
                                    </artificial>
                                </compiler_specific>
                            </parameter>
                        </parameters>
                    </method>
                    <method>
                        <name>getPrivateDouble</name>
                        <uid>28499</uid>
                        <source-info>
                            <file>HelloWorld.cpp</file>
                            <line>22</line>
                            <char-count>1</char-count>
                            <location>6453350</location>
                        </source-info>
                        <access>PROTECTED</access>
                        <static>false</static>
                        <result>
                            <type>
                                <kind>fundamental</kind>
                                <declaration>double</declaration>
                            </type>
                        </result>
                        <parameters>
                            <parameter>
                                <name>this</name>
                                <type>
                                    <kind>derived</kind>
                                    <declaration>
                                        <operator>pointer</operator>
                                        <type>
                                            <kind>class-or-struct</kind>
                                            <declaration>TestNamespace::TestClass</declaration>
                                            <namespace>
                                                <name>TestNamespace::</name>
                                            </namespace>
                                        </type>
                                    </declaration>
                                </type>
                                <compiler_specific>
                                    </artificial>
                                </compiler_specific>
                            </parameter>
                        </parameters>
                    </method>
                </methods>
                <template_methods>
                </template_methods>
            </class>
            <global_var_entry>
                <namespace>
                    <name>TestNamespace::</name>
                </namespace>
                <name>globalString</name>
                <uid>28506</uid>
                <source-info>
                    <file>HelloWorld.cpp</file>
                    <line>34</line>
                    <char-count>1</char-count>
                    <location>6454884</location>
                </source-info>
                <static>true</static>
                <type>
                    <kind>derived</kind>
                    <declaration>
                        <operator>pointer</operator>
                        <type>
                            <kind>fundamental</kind>
                            <declaration>char</declaration>
                        </type>
                    </declaration>
                </type>
            </global_var_entry>
            <global_var_entry>
                <namespace>
                    <name>TestNamespace::</name>
                </namespace>
                <name>globalTestClassInstance</name>
                <uid>28507</uid>
                <source-info>
                    <file>HelloWorld.cpp</file>
                    <line>36</line>
                    <char-count>1</char-count>
                    <location>6455143</location>
                </source-info>
                <static>true</static>
                <type>
                    <kind>class-or-struct</kind>
                    <declaration>TestNamespace::TestClass</declaration>
                    <namespace>
                        <name>TestNamespace::</name>
                    </namespace>
                </type>
            </global_var_entry>
        </namespace>
    </elements>
</ast>

Conclusion

It has taken a while to get this far but I will dive into the internals of the framework and provide examples of code injection in future posts.

 

Building GCC Plugins – Part 2: Introduction to GCC Internals

Once the basic scaffolding is in place for a GCC Plugin, the next step is to analyze and perhaps modify the Abstract Syntax Tree (AST) created by GCC as a result of parsing the source code.  GCC is truly a marvel of software engineering; it is the de-facto compiler for *nix environments and supports a variety of front ends for different languages (even Ada…).  That said, the GCC AST is complex to navigate for a number of reasons.  First, parsing and representing a variety of languages in a common syntax tree is a complex problem so the solution is going to be complex.  Second, history – looking at the GCC internals is a bit like walking down memory lane; this is the way we wrote high-performance software when systems had limited memory (think 64k) and CPUs had low throughput (think 16 MHz clock rates).  Prior to GCC 4.8.0, GCC was compiled with the C compiler, so don’t bother looking for C++ constructs in the source code.

The AST Tree

The primary element in the GCC AST is the ‘tree’ structure.  An introduction to the tree structure appears in the GCC Internals Documentation.  Figure 1 is extracted from the tree.h header file and provides a good starting place for a discussion of the GCC tree and how to approach programming with it.


union GTY ((ptr_alias (union lang_tree_node),
 desc ("tree_node_structure (&%h)"), variable_size)) tree_node {
 struct tree_base GTY ((tag ("TS_BASE"))) base;
 struct tree_typed GTY ((tag ("TS_TYPED"))) typed;
 struct tree_common GTY ((tag ("TS_COMMON"))) common;
 struct tree_int_cst GTY ((tag ("TS_INT_CST"))) int_cst;
 struct tree_real_cst GTY ((tag ("TS_REAL_CST"))) real_cst;
 struct tree_fixed_cst GTY ((tag ("TS_FIXED_CST"))) fixed_cst;
 struct tree_vector GTY ((tag ("TS_VECTOR"))) vector;
 struct tree_string GTY ((tag ("TS_STRING"))) string;
 struct tree_complex GTY ((tag ("TS_COMPLEX"))) complex;
 struct tree_identifier GTY ((tag ("TS_IDENTIFIER"))) identifier;
 struct tree_decl_minimal GTY((tag ("TS_DECL_MINIMAL"))) decl_minimal;
 struct tree_decl_common GTY ((tag ("TS_DECL_COMMON"))) decl_common;
 struct tree_decl_with_rtl GTY ((tag ("TS_DECL_WRTL"))) decl_with_rtl;
 struct tree_decl_non_common GTY ((tag ("TS_DECL_NON_COMMON"))) decl_non_common;
 struct tree_parm_decl GTY ((tag ("TS_PARM_DECL"))) parm_decl;
 struct tree_decl_with_vis GTY ((tag ("TS_DECL_WITH_VIS"))) decl_with_vis;
 struct tree_var_decl GTY ((tag ("TS_VAR_DECL"))) var_decl;
 struct tree_field_decl GTY ((tag ("TS_FIELD_DECL"))) field_decl;
 struct tree_label_decl GTY ((tag ("TS_LABEL_DECL"))) label_decl;
 struct tree_result_decl GTY ((tag ("TS_RESULT_DECL"))) result_decl;
 struct tree_const_decl GTY ((tag ("TS_CONST_DECL"))) const_decl;
 struct tree_type_decl GTY ((tag ("TS_TYPE_DECL"))) type_decl;
 struct tree_function_decl GTY ((tag ("TS_FUNCTION_DECL"))) function_decl;
 struct tree_translation_unit_decl GTY ((tag ("TS_TRANSLATION_UNIT_DECL")))
 translation_unit_decl;
 struct tree_type_common GTY ((tag ("TS_TYPE_COMMON"))) type_common;
 struct tree_type_with_lang_specific GTY ((tag ("TS_TYPE_WITH_LANG_SPECIFIC")))
 type_with_lang_specific;
 struct tree_type_non_common GTY ((tag ("TS_TYPE_NON_COMMON")))
 type_non_common;
 struct tree_list GTY ((tag ("TS_LIST"))) list;
 struct tree_vec GTY ((tag ("TS_VEC"))) vec;
 struct tree_exp GTY ((tag ("TS_EXP"))) exp;
 struct tree_ssa_name GTY ((tag ("TS_SSA_NAME"))) ssa_name;
 struct tree_block GTY ((tag ("TS_BLOCK"))) block;
 struct tree_binfo GTY ((tag ("TS_BINFO"))) binfo;
 struct tree_statement_list GTY ((tag ("TS_STATEMENT_LIST"))) stmt_list;
 struct tree_constructor GTY ((tag ("TS_CONSTRUCTOR"))) constructor;
 struct tree_omp_clause GTY ((tag ("TS_OMP_CLAUSE"))) omp_clause;
 struct tree_optimization_option GTY ((tag ("TS_OPTIMIZATION"))) optimization;
 struct tree_target_option GTY ((tag ("TS_TARGET_OPTION"))) target_option;
};

Figure 1: The tree_node structure extracted from the GCC code base.

Fundamentally, a tree_node is a big union of structs.  The union contains a handful of common or descriptive members, but the majority of union members are specific types of tree nodes.  The first union member, tree_base, is common to all tree nodes and provides the basic descriptive information needed to determine the precise kind of node being examined or manipulated.  There is a bit of an inheritance model: tree_base is the foundation, and tree_typed and tree_common add another layer of customization for specific categories of tree nodes to inherit.  From there on out, the remainder of the union members are specific types of tree nodes.  For example, tree_int_cst is an integer constant node whereas tree_field_decl is a field declaration.
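A heavily simplified sketch of the same layout idea follows.  The mini_ names are hypothetical and carry only a fraction of what the real structures in tree.h hold.  Unlike the real union, this one does not list the base struct as a member of its own; the node code is read through one of the node structs instead, which is enough to show the shared header.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-ins; the real definitions live in GCC's tree.h.
enum mini_code { MINI_INTEGER_CST, MINI_FIELD_DECL };

// Header common to every kind of node
struct mini_base
{
    mini_code   code;
};

struct mini_int_cst
{
    mini_base   base;       // must be the first member
    long        value;
};

struct mini_field_decl
{
    mini_base   base;       // must be the first member
    const char* name;
    unsigned    offset;
};

// All node kinds overlay the same storage
union mini_node
{
    mini_int_cst    int_cst;
    mini_field_decl field_decl;
};

// Every member begins with the same mini_base, so the code can be read
// through any of them regardless of which kind of node is stored.
inline mini_code CodeOf( const mini_node& node )
{
    return( node.int_cst.base.code );
}

inline mini_node MakeIntCst( long value )
{
    mini_node node;

    node.int_cst.base.code = MINI_INTEGER_CST;
    node.int_cst.value = value;

    return( node );
}

inline mini_node MakeFieldDecl( const char* name, unsigned offset )
{
    assert( name != nullptr );

    mini_node node;

    node.field_decl.base.code = MINI_FIELD_DECL;
    node.field_decl.name = name;
    node.field_decl.offset = offset;

    return( node );
}
```

The ‘inheritance’ is nothing more than a convention that every struct starts with the same header, so the compiler cannot stop code from reading field_decl members out of an integer constant node.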

Tree nodes are typed but not in the C language sense of ‘typed’.  One way to think about it is that the tree_node structure is a memory-efficient way to model a class in C prior to C++.  Instead of member functions or methods, there is a large library of macros which act on tree nodes.  In general, macros will fall into two categories: predicate macros, which usually have a ‘_P’ suffix and return a value which can be compared to zero to indicate a false result, and transformation macros, which take a tree node and usually return another tree node.  Despite the temptation to dip directly into the public tree_node structure and access or modify the data members directly – don’t do it.  Treat tree nodes like C++ classes in which all the data members are private and rely on the tree macros to query or manipulate tree nodes.

Relying on the macros to work with the tree_node structure is the correct approach per GCC documentation but will also simply make your life easier.  GCC tree_node structures are ‘strongly typed’ in the sense that they are distinct in the GCC tree type-system and many of the macros expect a specific tree_node type.  For example the INT_CST_LT(A, B) macro expects to have two tree_int_cst nodes passed as arguments – even though the C++ compiler cannot enforce the typing at compile time.  If you pass in the wrong tree_node type, you will typically get a segmentation violation.  To catch such mistakes earlier, GCC itself can be configured and built with the --enable-checking option, which enforces runtime checking of node types.
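What a runtime-checked accessor buys you can be sketched without any GCC headers.  The SKETCH_ macros and the SketchCheck function below are hypothetical; they only mimic the general shape of a predicate macro, a checked accessor and a comparison in the style of INT_CST_LT, and they report a mismatch with an exception purely so that the behaviour is easy to observe.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; none of these names come from GCC.
enum sketch_code { SKETCH_INT_CST, SKETCH_FIELD_DECL };

struct sketch_node
{
    sketch_code code;
    long        int_value;      // meaningful only for SKETCH_INT_CST nodes
};

// Returns the node unchanged if it has the expected code, otherwise throws
inline const sketch_node& SketchCheck( const sketch_node& node, sketch_code expected, const char* macroName )
{
    assert( macroName != nullptr );

    if( node.code != expected )
    {
        throw std::logic_error( std::string( macroName ) + ": wrong node kind" );
    }

    return( node );
}

// Predicate macro, following the '_P' suffix convention
#define SKETCH_INT_CST_P( NODE )    ( (NODE).code == SKETCH_INT_CST )

// Accessor macro that verifies the node kind before touching the data
#define SKETCH_INT_VALUE( NODE )    ( SketchCheck( (NODE), SKETCH_INT_CST, "SKETCH_INT_VALUE" ).int_value )

// Comparison macro that expects two integer constant nodes
#define SKETCH_INT_LT( A, B )       ( SKETCH_INT_VALUE( A ) < SKETCH_INT_VALUE( B ) )
```

Without the check, passing a field declaration to SKETCH_INT_LT would quietly read whatever happens to sit in int_value; with it, the mistake is reported at the exact macro that was misused.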

In terms of history, this type of modelling was common back in the day when machines were limited in memory and compute cycles.  This approach is very efficient in terms of memory as the union overlays all the types and there are no virtual tables or other C++ class overhead that consumes memory or requires compute overhead.  The price paid though is that it is 100% incumbent on the developer to keep the type-system front-of-mind and ensure that they are invoking the right macros with the right arguments.  The strategy of relying on the compiler to advise one about type mis-matches does not work in this kind of code.

Basics of AST Programming

There are five key macros that can be invoked safely on any tree structure: TREE_CODE, TREE_TYPE, TREE_CHAIN, TYPE_P and DECL_P.  In general after obtaining a ‘generic’ tree node, the first step is to use the TREE_CODE macro to determine the ‘type’ (in the GCC type-system) of the node.  The TREE_TYPE macro returns the source code ‘type’ associated with the node.  For example, the result node of a method declaration returning an integer value will have a TREE_TYPE whose TREE_CODE is equal to INTEGER_TYPE.  The code for that statement would look like:


TREE_CODE( TREE_TYPE( DECL_RESULT( methodNode ))) == INTEGER_TYPE

Within the AST structure, lists are generally represented as singly-linked lists with the link to the next list member returned by the TREE_CHAIN macro.  For example, the DECL_ARGUMENTS macro will return a pointer to the first parameter for a function or method.  If this value is NULL_TREE, then there are no parameters, otherwise the tree node for the first parameter is returned.  Using TREE_CHAIN on that node will return NULL_TREE if it is the only parameter or will return a tree instance for the next parameter.  There also exists a vector data structure within GCC and it is accessed using a different set of macros.
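With the GCC headers available, the traversal just described is typically written as a loop of the form for( tree p = DECL_ARGUMENTS( fn ); p != NULL_TREE; p = TREE_CHAIN( p ) ).  The sketch below reproduces that shape without GCC: param_node and the SKETCH_ macros are hypothetical stand-ins for a parameter node, NULL_TREE and TREE_CHAIN.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a GCC parameter node
struct param_node
{
    const char*     name;
    param_node*     chain;      // next parameter, or null for the last one
};

#define SKETCH_NULL_TREE            ( static_cast<param_node*>( nullptr ) )
#define SKETCH_TREE_CHAIN( NODE )   ( (NODE)->chain )

// Walks the singly-linked chain and collects the parameter names in order
std::vector<std::string> ParameterNames( param_node* firstParameter )
{
    std::vector<std::string> names;

    for( param_node* current = firstParameter; current != SKETCH_NULL_TREE; current = SKETCH_TREE_CHAIN( current ) )
    {
        assert( current->name != nullptr );

        names.push_back( current->name );
    }

    return( names );
}
```

A function without parameters simply hands the loop a null first node, so no special case is needed.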

The TYPE_P and DECL_P macros are predicates which will return non-zero values if the tree passed as an argument is a type specification or a code declaration.  Knowing this distinction is important as it then quickly partitions the macros which can be used with the node.  Many macros will have a prefix of ‘TYPE_’ for type nodes and ‘DECL_’ for declaration nodes.  Frequently there will be two sets of identical macros, for instance TYPE_UID will return the GCC generated, internal numeric unique identifier for a type node whereas DECL_UID is needed for a declaration node.  In general, I have found that calling a TYPE_ macro on a declaration or a DECL_ macro on a type specification will result in a segmentation violation.
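The habit this suggests is to test the predicates first and only then call the accessor from the matching family.  Here is a sketch of that pattern under the assumption of a made-up node type; sketch_tree, the SKETCH_ macros and TryGetUID are hypothetical and merely echo the roles of TYPE_P, DECL_P, TYPE_UID and DECL_UID.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these names come from GCC.
enum sketch_class { SKETCH_TYPE, SKETCH_DECL, SKETCH_OTHER };

struct sketch_tree
{
    sketch_class    nodeClass;
    unsigned        uid;
};

#define SKETCH_TYPE_P( NODE )   ( (NODE).nodeClass == SKETCH_TYPE )
#define SKETCH_DECL_P( NODE )   ( (NODE).nodeClass == SKETCH_DECL )

// Each accessor is only valid for its own class of node
inline unsigned SketchTypeUID( const sketch_tree& node )
{
    assert( SKETCH_TYPE_P( node ) );

    return( node.uid );
}

inline unsigned SketchDeclUID( const sketch_tree& node )
{
    assert( SKETCH_DECL_P( node ) );

    return( node.uid );
}

// Tests the predicates first, then calls only the matching accessor.
// Returns false for nodes that are neither types nor declarations.
bool TryGetUID( const sketch_tree& node, unsigned& uid )
{
    if( SKETCH_TYPE_P( node ) )
    {
        uid = SketchTypeUID( node );
        return( true );
    }

    if( SKETCH_DECL_P( node ) )
    {
        uid = SketchDeclUID( node );
        return( true );
    }

    return( false );
}
```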

Other frequently used macros include: DECL_NAME and TYPE_NAME to return a tree node that contains the source code name for a given element.  IDENTIFIER_POINTER can then be used on that tree to return a const char* pointing to the name.  DECL_SOURCE_FILE, DECL_SOURCE_LINE and DECL_SOURCE_LOCATION are available to map an AST declaration back to the source code location.  As mentioned above, DECL_UID and TYPE_UID return numeric unique identifiers for elements in the source code.

In addition to the above, for C++ source code fed to g++, the compiler will inject methods and fields not explicitly declared in the C++ source code.  These elements can be identified with the DECL_IS_BUILTIN and DECL_ARTIFICIAL macros.  If as you traverse the AST you trip across oddly named elements, check the node with those macros to determine if the nodes have been created by the compiler.
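In practice this usually turns into a filter applied while walking a list of members.  In the sketch below, sketch_member is a hypothetical stand-in whose artificial flag plays the role of the answer DECL_ARTIFICIAL would give for a real node.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a class member node
struct sketch_member
{
    std::string     name;
    bool            artificial;     // stand-in for what DECL_ARTIFICIAL would report
};

// Returns only the members that were written by the programmer
std::vector<std::string> UserDeclaredNames( const std::vector<sketch_member>& members )
{
    std::vector<std::string> names;

    for( const sketch_member& member : members )
    {
        if( member.artificial )
        {
            continue;       // injected by the compiler, not present in the source
        }

        names.push_back( member.name );
    }

    return( names );
}
```

Whether to skip such nodes or keep them and flag them is a design choice that depends on what the plugin is trying to report.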

Beyond this simple introduction, sifting through the AST will require a lot of time reviewing the tree.h and other header files to look for macros that will be useful for your application.  Fortunately, the naming is very consistent and quite good which eases the hunt for the right macro.  Once you think you have the right macro for a given task, try it in your plugin and see if you get the desired result.  Be prepared for a lot of trial-and-error investigation in the debugger.  Also, though there are some GDB scripts to pretty-print AST tree instances, looking at these structures in the debugger will also require some experience, as again the debugger isn’t able to infer much about GCC’s internal type system.

Making the AST Easier to Navigate and Manipulate

I have started a handful of C++ libraries which bridge the gap between the implicit type system in the GCC tree_node structure and explicit C++ classes modelling distinct tree_node types.  For example, a snippet from my TypeTree class appears below in Figure 2.


class TypeTree : public DeclOrTypeBaseTree
 {
 public :

TypeTree( const tree& typeTree )
 : DeclOrTypeBaseTree( typeTree )
 {
 assert( TYPE_P( typeTree ) );
 }

TypeTree& operator= ( const tree& typeTree )
 {
 assert( TYPE_P( typeTree ) );

(tree&)m_tree = typeTree;

return( *this );
 }

 const CPPModel::UID UID() const
 {
 return( CPPModel::UID( TYPE_UID( TYPE_MAIN_VARIANT( m_tree ) ), CPPModel::UID::UIDType::TYPE ) );
 }

 const std::string Namespace() const;

std::unique_ptr<const CPPModel::Type> type( const CPPModel::ASTDictionary& dictionary ) const;

CPPModel::TypeInfo::Specifier typeSpecifier() const;

CPPModel::ConstListPtr<CPPModel::Attribute> attributes();
 };

Figure 2: TypeTree wrapper class for GCC tree_node.

Within this library I make extensive use of the STL, Boost libraries and a number of C++ 11 features.  For example, ConstListPtr<> is a template alias for a std::unique_ptr to a const boost::ptr_list.


template <class T> using ListPtr = std::unique_ptr<boost::ptr_list<T>>;
 template <class T> using ConstListPtr = std::unique_ptr<const boost::ptr_list<T>>;

template <class T> using ListRef = const boost::ptr_list<T>&;

template <class T> ConstListPtr<T> MakeConst( ListPtr<T>& nonConstList ) { return( ConstListPtr<T>( std::move( nonConstList ) ) ); }

Figure 3: Template aliases for lists.
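The point of the MakeConst() helper is to build a list while it is mutable and then hand it out read-only.  The sketch below shows that pattern using only the standard library: std::vector stands in for boost::ptr_list, and the ‘Sketch’ prefixed names and ‘BuildAttributeNames’ are hypothetical and not part of the library.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Std-only stand-ins for the aliases in Figure 3 (std::vector replaces boost::ptr_list).
template <class T> using SketchListPtr      = std::unique_ptr<std::vector<T>>;
template <class T> using SketchConstListPtr = std::unique_ptr<const std::vector<T>>;

// Transfers ownership to a pointer-to-const; the source pointer is left empty.
template <class T> SketchConstListPtr<T> SketchMakeConst( SketchListPtr<T>& nonConstList )
{
    return( SketchConstListPtr<T>( std::move( nonConstList ) ) );
}

// Build the list while it is mutable, then return it read-only.
inline SketchConstListPtr<std::string> BuildAttributeNames()
{
    SketchListPtr<std::string> names( new std::vector<std::string>() );

    names->push_back( "generalized_attribute_1" );
    names->push_back( "generalized_attribute_2" );

    return( SketchMakeConst( names ) );
}
```

Because ownership moves, the caller of BuildAttributeNames() holds the only pointer to the list and can no longer modify it through that pointer.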

At present the library is capable of walking through the GCC AST and creating a dictionary of all the types in the code being compiled.  Within this dictionary, the library is also able to provide detailed information on classes, structs, unions, functions and global variables.  It will scrape out C++ 11 generalized attributes on many source code elements (not all of them yet though) and return proper declarations with parameters and return types for functions and methods.  The ASTDictionary and the specific language model classes have no dependency on GCC Internals themselves.

The approach I followed for developing the library thus far was to get enough simple code running using the GCC macros that I could then start to refactor into C++ classes.  Along the way, I used Boost strong typedefs to start making sense of the GCC type system at compile time.  Once the puzzle pieces started falling into place and the programming patterns took shape, developing a plugin on top of the libraries became fairly straightforward.  That said, there is a long and painful learning curve associated with GCC internals and the AST itself.
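To illustrate what a strong typedef buys at compile time, here is a hand-rolled equivalent.  It is neither Boost's macro nor code from the library; ‘StrongId’, the tag structs, the two UID aliases and ‘LookupTypeSlot’ are all hypothetical names chosen for the sketch.

```cpp
#include <type_traits>

// Minimal hand-rolled strong typedef: same representation, distinct C++ type per Tag.
template <class Tag, class Rep>
class StrongId
{
public :

    explicit StrongId( Rep value ) : m_value( value ) {}

    Rep value() const { return( m_value ); }

    bool operator==( const StrongId& other ) const { return( m_value == other.m_value ); }

private :

    Rep m_value;
};

struct TypeUidTag {};
struct DeclUidTag {};

using SketchTypeUid = StrongId<TypeUidTag, unsigned>;
using SketchDeclUid = StrongId<DeclUidTag, unsigned>;

// The two UID flavours share a representation but cannot be mixed up by accident.
static_assert( !std::is_convertible<SketchDeclUid, SketchTypeUid>::value, "distinct tags must not convert" );
static_assert( !std::is_convertible<unsigned, SketchTypeUid>::value, "no implicit conversion from the raw value" );

// Only accepts a type UID; passing a SketchDeclUid or a bare unsigned fails to compile.
inline unsigned LookupTypeSlot( SketchTypeUid uid ) { return( uid.value() % 16 ); }
```

The same idea applied to tree nodes turns the class of mistake that otherwise ends in a segmentation violation into a compile error.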

Getting the Code and Disclaimers

The library code is available on Github: ‘stephanfr/GCCPlugin’.  All of the code is under GPL V3.0 which is absolutely required as it runs within GCC itself.  I do not claim that the library is complete, stable, usable or rational – but hopefully some will find it useful if for nothing more than providing some insight into the GCC AST.  For the record, this is not my job nor is it my job to enrich or bug fix the library so you can get your compiler theory class project done in time.  That said, if you pick up the code and either enrich it or fix some bugs – please return the code to me and I will merge what makes sense.

The code should ‘just run’ if you have a GCC Plugin build environment configured per my prior posts.  One detail is that the ‘GCCPlugin Debug.launch’ file will need to be moved to the ‘.launches’ directory of Eclipse’s ‘org.eclipse.debug.core’ plugin directory.  If the ‘.launches’ directory does not exist, then create it.

Building GCC Plugins – Part 1: C++ 11 Generalized Attributes

Historically with C and C++ compilers, you get what you get and you don’t get upset.  There was little or no facility for extending the compiler or for the kind of meta-programming models available in other languages.  Macro or template meta-programming and source code generation have been options for many, many years, but annotation based meta-programming, which is a prominent feature of many popular Java frameworks, has been very difficult to replicate in C++.

Starting with version 4.5.0, the GCC compiler supports ‘plugins’ which are dynamically loaded modules which make it possible for developers to enrich the compiler without having to modify the GCC source code itself.  There is a bit of information on the GCC Wiki and an excellent set of articles by Boris Kolpackov on the basics of writing GCC plugins.  Beyond those references, I found little else.  I spent a lot of time digging through GCC header files and using trial-and-error to work through decoding the GCC internal data structures.

Starting with GCC version 4.8.0, the compiler supports the C++ 11 standard for ‘generalized attributes’.  GCC (and many other C/C++ compilers) have had attributes for quite some time, but C++ 11 now specifies a standard syntax for both attributes and attribute namespaces.

Taken together, plugins and C++ 11 generalized attributes provide a framework within which annotation based meta-programming may be approached in the GCC compiler.  To be clear, it is not necessarily easy to do – there is a long learning curve for GCC internals – but at least it is achievable without requiring pragmas, code generators or direct modification of the GCC compiler itself.

In this and a series of followup posts, I’ll walk through creating a GCC plugin, adding custom attributes, decoding and traversing the Abstract Syntax Tree (AST) and simple modifications to the AST.

C++11 Generalized Attributes

The GCC compiler has had attributes for quite some time, primarily to provide hints to the compiler or for injecting debugging code.  The GCC syntax appears below:

__attribute__ ((aligned (16)))

Contrast the above with the equivalent C++ 11 syntax:

[[gnu::aligned (16)]]

In the C++ specification, the __attribute__ keyword is gone and double brackets are used to surround the attribute.  Also, the new specification introduces namespaces for attributes.  In the above example, the ‘gnu’ namespace is implicit in the GCC style attribute but must be called out explicitly when using the C++ standard syntax.
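The two spellings can be compared side by side in a small, compiler-specific sketch.  It assumes g++ 4.8 or later (or another compiler that understands the ‘gnu’ attribute namespace); the struct names are made up for the example.

```cpp
// GNU-style attribute syntax.
struct __attribute__ ((aligned (16))) GnuStyleAligned
{
    char payload;
};

// C++ 11 generalized attribute syntax with the explicit 'gnu' namespace.
struct [[gnu::aligned (16)]] Cpp11StyleAligned
{
    char payload;
};

// Both spellings should raise the alignment of a one-byte struct to 16.
static_assert( alignof( GnuStyleAligned ) == 16, "GNU syntax should raise the alignment" );
static_assert( alignof( Cpp11StyleAligned ) == 16, "C++ 11 syntax should raise the alignment" );
```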

Development Environment

In general, I start by creating a VM for the project I will be working on – essentially one VM per project.  In a series of prior posts I walk through creating an Ubuntu development VM, the process required to build a debug version of GCC and how to debug GCC in the Eclipse CDT IDE.  The rest of this post assumes that base environment but there is no reason why the process presented herein could not be modified for a different but functionally similar environment.

For generalized attributes, you will need GCC version 4.8.0 or later.

Creating a GCC Plugin

Step 1: Create a pair of C++ Projects in Eclipse CDT

Perhaps the most straightforward way to build and debug a plugin in Eclipse is to create one  C++ project for the plugin itself and a second ‘dummy’ C++ project which is used to hold the source files to be compiled by a debug instance of GCC with the plugin loaded.  Figure 1 shows the Eclipse Project Explorer window for this simple project.  The ‘GCCAttributesAndPlugin’ project should be an empty shared object and a ‘HelloWorld’ executable is fine for the ‘TestProject’.

Figure 1: Two C++ Projects in Eclipse Explorer

Step 2: Modify the Compiler Settings for the Plugin Project

A small number of modifications to the plugin project C++ compiler settings are necessary to ensure the correct version of the compiler is used and the plugin header files are found.  Figure 2 contains an image of the Eclipse C++ Settings dialog with the ‘Command’ changed to point to the 4.8.0 version of the g++ compiler built previously.  Other changes are found in the ‘All Options’ but those will actually be introduced in the next steps.

Figure 2: C++ Compiler Settings

Next, the include path for the GCC plugin header files needs to be added to the project.  For an environment configured per my prior post, the correct path is: ‘/usr/gcc-4.8.0/lib/gcc/x86_64-linux-gnu/4.8.0/plugin/include’ and can be seen in the C++ Settings Includes Dialog in Figure 3.

Figure 3: C++ Settings Include Dialog

Next, a couple of options need to be added to the Miscellaneous dialog as shown in Figure 4.  The two that will need to be added to a vanilla project are: ‘-std=c++0x’ to indicate the C++ 11 language specification should be used and ‘-gdwarf-3’ to force the compiler to emit debugging symbols in DWARF 3 format, as the latest version of the gdb debugger will not accept the compiler’s default DWARF format.

Figure 4: C++ Settings Miscellaneous

Finally, just as the ‘Command’ was changed for the C++ compiler, the same needs to be done for the Linker as shown in Figure 5.

Figure 5: Linker Settings Dialog

Step 3: Modify the Discovery Options for the Plugin Project

To ensure that the Eclipse IDE finds the correct include paths and indexes the project properly, the ‘Discovery Options’ need to be changed to reflect the compiler being used for the project itself.  Figure 6 shows the modification to the ‘Compiler Invocation Command’ for the discovery function.

Figure 6: Discovery Options Dialog

Step 4: Source Code for the Plugin

Little source code is required to register custom attributes and build a plugin.  The code for a very simple plugin with attributes appears below.

/*
 * gccplugin.cpp
 *
 * Created on: May 17, 2013
 * Author: steve
 */
#include <iostream>
#include <string>
#include "config.h"
#include "gcc-plugin.h"
#include "tree.h"
#include "cp/cp-tree.h"
#include "diagnostic.h"
#include "plugin.h"

//
// The following global int is needed to let the compiler know that this plugin is
//     GPL licensed
//

int plugin_is_GPL_compatible;

static tree HandleAttribute( tree* node,
                             tree attrName,
                             tree attrArguments,
                             int flags,
                             bool* no_add_attrs )
{
    std::cerr << "Encountered Attribute: " << IDENTIFIER_POINTER( attrName );

    // Print the arguments

    std::string separator = " ";
    // Walk the argument list with a local cursor so the attrArguments parameter itself is not modified
    for( tree itrArgument = attrArguments; itrArgument != NULL_TREE; itrArgument = TREE_CHAIN( itrArgument ) )
    {
        std::cerr << separator << TREE_STRING_POINTER( TREE_VALUE ( itrArgument ));
        separator = ", ";
    }

    std::cerr << std::endl;

    // Just return a null tree now.

    return( NULL_TREE );
}

static struct attribute_spec g_GeneralizedAttribute1 =
{
 "generalized_attribute_1", 0, -1, false, true, false, HandleAttribute, false
};

static struct attribute_spec g_GeneralizedAttribute2 =
{
 "generalized_attribute_2", 0, -1, false, false, false, HandleAttribute, false
};

//    The array of attribute specs passed to register_scoped_attributes must be NULL terminated
attribute_spec demoScopedAttributes[] = { g_GeneralizedAttribute1, g_GeneralizedAttribute2, NULL };

static void RegisterAttributes( void* eventData,
                                void* userData )
{
 register_scoped_attributes( demoScopedAttributes, "demo" );
}

static void GateCallback( void* eventData, void* userData )
{
     // If there has been an error, fall through and let the compiler handle it
    if( errorcount || sorrycount )
    {
        return;
    }
    std::cerr << "IPA Passes Starting for File: " << main_input_filename << std::endl;
}

int plugin_init( plugin_name_args*   info,
                 plugin_gcc_version* ver )
{
    std::cerr << "Starting Plugin: "<< info->base_name << std::endl;
    register_callback( info->base_name, PLUGIN_ATTRIBUTES, &RegisterAttributes, NULL );
    register_callback( info->base_name, PLUGIN_ALL_IPA_PASSES_START, &GateCallback, NULL );
    std::cerr << "Plugin Initialized, attribute registered" << std::endl;
    return( 0 );
}

The set of includes is pretty much the bare minimum needed for a plugin.  The ‘config.h’ file is the compiler configuration which is generated during the build process and can be found with the compiler includes.  Aside from the inclusion of ‘<iostream>’ to provide output to the console, the remaining includes are for plugin and attribute support.  Since GCC 4.8.0 is compiled using the g++ compiler, it is no longer necessary to wrap the plugin includes with ‘extern "C" {}’ to denote the difference in name mangling.  Also of note is the global symbol ‘plugin_is_GPL_compatible’ which the compiler checks for in the plugin library when it loads the plugin.  If the compiler does not find this global symbol, it will not finish loading the plugin.

The ‘HandleAttribute()’ function is a callback that is invoked by the compiler when it encounters a custom attribute registered by the plugin.  A separate function pointer to a callback is associated with each attribute registered, so it would be completely reasonable to have a separate callback for each custom attribute.  Within the handler, all we do is print out the attribute name and the attribute arguments.  The arguments are a GCC tree list of constant values – more on how to interpret GCC trees will appear in followup posts.
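The shape of the argument list can be sketched with a toy linked list.  This is not GCC's real layout: ‘ToyArgCell’ and ‘JoinArguments’ are hypothetical, with the ‘value’ member playing the role of TREE_VALUE and the ‘chain’ member the role of TREE_CHAIN.  The loop and the separator handling follow the same pattern as the handler above, but collect the text into a string instead of printing it.

```cpp
#include <string>

// Toy stand-in for one cell of an argument list; NOT GCC's real layout.
struct ToyArgCell
{
    const char*        value;   // plays the role of TREE_VALUE
    const ToyArgCell*  chain;   // plays the role of TREE_CHAIN
};

// Walks the chain with a local cursor and joins the values, using a space
// before the first argument and ", " before each later one.
inline std::string JoinArguments( const ToyArgCell* arguments )
{
    std::string result;
    std::string separator = " ";

    for( const ToyArgCell* itr = arguments; itr != nullptr; itr = itr->chain )
    {
        result += separator;
        result += itr->value;
        separator = ", ";
    }

    return( result );
}
```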

Next are static specifications of the two custom attributes.  This structure is defined in ‘tree.h’ which also contains good descriptions of the meanings of the fields.  I will not re-iterate that documentation here, but be sure to read through the definition to ensure you are passing the right values.  The ‘RegisterAttributes()’ callback function appears next.  The namespace for scoped attributes appears as the second argument to ‘register_scoped_attributes()’ – for this case the namespace is ‘demo’.

The ‘GateCallback()’ function is invoked by the plugin framework when the event it was registered for is raised.  In this example that event is PLUGIN_ALL_IPA_PASSES_START, so the callback runs just before the compiler starts its interprocedural analysis (IPA) passes.

Finally, the ‘plugin_init()’ function is the entry point for the plugin.  After the compiler loads the plugin shared object it will call this function.  An argument block and a version information block are passed to the function; this simple example uses only the ‘base_name’ field of the argument block and ignores the version information.  This function registers two callbacks: the first to register the custom attributes and the second to register a gate callback on the PLUGIN_ALL_IPA_PASSES_START event.  Once again, for further detail on the parameters for the ‘plugin_init()’ function, the best bet is to refer to the inline documentation in the GCC source code.

Step 5: Source Code for the Test Project

The source code for a test project with a pair of classes with attributes appears below.  This file is just the auto-generated ‘HelloWorld’ project source code with the two demo classes added.  Note the C++ 11 syntax for the attributes with the ‘demo’ namespace.


//============================================================================
// Name : TestProject.cpp
// Author :
// Version :
// Copyright : Your copyright notice
// Description : Hello World in C++, Ansi-style
//============================================================================

#include <iostream>
using namespace std;

class [[demo::generalized_attribute_1( "arg1", "arg2" )]] ClassWithAttribute1
{
};

class [[demo::generalized_attribute_2( "arg3" )]] ClassWithAttribute2
{
};

int main() {
 cout << "!!!Hello World!!!" << endl; // prints !!!Hello World!!!
 return 0;
}

Step 6: Create the .gdbinit file

The ‘.gdbinit’ file contains configuration information for the GDB debugger when it is invoked by the Eclipse IDE.  For debugging a GCC plugin, the file should have at least the following contents:

set schedule-multiple
dir ~/gcc_build/4.8.0/build/gcc
dir ~/gcc_build/4.8.0/gcc
dir ~/gcc_build/4.8.0/gcc/cp
dir ~/gcc_build/4.8.0/gcc/lto
source ~/gcc_build/4.8.0/build/gcc/gdbinit.in

The ‘.gdbinit’ file may be placed in the root directory of the plugin project.

Step 7: Create a Debugger Profile for the Plugin Project

To debug the plugin inside of Eclipse, the approach I use is to adjust the debugging launch profile for the plugin project so that it launches GCC and loads the plugin to compile the source code in the ‘Test Project’.  The first step is to set the ‘C/C++ Application’ to the compiler itself in the ‘Main’ dialog of the Debug Launch Configuration Properties, as shown in Figure 7.

Figure 7: Debug Profile Main Dialog

The next step is to tell the compiler to load the plugin, specify C++ 11 semantics and point the compiler at the source code file in ‘TestProject’, as shown in Figure 8.

Figure 8: Arguments Tab for the Debug Launch Profile

Next, the LD_LIBRARY_PATH and the PATH environment variables need to be enriched to add the paths to the GCC 4.8.0 compiler executables and libraries.  This is done on the ‘Environment’ tab as shown in Figure 9.  Additional detail is shown in Figures 10 and 11.

Figure 9: Debug Profile Environment Tab

Figure 10: LD_LIBRARY_PATH setting

Figure 11: PATH environment variable setting

Finally, on the ‘Debugger’ tab, ensure that the ‘GDB Command Line’ points to the ‘.gdbinit’ file created above and check the ‘Automatically debug forked process’ box.  GCC forks the g++ compiler from a main controller process, so if this checkbox is blank, GDB will not debug the forked g++ process where the plugin actually gets loaded.

Figure 12: Debug Profile Debugger Tab

Step 8: Pass arguments to the plugin

The command line syntax to pass one or more arguments to a plugin is a little tricky.  Because multiple plugins may be loaded into GCC simultaneously, the name of the plugin each argument belongs to is embedded in the command line option itself.  The syntax to pass an argument to a plugin is:


-fplugin-arg-'plugin name'-'argument name'='argument value'
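As a small illustration of that syntax, the helper below assembles one such option from its three parts.  ‘MakePluginArgFlag’ is a hypothetical name, and it assumes that the plugin name is the base name of the shared object without its ‘.so’ extension, which is how GCC's plugin support appears to derive it.

```cpp
#include <string>

// Builds one '-fplugin-arg-<plugin name>-<argument name>=<argument value>' option.
// Assumption: pluginName is the shared object's base name without the '.so' extension.
inline std::string MakePluginArgFlag( const std::string& pluginName,
                                      const std::string& argumentName,
                                      const std::string& argumentValue )
{
    return( "-fplugin-arg-" + pluginName + "-" + argumentName + "=" + argumentValue );
}
```

For a made-up plugin called ‘libmyplugin’ with an argument ‘verbose’ set to ‘1’, the helper returns ‘-fplugin-arg-libmyplugin-verbose=1’.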

For this example, the program arguments tab would contain the following:

Figure 13: Debug Arguments Tab with Plugin Arguments

Upon entering plugin_init(), info->argc will contain the count of plugin arguments and info->argv will point to an array of the key/value pairs passed on the command line; the i-th pair is available as info->argv[i].key and info->argv[i].value.
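A minimal sketch of reading those pairs follows.  To keep it compilable outside of GCC it uses local stand-in structures that mirror only the fields mentioned above; a real plugin would use the plugin_name_args and plugin_argument structures from ‘gcc-plugin.h’ instead.  ‘CollectPluginArguments’ and the ‘Sketch’ prefixed types are hypothetical, and the handling of a null value is an assumption about how an argument given without ‘=value’ arrives.

```cpp
#include <map>
#include <string>

// Local stand-ins mirroring only the fields used here; NOT the GCC definitions.
struct SketchPluginArgument
{
    const char* key;
    const char* value;
};

struct SketchPluginNameArgs
{
    const char*                  base_name;
    int                          argc;
    const SketchPluginArgument*  argv;
};

// Copies the key/value pairs into a map.  Assumption: an argument passed
// without '=value' has a null value, which is stored as an empty string.
inline std::map<std::string, std::string> CollectPluginArguments( const SketchPluginNameArgs& info )
{
    std::map<std::string, std::string> arguments;

    for( int i = 0; i < info.argc; ++i )
    {
        const SketchPluginArgument& argument = info.argv[i];

        arguments[argument.key] = ( argument.value != nullptr ) ? argument.value : "";
    }

    return( arguments );
}
```

Copying the pairs into a std::map up front keeps the raw char* handling in one place, in line with the preference for std::string stated earlier.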

Note that the attached example code does not include plugin arguments.

Step 9: Run the code

If you run the debug profile just created for the plugin project you should see output like this in the console:

Figure 14: Debug Console Output

That should do it, you have a basic GCC plugin and attribute framework in place.

Prepackaged Projects

If you have a development environment built as described in my prior posts, then you should be able to take the contents of the attached zip file and simply extract them into your workspace to get the two projects.  Also in the zip file is the debug profile which is in the ‘.launches’ directory.  This directory needs to be placed under the ‘.metadata/.plugins/org.eclipse.debug.core’ directory for Eclipse to recognize the profile.  If you drop in the directory while Eclipse is running, then you will need to re-start Eclipse.

Project Files