Tag Archives: GCC Abstract Syntax Tree

Building GCC Plugins – Part 3 C++ Libraries

As discussed in the prior post, I have started a set of C++ libraries to reduce the complexity of writing GCC Plugins and interpreting the GCC Abstract Syntax Tree.  In this post I will provide a high level description of the libraries and walk through the dependencies and directory structures.  The libraries are available on Github: ‘stephanfr/GCCPlugin’.

NB – At the time of writing, I am going through successive revisions and refactoring passes on the library, so expect anything you build now to break with my next commit to GitHub.  The interfaces will settle down in time and I will ‘chill’ them at some point in hopefully the not too distant future.

Licensing and Dependencies

All of the libraries with the exception of the unit test library link directly with the GCC source code, therefore they are all licensed with GPL V3.0.  The libraries are built with the C++11 language features and have dependencies on the Standard Library shipped with GCC and Boost libraries.  The unit testing framework depends on the Google Test libraries.

Programming Style

For what it is worth, I’ve been writing C++ code for a long, long time and am somewhat opinionated regarding some development practices.  First, I use anything in the standard c++ library – in particular I do not write containers.  Second, I use the std::string class in preference of char* strings almost exclusively.  For external interfaces I may expose a char* type but under the interface any char* will almost always map straight back to a std::string instance.  Third, I use anything from the Boost library that suits my needs.  The Boost libraries are excellent, don’t waste your time re-inventing a component in that library; in all likelihood your component will not be as good anyway.  Fourth, there are some naked pointers in these libraries but in general I try to use a std::unique_ptr or std::shared_ptr in any code written today (I will fix any naked pointers in this library as I refactor).  The standard library smart pointers are a bit more difficult to use than naked pointers, but that difficulty is a result of them enforcing the semantics necessary to know when to delete a pointer they wrap.  Finally, I really like C++ 11 – I’d strongly suggest cutting over to it.

With regard to my coding format, it is idiosyncratic.  Indentation and spacing don’t quite adhere to any standard, but at least I no longer use Hungarian notation – though that was a hard habit to kick.

Project and Directory Structure

The project is currently composed of seven directories, each with a single Eclipse CDT C++ project:

  1. CPPLanguageModel – a compiler-neutral class library of C++ language elements
  2. GCCInternalsTools – a set of classes and functions tailored specifically to the GCC g++ compiler to build a CPPLanguageModel representation of the code being compiled and to enable insertion of new code into the AST
  3. GCCInternalsUTFixture – a test fixture providing an abstraction of the GCCInternalsTools designed to permit the creation of unit tests for the library without any dependency on the GCC specific libraries themselves
  4. GCCInternalsUnitTest – a set of unit tests for key features of the GCC Plugins libraries
  5. TestExtensions – a collection of test framework ‘plugins’ that rely on GCCInternalsTools and the GCC headers; a separate project is used to prevent dependencies on GCC internals to leak into the main Unit Test framework
  6. GCCPlugin – a ‘HelloWorld’ style plugin for GCC Plugins using this framework
  7. Utility – Various utility classes to simply coding and implement design patterns I like

The most up-to-date examples of using the libraries will be in the unit test projects.  Similarly, if you go wandering through the code you will frequently see blocks of code commented out.  I tend to leave code I have refactored in place for a revision or two just in case a bug crawls out.  I find it is a bit easier than going back through prior revisions in source code control but it can make the code a little messy at points.  When I get to a version I am happy with, I go through a couple cleaning passes and knock out dead or legacy code.

Design Philosophy

The innards of GCC are absolutely not for the faint of heart.  A primary design goal of this framework is to insulate someone wanting to produce a GCC Plugin from the complexity of the compiler and its design paradigms.  At present, only a single GCC header file is required to build a plugin with this framework and all functionality exposed through the framework’s API is abstracted from GCC itself.  The framework is built for manipulating the Abstract Syntax Tree for C++ language programs but could be modified to match other languages.

To use the framework, you ought to only include header files from the CPPLanguageModel project.  Actually, the ASTDictionary.h and PluginManager.h header files will pull in most of the declarations needed to build your plugin.  Two header files from the gcc distribution are also needed: config.h and gcc-plugin.h

The object model exposed by the framework is that of a Dictionary of all the types and declarations in the code being compiled by g++ with the plugin loaded.  The dictionary is indexed by namespace, entry fully qualified name, entry source code location, entry UID and and an identity field.  All of the indices are exposed by the ASTDictionary class and can be used for searching the dictionary for a specific entry.  The identity, UID and fully qualified name indices are unique whereas the namespace and source location indices are non unique and may return a range of results.

The dictionary contains entries for different types and declarations.  Entries will be one of the following ‘kinds’: CLASS, UNION, FUNCTION, GLOBAL_VAR, TEMPLATE or UNRECOGNIZED.  The UNRECOGNIZED kind is simply a catch-all for any AST tree elements that have not yet been added to the tree parser.  Dictionary entries are effectively stubs from which the actual definition of the entry may be extracted.  Definitions contain the detailed, ‘kind’ specific information about the entry.  For example, the ClassDefinition object contains the base classes, fields, methods, template methods and friends for the class type.  Source location, namespace, UID, static and extern flags and a list of attributes are available for all dictionary entries and those values are copied into the more detailed definitions as well.

I’ve tried to insure that the AST tree parser will pass through the tree adding dictionary entries for elements it recognizes and ignoring everything else.  My intent is that it should not crash on encountering some language element it does not recognize in the AST but I have not run the parser over a whole lot of code so I will stick to ‘intent’ for now.  At present, the parser recognizes unions but does not yet provide a detailed definition of union types.  I figured it was more valuable to get some code injection functionality in place before sweating through the details of union representations in the GCC AST.

Current Supported Versions of GCC

The internals of GCC are constantly in flux and functionally there are no ‘frozen’ APIs or data structures that one can depend upon remaining static release over release.  The changes are unlikely to be significant release over release but there is a high probability of breaking changes associated with any release.

The code currently compiles and runs with GCC 4.8.0.  I can make no guarantees that it will compile and run with later releases, though hopefully nothing should break between double dot releases.

Example Plugin

An example ‘HelloWorld’ plugin appears below.  The four header files appear at the top.  The plugin_is_GPL_compatible symbol is needed for licensing compliance with the GCC suite.

There exists an implementation of the CPPModel::CallbackIfx interface which is used by the framework to call back into the plugin at specific times in the compilation process.  There are entry points for when the AST is ready, for a point at which namespaces may be declared and a point at which code may be injected.  For the sample plugin, all that happens is that the contents of the TestNamespace inside the code being compiled is dumped to cerr.  The plugin_init function is part of the GCC plugin framework and is rather straightforward when using these abstraction libraries.


/*-------------------------------------------------------------------------------
Copyright (c) 2013 Stephan Friedl.

All rights reserved. This program and the accompanying materials
are made available under the terms of the GNU Public License v3.0
which accompanies this distribution, and is available at
http://www.gnu.org/licenses/gpl.html

Contributors:
 Stephan Friedl
-------------------------------------------------------------------------------*/

#include "config.h"

#include "ASTDictionary.h"
#include "PluginManager.h"

#include "gcc-plugin.h"

int plugin_is_GPL_compatible;

class Callbacks : public CPPModel::CallbackIfx
{
public :

 Callbacks()
 {}

 virtual ~Callbacks()
 {}

 void ASTReady()
 {
 std::list<std::string> namespacesToDump( { "TestNamespace::" } );

 CPPModel::GetPluginManager().GetASTDictionary().DumpASTXMLByNamespaces( std::cerr, namespacesToDump );
 };

 void CreateNamespaces()
 {
 };

 void InjectCode()
 {
 };

};

Callbacks g_pluginCallbacks;

int plugin_init( plugin_name_args* info, plugin_gcc_version* ver )
{
 std::cerr << "Starting Plugin: "<< info->base_name << std::endl;

 CPPModel::GetPluginManager().Initialize( "HelloWorld Plugin", &g_pluginCallbacks );

 return( 0 );
}

 

Example Output

A sample program to be compiled appears below.  This code has the TestNamespace declared and it is the contents of that namespace that will be dumped by the plugin above.

#include <iostream>

namespace TestNamespace
{
	class TestClass
	{
	public :

		int			publicInt;

		int			getPublicInt() const
		{
			return( publicInt );
		}

	protected :

		double		getPrivateDouble() const
		{
			return( privateDouble );
		}

	private :

		double		privateDouble;
	};

	char*		globalString = "This is a global string";

	TestClass	globalTestClassInstance;
}

int main()
{
	std::cout << "!!!Hello World!!!" << std::endl; // prints !!!Hello World!!!

	return 0;
}

The command line required to invoke g++ with the plugin and compile the above file follows:

/usr/gcc-4.8.0/bin/gcc-4.8.0 -c -std=c++11 -fplugin=libGCCPlugin.so HelloWorld.cpp

When g++ initializes, it loads the sample plugin and when the AST is ready, the plugin dumps the following to the standard output.  It isn’t prefect XML but ought to be good enough to analyze the program being compiled.

8: 2014-09-23 21:03:42   [LoggingInitialization] [NORMAL]  Logging Initiated
Starting Plugin: libGCCPlugin
HelloWorld.cpp:34:24: warning: deprecated conversion from string constant to ‘char*’ [-Wwrite-strings]
  char*  globalString = "This is a global string";
                        ^
<ast>
    <dictionary>
        <namespace name="TestNamespace::">
            <dictionary_entry>
                <namespace>
                    <name>TestNamespace::</name>
                </namespace>
                <name>TestClass</name>
                <uid>20720</uid>
                <source-info>
                    <file>HelloWorld.cpp</file>
                    <line>9</line>
                    <char-count>1</char-count>
                    <location>6451683</location>
                </source-info>
            </dictionary_entry>
            <dictionary_entry>
                <namespace>
                    <name>TestNamespace::</name>
                </namespace>
                <name>globalString</name>
                <uid>28506</uid>
                <source-info>
                    <file>HelloWorld.cpp</file>
                    <line>34</line>
                    <char-count>1</char-count>
                    <location>6454884</location>
                </source-info>
                <static>true</static>
            </dictionary_entry>
            <dictionary_entry>
                <namespace>
                    <name>TestNamespace::</name>
                </namespace>
                <name>globalTestClassInstance</name>
                <uid>28507</uid>
                <source-info>
                    <file>HelloWorld.cpp</file>
                    <line>36</line>
                    <char-count>1</char-count>
                    <location>6455143</location>
                </source-info>
                <static>true</static>
            </dictionary_entry>
        </namespace>
    </dictionary>
    <elements>
        <namespace name="TestNamespace::">
            <class type="class">
                <name>TestClass</name>
                <uid>20720</uid>
                <source-info>
                    <file>HelloWorld.cpp</file>
                    <line>9</line>
                    <char-count>1</char-count>
                    <location>6451683</location>
                </source-info>
                <namespace>
                    <name>TestNamespace::</name>
                </namespace>
                <compiler_specific>
                    </artificial>
                </compiler_specific>
                <base-classes>
                </base-classes>
                <friends>
                </friends>
                <fields>
                    <field>
                        <name>publicInt</name>
                        <source-info>
                            <file>HelloWorld.cpp</file>
                            <line>13</line>
                            <char-count>1</char-count>
                            <location>6452196</location>
                        </source-info>
                        <type>
                            <kind>fundamental</kind>
                            <declaration>int</declaration>
                        </type>
                        <access>PUBLIC</access>
                        <static>false</static>
                        <offset_info>
                            <size>4</size>
                            <alignment>4</alignment>
                            <offset>0</offset>
                            <bit_offset_alignment>128</bit_offset_alignment>
                            <bit_offset>0</bit_offset>
                        </offset_info>
                    </field>
                    <field>
                        <name>privateDouble</name>
                        <source-info>
                            <file>HelloWorld.cpp</file>
                            <line>30</line>
                            <char-count>1</char-count>
                            <location>6454374</location>
                        </source-info>
                        <type>
                            <kind>fundamental</kind>
                            <declaration>double</declaration>
                        </type>
                        <access>PRIVATE</access>
                        <static>false</static>
                        <offset_info>
                            <size>8</size>
                            <alignment>8</alignment>
                            <offset>0</offset>
                            <bit_offset_alignment>128</bit_offset_alignment>
                            <bit_offset>64</bit_offset>
                        </offset_info>
                    </field>
                </fields>
                <methods>
                    <method>
                        <name>getPublicInt</name>
                        <uid>28497</uid>
                        <source-info>
                            <file>HelloWorld.cpp</file>
                            <line>15</line>
                            <char-count>1</char-count>
                            <location>6452452</location>
                        </source-info>
                        <access>PUBLIC</access>
                        <static>false</static>
                        <result>
                            <type>
                                <kind>fundamental</kind>
                                <declaration>int</declaration>
                            </type>
                        </result>
                        <parameters>
                            <parameter>
                                <name>this</name>
                                <type>
                                    <kind>derived</kind>
                                    <declaration>
                                        <operator>pointer</operator>
                                        <type>
                                            <kind>class-or-struct</kind>
                                            <declaration>TestNamespace::TestClass</declaration>
                                            <namespace>
                                                <name>TestNamespace::</name>
                                            </namespace>
                                        </type>
                                    </declaration>
                                </type>
                                <compiler_specific>
                                    </artificial>
                                </compiler_specific>
                            </parameter>
                        </parameters>
                    </method>
                    <method>
                        <name>getPrivateDouble</name>
                        <uid>28499</uid>
                        <source-info>
                            <file>HelloWorld.cpp</file>
                            <line>22</line>
                            <char-count>1</char-count>
                            <location>6453350</location>
                        </source-info>
                        <access>PROTECTED</access>
                        <static>false</static>
                        <result>
                            <type>
                                <kind>fundamental</kind>
                                <declaration>double</declaration>
                            </type>
                        </result>
                        <parameters>
                            <parameter>
                                <name>this</name>
                                <type>
                                    <kind>derived</kind>
                                    <declaration>
                                        <operator>pointer</operator>
                                        <type>
                                            <kind>class-or-struct</kind>
                                            <declaration>TestNamespace::TestClass</declaration>
                                            <namespace>
                                                <name>TestNamespace::</name>
                                            </namespace>
                                        </type>
                                    </declaration>
                                </type>
                                <compiler_specific>
                                    </artificial>
                                </compiler_specific>
                            </parameter>
                        </parameters>
                    </method>
                </methods>
                <template_methods>
                </template_methods>
            </class>
            <global_var_entry>
                <namespace>
                    <name>TestNamespace::</name>
                </namespace>
                <name>globalString</name>
                <uid>28506</uid>
                <source-info>
                    <file>HelloWorld.cpp</file>
                    <line>34</line>
                    <char-count>1</char-count>
                    <location>6454884</location>
                </source-info>
                <static>true</static>
                <type>
                    <kind>derived</kind>
                    <declaration>
                        <operator>pointer</operator>
                        <type>
                            <kind>fundamental</kind>
                            <declaration>char</declaration>
                        </type>
                    </declaration>
                </type>
            </global_var_entry>
            <global_var_entry>
                <namespace>
                    <name>TestNamespace::</name>
                </namespace>
                <name>globalTestClassInstance</name>
                <uid>28507</uid>
                <source-info>
                    <file>HelloWorld.cpp</file>
                    <line>36</line>
                    <char-count>1</char-count>
                    <location>6455143</location>
                </source-info>
                <static>true</static>
                <type>
                    <kind>class-or-struct</kind>
                    <declaration>TestNamespace::TestClass</declaration>
                    <namespace>
                        <name>TestNamespace::</name>
                    </namespace>
                </type>
            </global_var_entry>
        </namespace>
    </elements>
</ast>
Declaring Globals

Conclusion

It has taken a while to get this far but I will dive into the internals of the framework and provide examples of code injection in future posts.