
Serial and SIMD implementation of the Xoshiro256+ random number generator – Part 1: Implementation and Usage

The Xoshiro256PlusSIMD project provides a C++ implementation of the Xoshiro256+ random number generator that matches the performance of the reference C implementation by David Blackman and Sebastiano Vigna (https://prng.di.unimi.it/). Xoshiro256+ combines high speed, a small memory footprint for stored state and excellent statistical quality. For cryptographic use cases, or use cases where the very best statistical quality is required, consider a different RNG such as the Mersenne Twister. For any other conventional simulation or testing use case, Xoshiro256+ should be perfectly fine statistically and better than a whole lot of slower alternatives.

This implementation is a header-only library and provides the following capabilities:

  • Single 64-bit unsigned random value
  • Single 64-bit unsigned random value reduced to a [lower, upper) range
  • Four 64-bit unsigned random values
  • Four 64-bit unsigned random values reduced to a [lower, upper) range
  • Single double-precision real random value in the range (0,1)
  • Single double-precision real random value in a (lower, upper) range
  • Four double-precision real random values in the range (0,1)
  • Four double-precision real random values in a (lower, upper) range

Implementation Details

For platforms supporting the AVX2 instruction set, the RNG can be configured to use AVX2 instructions or not on an instance-by-instance basis. AVX2 instructions are used only for the four-wide operations; there is no advantage to using them for single-value generation.

The four-wide operations use a different random seed per value, and the seed for single-value generation is distinct as well. The same stream of values will be returned by the serial and AVX2 implementations. It might be faster for the serial implementation to use a single seed across all four values – each increasing index simply being the next value in one series, instead of each of the four values having its own series. The downside of that approach is that the serial implementation would return different four-wide values than the AVX2 implementation, which must use distinct seeds for each of the four values.

The random series for each of the four-wide values are separated by 2^192 values – i.e. a Xoshiro256+ ‘long jump’ separates the seed for each of the four values. For clarity, the Xoshiro256+ has a state space of 2^256.

The reduction of the uint64s to an integer range takes uint32 bounds. This is a significant reduction in the size of the random values, but it permits reduction to a range while avoiding a modulus operation. If you need random integer values beyond uint32 sizes, I’d suggest taking the full 64-bit values and applying your own reduction algorithm. The modulus approach to reduction is slower than the approach used in the code, which relies on shifts and a multiply.
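For illustration, a minimal sketch of the shift-and-multiply style of reduction (Lemire’s method) appears below. This is a sketch of the general technique under my own assumptions, not a copy of the project’s exact code:

#include <cstdint>

//  Sketch: map a 64-bit random value into [lower, upper) with a multiply and a shift,
//  avoiding the modulus operator.  Bounds are uint32 values, as in the library.
uint32_t reduce_to_range(uint64_t random_value, uint32_t lower, uint32_t upper)
{
    const uint32_t span = upper - lower;                                   //  width of the half-open range
    const uint32_t high_bits = static_cast<uint32_t>(random_value >> 32);  //  use the upper 32 bits

    return lower + static_cast<uint32_t>((static_cast<uint64_t>(high_bits) * span) >> 32);
}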

Finally, the AVX versions are coded explicitly with AVX intrinsics; there is no reliance on the vagaries of compiler vectorization. The SIMD version could be written such that gcc should unroll loops and vectorize, but others have reported that it is necessary to tweak optimization flags to get the unrolling to work. For these implementations, all that is needed is the -mavx2 compiler option and the __AVX2_AVAILABLE__ symbol defined.

Usage

The class Xoshiro256Plus is a template class and takes a SIMDInstructionSet enumerated value as its only template parameter. SIMDInstructionSet may be ‘NONE’, ‘AVX’ or ‘AVX2’. The SIMD acceleration requires the AVX2 instruction set and uses ‘if constexpr’ to control code generation at compile time. There is also a preprocessor symbol __AVX2_AVAILABLE__ which must be defined to permit AVX2 instances of the RNG to be created. It is completely reasonable to have the AVX2 instruction set available but still use an RNG instance with no SIMD acceleration.

#define __AVX2_AVAILABLE__

#include "Xoshiro256Plus.h"

constexpr size_t NUM_SAMPLES = 1000;
constexpr uint64_t SEED = 1;

typedef SEFUtility::RNG::Xoshiro256Plus<SEFUtility::RNG::SIMDInstructionSet::NONE> Xoshiro256PlusSerial;
typedef SEFUtility::RNG::Xoshiro256Plus<SEFUtility::RNG::SIMDInstructionSet::AVX2> Xoshiro256PlusAVX2;

bool InsureFourWideRandomStreamsMatch()
{
    Xoshiro256PlusSerial serial_rng(SEED);
    Xoshiro256PlusAVX2 avx_rng(SEED);

    for (auto i = 0; i < NUM_SAMPLES; i++)
    {
        auto next_four_serial = serial_rng.next4( 200, 300 );
        auto next_four_avx = avx_rng.next4( 200, 300 );

        if(( next_four_serial[0] != next_four_avx[0] ) ||
           ( next_four_serial[1] != next_four_avx[1] ) ||
           ( next_four_serial[2] != next_four_avx[2] ) ||
           ( next_four_serial[3] != next_four_avx[3] ))
        { return false; }
    }

    return true;
}

HeapWatcher : Memory Leak Detector for Automated Testing


This project provides a simple tool for tracking heap allocations between start/finish points in C++ code. It is intended for use in unit tests and perhaps some feature tests. It is not a replacement for Valgrind or other memory debugging tools – the primary intent is to provide an easy-to-use tool that can be added to unit tests built with GoogleTest or Catch2 to find leaks and provide partial or full stack dumps of leaked allocations.

The project can be found on GitHub at: https://github.com/stephanfr/HeapWatcher

Design

The C standard library functions malloc(), calloc(), realloc() and free() are ‘weak symbols’ in glibc and can be replaced by user-supplied functions with the same signatures, supplied in a user static library or shared object. This tool wraps the C standard library calls and then tracks all allocations and frees in a map. The ‘book-keeping’ is performed in a separate thread to (1) limit the need for mutexes or critical sections to protect shared state and (2) limit the run-time performance impact on the code under test. The functions in HeapWatcher are not intrusive in that they simply delegate to the glibc functions and then track allocations in a separate data structure. Allocation tracking can be paused in any thread being tracked, and there is a facility to capture stack traces for ‘intentional leaks’ and then ignore those for tracking purposes.
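As a rough illustration of the wrapping technique – a minimal sketch only, not HeapWatcher’s actual implementation, and the record_allocation()/record_free() helpers are hypothetical – replacing the weak symbols and delegating to glibc might look like:

#include <cstddef>

extern "C" void* __libc_malloc(size_t size);   //  glibc's underlying allocator
extern "C" void  __libc_free(void* block);

extern "C" void* malloc(size_t size)
{
    void* block = __libc_malloc(size);         //  delegate to glibc
    //  record_allocation(block, size);        //  hypothetical: queue a record for the tracking thread
    return block;
}

extern "C" void free(void* block)
{
    //  record_free(block);                    //  hypothetical: queue a record for the tracking thread
    __libc_free(block);                        //  delegate to glibc
}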

There exists a single global static instance of HeapWatcher which can be accessed with the SEFUtility::HeapWatcher::get_heap_watcher() function.

Additionally, there are a pair of multi-threaded test fixtures provided in the project. One fixture launches workload threads and requires the user to manage the heap watcher. The second test fixture integrates the heap watcher and tracks all allocations made while the instance of the fixture itself is in scope.

For memory-intensive applications running on many cores, the single tracker thread may be insufficient. All allocation records go into a queue; they will not be lost and will eventually be processed. Potential problems can arise if the application allocates faster than the single thread can keep up and the queue used for passing records to the tracker thread grows to the point that it exhausts system memory. When HeapWatcher stops, the memory snapshot it returns is the result of processing all allocation records – so it should be correct.

Including into a Project

Probably the easiest way to use HeapWatcher is to include it through the fetch mechanism provided by CMake:

FetchContent_Declare(
    heapwatcher
    GIT_REPOSITORY "https://github.com/stephanfr/HeapWatcher.git" )

FetchContent_MakeAvailable(heapwatcher)

include_directories(
    ${heapwatcher_SOURCE_DIR}/include
    ${heapwatcher_BINARY_DIR}
)

The CMake specification for HeapWatcher will build the library, which must be linked into your project. In addition, for the call stack decoding to work properly, the following linker option must be included in your project as well:

SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -rdynamic")


HeapWatcher is not a header-only project; the linker must have concrete instances of malloc(), calloc(), realloc() and free() to link to the rest of the code under test. Given the ease of including the library with CMake, this doesn’t present much of a problem overall.

Using HeapWatcher


Only a single header file, HeapWatcher.hpp, must be included in any file wishing to use the tool. This header contains all the data structures and classes needed to use the tool. The HeapWatcher class itself is fairly simple and the call to retrieve the global instance is trivial:

namespace SEFUtility::HeapWatcher
{
    class HeapWatcher
    {
        public:
            virtual void start_watching() = 0;
            virtual HeapSnapshot stop_watching() = 0;

            [[nodiscard]] virtual PauseThreadWatchGuard pause_watching_this_thread() = 0;
            
            virtual uint64_t capture_known_leak(std::list<std::string>& leaking_symbols, std::function<void()> function_which_leaks) = 0;
            [[nodiscard]] virtual const KnownLeaks known_leaks() const = 0;

            [[nodiscard]] virtual const HeapSnapshot snapshot() = 0;
            [[nodiscard]] virtual const HighLevelStatistics high_level_stats() = 0;
    };

    HeapWatcher& get_heap_watcher();
}


Note the namespace declaration. There are a number of other classes declared in the HeapWatcher.hpp header for the HeapSnapshot and to provide the pause-watching capability. A simple example of using HeapWatcher in a Catch2 test appears below:

void OneLeak() { int* new_int = static_cast<int*>(malloc(sizeof(int))); }

void OneLeakNested() { OneLeak(); }
   
TEST_CASE("Basic HeapWatcher Tests", "[basic]")
{
    SECTION("One Leak Nested", "[basic]")
    {
        SEFUtility::HeapWatcher::get_heap_watcher().start_watching();

        OneLeakNested();

        auto leaks(SEFUtility::HeapWatcher::get_heap_watcher().stop_watching());

        REQUIRE(leaks.open_allocations().size() == 1);

        REQUIRE_THAT(leaks.open_allocations()[0].stack_trace()[0].function(), Catch::Matchers::Equals("OneLeak()"));
        REQUIRE_THAT(leaks.open_allocations()[0].stack_trace()[1].function(),
                    Catch::Matchers::Equals("OneLeakNested()"));

        REQUIRE(leaks.high_level_statistics().number_of_mallocs() == 1);
        REQUIRE(leaks.high_level_statistics().number_of_frees() == 0);
        REQUIRE(leaks.high_level_statistics().number_of_reallocs() == 0);
        REQUIRE(leaks.high_level_statistics().bytes_allocated() == sizeof(int));
        REQUIRE(leaks.high_level_statistics().bytes_freed() == 0);
    }
}

Capturing Known Leaks

In various third-party libraries there exist intentional leaks. A good example is the leak of a pointer for thread-local storage for each thread created by the pthread library. There is a leak from the symbol ‘_dl_allocate_tls’ that appears to remain even after std::thread::join() is called. This appears not infrequently in Valgrind reports as well. Given the desire to make this a library for automated testing, there is the capability to capture and then ignore allocations from certain functions or methods. An example appears below:

SECTION("Known Leak", "[basic]")
{
    std::list<std::string> leaking_symbol({"KnownLeak()"});

    REQUIRE( SEFUtility::HeapWatcher::get_heap_watcher().capture_known_leak(leaking_symbol, []() { KnownLeak(); }) == 1 );

    REQUIRE(SEFUtility::HeapWatcher::get_heap_watcher().known_leaks().addresses().size() == 2);
    REQUIRE_THAT(SEFUtility::HeapWatcher::get_heap_watcher().known_leaks().symbols()[0].function(),
                 Catch::Matchers::Equals("_dl_allocate_tls"));
    REQUIRE_THAT(SEFUtility::HeapWatcher::get_heap_watcher().known_leaks().symbols()[1].function(),
                 Catch::Matchers::Equals("KnownLeak()"));

    SEFUtility::HeapWatcher::get_heap_watcher().start_watching();

    OneLeakNested();
    KnownLeak();
    OneLeak();

    auto leaks(SEFUtility::HeapWatcher::get_heap_watcher().stop_watching());

    REQUIRE(leaks.open_allocations().size() == 2);
}

The capture_known_leak() method takes two arguments: 1) a std::list<std::string> containing one or more symbols which, if located in a stack trace, will cause the allocation associated with the trace to be ignored, and 2) a function (or lambda) which will evoke one or more leaks associated with the symbols passed in the first argument. The leaking function need not be immediately adjacent to the malloc; it may be further up the call stack, but the allocation will only be ignored if the symbol appears at the same number of frames above the memory allocation as it did when the leak was captured.

This approach of actively capturing the leak at runtime is effective for dealing with ASLR (Address Space Layout Randomization) and does not require loading of shared libraries or other linking or loading gymnastics.

Pausing Allocation Tracking


The PauseThreadWatchGuard instance returned by a call to HeapWatcher::pause_watching_this_thread() is a scope-based mechanism for suspending heap activity tracking in a thread. For example, the above snippet can be modified to not log the leak in OneLeakNested() by obtaining a guard and putting the leaking call into the same scope as the guard:

    SEFUtility::HeapWatcher::get_heap_watcher().start_watching();

    {
      auto pause_watching = SEFUtility::HeapWatcher::get_heap_watcher().pause_watching_this_thread();

      OneLeakNested();
    }

    auto leaks(SEFUtility::HeapWatcher::get_heap_watcher().stop_watching());

    REQUIRE(leaks.open_allocations().size() == 0);

Once the guard instance goes out of scope, HeapWatcher will again start tracking allocations in the thread.

Test Fixtures

Two test fixtures are included with HeapWatcher; both are intended to ease the creation of multi-threaded unit test cases, which are useful for detecting race conditions or deadlocks. The test fixtures provide the ability to add functions or lambdas as ‘workload functions’ and then start all of those workload functions simultaneously. Alternatively, workload functions may be given a random start delay in seconds (as a double, so it may be a fraction of a second as well). This permits stress testing with a lot of load started at one time, or allows load to ramp up over time.

The SEFUtility::HeapWatcher::ScopedMultithreadedTestFixture class starts watching the heap on creation and takes a function or lambda which will be called with a HeapSnapshot when all threads have completed, to permit testing the final heap state. This test fixture effectively hides the HeapWatcher instructions whereas the SEFUtility::HeapWatcher::MultithreadedTestFixture class requires the user to wrap the test fixture with the HeapWatcher start and stop.

Examples of both test fixtures appear below. First is an example of MultithreadedTestFixture:

    SECTION("Torture Test, One Leak", "[basic]")
    {
        constexpr int64_t num_operations = 2000000;
        constexpr int NUM_WORKERS = 20;

        SEFUtility::HeapWatcher::MultithreadedTestFixture test_fixture;

        SEFUtility::HeapWatcher::get_heap_watcher().start_watching();

        test_fixture.add_workload(NUM_WORKERS,
                                  std::bind(&RandomHeapOperations, num_operations));  //  NOLINT(modernize-avoid-bind)
        test_fixture.add_workload(1, &OneLeak);

        std::this_thread::sleep_for(10s);

        test_fixture.start_workload();
        test_fixture.wait_for_completion();

        auto leaks = SEFUtility::HeapWatcher::get_heap_watcher().stop_watching();

        REQUIRE(leaks.open_allocations().size() == 1);
    }

An example of ScopedMultithreadedTestFixture follows:

    SECTION("Two Workloads, Few Threads, one Leak", "[basic]")
    {
        constexpr int NUM_WORKERS = 5;

        SEFUtility::HeapWatcher::ScopedMultithreadedTestFixture test_fixture(
            [](const SEFUtility::HeapWatcher::HeapSnapshot& snapshot) { REQUIRE(snapshot.numberof_leaks() == 5); });

        test_fixture.add_workload(NUM_WORKERS, &BuildBigMap);
        test_fixture.add_workload(NUM_WORKERS, &OneLeak);

        std::this_thread::sleep_for(1s);

        test_fixture.start_workload();
    }

Conclusion

HeapWatcher and the multithreaded test fixture classes are intended to help developers create tests which check for memory leaks either in simple procedural test cases written with GoogleTest or Catch2 or in more complex multi-threaded tests in those same base frameworks.

https://github.com/stephanfr/HeapWatcher

Building GCC Plugins – Part 2: Introduction to GCC Internals

Once the basic scaffolding is in place for a GCC Plugin, the next step is to analyze and perhaps modify the Abstract Syntax Tree (AST) created by GCC as a result of parsing the source code. GCC is truly a marvel of software engineering; it is the de-facto compiler for *nix environments and supports a variety of front ends for different languages (even Ada…). That said, the GCC AST is complex to navigate for a number of reasons. First, parsing and representing a variety of languages in a common syntax tree is a complex problem, so the solution is going to be complex. Second, history – looking at the GCC internals is a bit like walking down memory lane; this is the way we wrote high-performance software when systems had limited memory (think 64k) and CPUs had low throughput (think 16 MHz clocks). Prior to GCC 4.8.0, GCC was compiled with the C compiler, so don’t bother looking for C++ constructs in the source code.

The AST Tree

The primary element in the GCC AST is the ‘tree’ structure.  An introduction to the tree structure appears in the GCC Internals Documentation.  Figure 1 is extracted from the tree.h header file and provides a good starting place for a discussion of the GCC tree and how to approach programming with it.


union GTY ((ptr_alias (union lang_tree_node),
 desc ("tree_node_structure (&%h)"), variable_size)) tree_node {
 struct tree_base GTY ((tag ("TS_BASE"))) base;
 struct tree_typed GTY ((tag ("TS_TYPED"))) typed;
 struct tree_common GTY ((tag ("TS_COMMON"))) common;
 struct tree_int_cst GTY ((tag ("TS_INT_CST"))) int_cst;
 struct tree_real_cst GTY ((tag ("TS_REAL_CST"))) real_cst;
 struct tree_fixed_cst GTY ((tag ("TS_FIXED_CST"))) fixed_cst;
 struct tree_vector GTY ((tag ("TS_VECTOR"))) vector;
 struct tree_string GTY ((tag ("TS_STRING"))) string;
 struct tree_complex GTY ((tag ("TS_COMPLEX"))) complex;
 struct tree_identifier GTY ((tag ("TS_IDENTIFIER"))) identifier;
 struct tree_decl_minimal GTY((tag ("TS_DECL_MINIMAL"))) decl_minimal;
 struct tree_decl_common GTY ((tag ("TS_DECL_COMMON"))) decl_common;
 struct tree_decl_with_rtl GTY ((tag ("TS_DECL_WRTL"))) decl_with_rtl;
 struct tree_decl_non_common GTY ((tag ("TS_DECL_NON_COMMON"))) decl_non_common;
 struct tree_parm_decl GTY ((tag ("TS_PARM_DECL"))) parm_decl;
 struct tree_decl_with_vis GTY ((tag ("TS_DECL_WITH_VIS"))) decl_with_vis;
 struct tree_var_decl GTY ((tag ("TS_VAR_DECL"))) var_decl;
 struct tree_field_decl GTY ((tag ("TS_FIELD_DECL"))) field_decl;
 struct tree_label_decl GTY ((tag ("TS_LABEL_DECL"))) label_decl;
 struct tree_result_decl GTY ((tag ("TS_RESULT_DECL"))) result_decl;
 struct tree_const_decl GTY ((tag ("TS_CONST_DECL"))) const_decl;
 struct tree_type_decl GTY ((tag ("TS_TYPE_DECL"))) type_decl;
 struct tree_function_decl GTY ((tag ("TS_FUNCTION_DECL"))) function_decl;
 struct tree_translation_unit_decl GTY ((tag ("TS_TRANSLATION_UNIT_DECL")))
 translation_unit_decl;
 struct tree_type_common GTY ((tag ("TS_TYPE_COMMON"))) type_common;
 struct tree_type_with_lang_specific GTY ((tag ("TS_TYPE_WITH_LANG_SPECIFIC")))
 type_with_lang_specific;
 struct tree_type_non_common GTY ((tag ("TS_TYPE_NON_COMMON")))
 type_non_common;
 struct tree_list GTY ((tag ("TS_LIST"))) list;
 struct tree_vec GTY ((tag ("TS_VEC"))) vec;
 struct tree_exp GTY ((tag ("TS_EXP"))) exp;
 struct tree_ssa_name GTY ((tag ("TS_SSA_NAME"))) ssa_name;
 struct tree_block GTY ((tag ("TS_BLOCK"))) block;
 struct tree_binfo GTY ((tag ("TS_BINFO"))) binfo;
 struct tree_statement_list GTY ((tag ("TS_STATEMENT_LIST"))) stmt_list;
 struct tree_constructor GTY ((tag ("TS_CONSTRUCTOR"))) constructor;
 struct tree_omp_clause GTY ((tag ("TS_OMP_CLAUSE"))) omp_clause;
 struct tree_optimization_option GTY ((tag ("TS_OPTIMIZATION"))) optimization;
 struct tree_target_option GTY ((tag ("TS_TARGET_OPTION"))) target_option;
};

Figure 1: The tree_node structure extracted from the GCC code base.

Fundamentally, a tree_node is a big union of structs. The union contains a handful of common or descriptive members, but the majority of union members are specific types of tree nodes. The first union member, tree_base, is common to all tree nodes and provides the basic descriptive information about the node, permitting one to determine the precise kind of node being examined or manipulated. There is a bit of an inheritance model: tree_base is the foundation, and tree_typed and tree_common add another layer of customization for specific categories of tree nodes to inherit, but from there on out the remainder of the union members are specific types of tree nodes. For example, tree_int_cst is an integer constant node whereas tree_field_decl is a field declaration.

Tree nodes are typed, but not in the C language sense of ‘typed’. One way to think about it is that the tree_node structure is a memory-efficient way to model a class in C prior to C++. Instead of member functions or methods, there is a large library of macros which act on tree nodes. In general, macros fall into two categories: predicate macros, which usually have a ‘_P’ suffix and return a value which can be compared to zero to indicate a false result, and transformation macros, which take a tree node and usually return another tree node. Despite the temptation to dip directly into the public tree_node structure and access or modify the data members directly – don’t do it. Treat tree nodes like C++ classes in which all the data members are private, and rely on the tree macros to query or manipulate tree nodes.

Relying on the macros to work with the tree_node structure is the correct approach per the GCC documentation, but it will also simply make your life easier. GCC tree_node structures are ‘strongly typed’ in the sense that they are distinct in the GCC tree type-system and many of the macros expect a specific tree_node type. For example, the INT_CST_LT(A, B) macro expects to have two tree_int_cst nodes passed as arguments – even though the C++ compiler cannot enforce the typing at compile time. If you pass in the wrong tree_node type, you will typically get a segmentation violation. An alternative approach is to compile GCC with the --enable-checking flag set, which will enforce runtime checking of node types.

In terms of history, this type of modelling was common back in the day when machines were limited in memory and compute cycles. This approach is very efficient in terms of memory, as the union overlays all the types and there are no virtual tables or other C++ class overhead consuming memory or compute cycles. The price paid, though, is that it is 100% incumbent on the developer to keep the type-system front-of-mind and ensure that they are invoking the right macros with the right arguments. The strategy of relying on the compiler to advise one about type mismatches does not work in this kind of code.

Basics of AST Programming

There are five key macros that can be invoked safely on any tree structure: TREE_CODE, TREE_TYPE, TREE_CHAIN, TYPE_P and DECL_P. In general, after obtaining a ‘generic’ tree node, the first step is to use the TREE_CODE macro to determine the ‘type’ (in the GCC type-system) of the node. The TREE_TYPE macro returns the source code ‘type’ associated with the node. For example, the result node of a method declaration returning an integer value will have a TREE_TYPE with a TREE_CODE equal to INTEGER_TYPE. The code for that check would look like:


TREE_CODE( TREE_TYPE( DECL_RESULT( methodNode ))) == INTEGER_TYPE

Within the AST structure, lists are generally represented as singly-linked lists with the link to the next list member returned by the TREE_CHAIN macro. For example, the DECL_ARGUMENTS macro will return a pointer to the first parameter of a function or method. If this value is NULL_TREE, then there are no parameters; otherwise the tree node for the first parameter is returned. Using TREE_CHAIN on that node will return NULL_TREE if it is the only parameter, or a tree instance for the next parameter. There also exists a vector data structure within GCC, and it is accessed using a different set of macros.
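For instance, walking a function’s parameter list might look like the following sketch (function_decl is assumed to be a FUNCTION_DECL tree node obtained elsewhere):

//  Sketch: iterate a singly-linked parameter list using DECL_ARGUMENTS and TREE_CHAIN.
for (tree param = DECL_ARGUMENTS(function_decl); param != NULL_TREE; param = TREE_CHAIN(param))
{
    //  Each node in the chain is a PARM_DECL for one parameter.
    if (DECL_NAME(param) != NULL_TREE)
    {
        const char* parameterName = IDENTIFIER_POINTER(DECL_NAME(param));
    }
}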

The TYPE_P and DECL_P macros are predicates which will return non-zero values if the tree passed as an argument is a type specification or a code declaration, respectively. Knowing this distinction is important as it quickly partitions the macros which can be used with a node. Many macros have a prefix of ‘TYPE_’ for type nodes and ‘DECL_’ for declaration nodes. Frequently there will be two near-identical macros; for instance TYPE_UID will return the GCC-generated, internal numeric unique identifier for a type node whereas DECL_UID is needed for a declaration node. In general, I have found that calling a TYPE_ macro on a declaration or a DECL_ macro on a type specification will result in a segmentation violation.
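A short sketch of using the predicates to select the matching macro family (node here is assumed to be any tree node pulled from the AST):

//  Sketch: pick the macro family based on whether the node is a type or a declaration.
if (TYPE_P(node))
{
    unsigned int uid = TYPE_UID(node);      //  TYPE_ macros are safe on type nodes
}
else if (DECL_P(node))
{
    unsigned int uid = DECL_UID(node);      //  DECL_ macros are safe on declaration nodes
}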

Other frequently used macros include DECL_NAME and TYPE_NAME, which return a tree node containing the source code name for a given element. IDENTIFIER_POINTER can then be used on that tree to return a pointer to the char* for the name. DECL_SOURCE_FILE, DECL_SOURCE_LINE and DECL_SOURCE_LOCATION are available to map an AST declaration back to its source code location. As mentioned above, DECL_UID and TYPE_UID return numeric unique identifiers for elements in the source code.
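As a sketch, recovering the name and source location of a declaration node could look like this (decl is assumed to be a declaration node):

//  Sketch: extract the source name and location from a declaration node.
if (DECL_P(decl) && DECL_NAME(decl) != NULL_TREE)
{
    const char* name = IDENTIFIER_POINTER(DECL_NAME(decl));   //  source code name
    const char* file = DECL_SOURCE_FILE(decl);                //  file containing the declaration
    int line = DECL_SOURCE_LINE(decl);                        //  line number of the declaration
}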

In addition, for C++ source code fed to g++, the compiler will inject methods and fields not explicitly declared in the C++ source code. These elements can be identified with the DECL_IS_BUILTIN and DECL_ARTIFICIAL macros. If, as you traverse the AST, you trip across oddly named elements, check the node with those macros to determine whether it was created by the compiler.
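For example, a sketch of skipping compiler-generated fields while walking a record type (record_type is assumed to be a RECORD_TYPE node):

//  Sketch: skip fields injected by the compiler rather than declared in the source.
for (tree field = TYPE_FIELDS(record_type); field != NULL_TREE; field = TREE_CHAIN(field))
{
    if (DECL_ARTIFICIAL(field))
    {
        continue;   //  compiler-generated member (e.g. a vtable pointer)
    }

    //  ...process the user-declared field...
}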

Beyond this simple introduction, sifting through the AST will require a lot of time reviewing tree.h and other header files looking for macros that will be useful for your application. Fortunately, the naming is very consistent and quite good, which eases the hunt for the right macro. Once you think you have the right macro for a given task, try it in your plugin and see if you get the desired result. Be prepared for a lot of trial-and-error investigation in the debugger. Also, though there are some GDB scripts to pretty-print AST tree instances, looking at these structures in the debugger will also require some experience, as again the debugger isn’t able to infer much about GCC’s internal type system.

Making the AST Easier to Navigate and Manipulate

I have started a handful of C++ libraries which bridge the gap between the implicit type system in the GCC tree_node structure and explicit C++ classes modelling distinct tree_node types.  For example, a snippet from my TypeTree class appears below in Figure 2.


class TypeTree : public DeclOrTypeBaseTree
{
  public:
    TypeTree( const tree& typeTree )
        : DeclOrTypeBaseTree( typeTree )
    {
        assert( TYPE_P( typeTree ) );
    }

    TypeTree& operator=( const tree& typeTree )
    {
        assert( TYPE_P( typeTree ) );

        (tree&)m_tree = typeTree;

        return( *this );
    }

    const CPPModel::UID UID() const
    {
        return( CPPModel::UID( TYPE_UID( TYPE_MAIN_VARIANT( m_tree ) ), CPPModel::UID::UIDType::TYPE ) );
    }

    const std::string Namespace() const;

    std::unique_ptr<const CPPModel::Type> type( const CPPModel::ASTDictionary& dictionary ) const;

    CPPModel::TypeInfo::Specifier typeSpecifier() const;

    CPPModel::ConstListPtr<CPPModel::Attribute> attributes();
};

Figure 2: TypeTree wrapper class for GCC tree_node.

Within this library I make extensive use of the STL, Boost libraries and a number of C++ 11 features.  For example, ConstListPtr<> is a template alias for a std::unique_ptr to a boost::ptr_list class.


template <class T> using ListPtr = std::unique_ptr<boost::ptr_list<T>>;
template <class T> using ConstListPtr = std::unique_ptr<const boost::ptr_list<T>>;

template <class T> using ListRef = const boost::ptr_list<T>&;

template <class T> ConstListPtr<T> MakeConst( ListPtr<T>& nonConstList ) { return( ConstListPtr<T>( std::move( nonConstList ) ) ); }

Figure 3: Template aliases for lists.

At present the library is capable of walking the GCC AST and creating a dictionary of all the types in the code being compiled. Within this dictionary, the library is also able to provide detailed information on classes, structs, unions, functions and global variables. It will scrape out C++11 generalized attributes on many source code elements (not all of them yet, though) and return proper declarations with parameters and return types for functions and methods. The ASTDictionary and the specific language model classes have no dependency on GCC internals themselves.

The approach I followed for developing the library thus far was to get enough simple code running using the GCC macros that I could then start to refactor into C++ classes.  Along the way, I used Boost strong typedefs to start making sense of the GCC type system at compile time.  Once the puzzle pieces started falling into place and the programming patterns took shape, developing a plugin on top of the libraries is fairly straightforward.  That said, there is a long and painful learning curve associated with GCC internals and the AST itself.

Getting the Code and Disclaimers

The library code is available on Github: ‘stephanfr/GCCPlugin’.  All of the code is under GPL V3.0 which is absolutely required as it runs within GCC itself.  I do not claim that the library is complete, stable, usable or rational – but hopefully some will find it useful if for nothing more than providing some insight into the GCC AST.  For the record, this is not my job nor is it my job to enrich or bug fix the library so you can get your compiler theory class project done in time.  That said, if you pick up the code and either enrich it or fix some bugs – please return the code to me and I will merge what makes sense.

The code should ‘just run’ if you have a GCC Plugin build environment configured per my prior posts.  One detail is that the ‘GCCPlugin Debug.launch’ file will need to be moved to the ‘.launches’ directory of Eclipse’s ‘org.eclipse.debug.core’ plugin directory.  If the ‘.launches’ directory does not exist, then create it.

Debugging GCC in Eclipse CDT 4.2

My favored C++ development environment on Linux is Eclipse CDT.  There are a number of different IDEs available for Linux but I use the Eclipse IDE for Java development and find it easier to stick to that tool for C++.  Eclipse 4.2 CDT is mature and many of the rough edges found in prior releases have been filed off in Juno.

As I am working on a GCC plugin, I needed to create a debug build of GCC and then figure out how to debug it in the IDE.  The procedure to build a debug version of GCC 4.7.2 in Ubuntu 12.04 can be found in my post here.  Once you have a debug GCC built, adding Eclipse CDT and configuring a project for GCC debugging is a straightforward process – but there are a few details that can be added to the environment that make GCC development in Eclipse much more tractable.

Step 1:  Install Java 1.7

Eclipse is supported with the Oracle, IBM and OpenJDK packages.  In the past I’ve typically relied on the Sun JDKs and feel most comfortable using those JDKs for Eclipse.  I use the Web Upd8 PPA repository for Ubuntu to install the JDK.

$ sudo add-apt-repository -y ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install -y oracle-jdk7-installer
$ sudo update-java-alternatives -s java-7-oracle

You can check that the JDK installed correctly by asking for the java version

$ java -version

Step 2: Install Eclipse 4.2

Eclipse 4.2 is not yet in the official Ubuntu repository, so it must be installed manually.  I haven’t been able to find a way to use wget to pull the Eclipse archive directly, so I use the version of Firefox bundled with Ubuntu 12.04 to download the archive.  The Eclipse archives can be found at: http://www.eclipse.org/downloads/?osType=linux.  For Eclipse 4.2 CDT you want to download: eclipse-cpp-juno-linux-gtk-x86_64.tar.gz, assuming you are using 64 bit Ubuntu.

$ mkdir ~/eclipse
$ cd ~/eclipse
$ mkdir 4.2
$ cd 4.2

#
# Download Eclipse 4.2 from: http://www.eclipse.org/downloads/?osType=linux
# The archive you want should be: eclipse-cpp-juno-linux-gtk-x86_64.tar.gz
#

$ tar -xzvf eclipse-cpp-juno-linux-gtk-x86_64.tar.gz
$ sudo cp -r eclipse /usr/lib/
$ sudo cp eclipse/icon.xpm /usr/share/pixmaps/eclipse.xpm

If this is a first install of Eclipse, you will probably want to create a script in /usr/bin that simply launches Eclipse from /usr/lib/eclipse.

$ sudo sh -c "echo '/usr/lib/eclipse/eclipse' >> /usr/bin/eclipse"
$ sudo chmod +x /usr/bin/eclipse

If you want to create a menu entry for the IDE, the best bet is to use the menu management tool: Applications > System Tools > Preferences > Main Menu.  It should automatically pick up the icon from the pixmaps directory.

Step 3: Create two Simple C++ Projects to Debug GCC

From here on, the example assumes that GCC was built with --prefix=/usr/gcc-4.7.2 and --program-suffix=-4.7.2. If your debug version of GCC was built with different values, then substitute accordingly.

Inside Eclipse, create two new ‘Hello World’ C++ Projects. Use the ‘Executable’ project type – not a ‘GNU Autotools’ project. For this example I labelled the first project ‘GCC Debug Test’ and the second ‘Project To Build’. ‘GCC Debug Test’ will be configured to build ‘Project To Build’ using the debug version of GCC. For clarity, ‘GCC Debug Test’ isn’t even built; it is only needed to hold the debug settings for GCC.

First, create a custom .gdbinit file in the ‘GCC Debug Test’ working directory. Do this by copying the .gdbinit file from the gcc build directory into the project working directory and then fixing up the paths in the file to point back to the build directory. The .gdbinit file created during the gcc build process provides a collection of scripts that can be used to print gcc data structures while debugging – this will prove invaluable. Finally, add ‘set schedule-multiple’ as the first line of the file – this option causes gdb to track multiple processes in a single debugging session. The .gdbinit file should look something like this:

set schedule-multiple

dir ~/gcc_build/4.7.2/build/gcc
dir ~/gcc_build/4.7.2/gcc
dir ~/gcc_build/4.7.2/gcc/cp
dir ~/gcc_build/4.7.2/gcc/lto
source ~/gcc_build/4.7.2/build/gcc/gdbinit.in

In the ‘GCC Debug Test’ project, go to the ‘Run > Debug Configurations’ dialog and create a new ‘C/C++ Application’ debug configuration.  On the ‘Main’ tab, enter the path to the ‘gcc-4.7’ debug executable in the ‘/usr/gcc-4.7.2/bin’ directory in the ‘C/C++ Application:’ field in the dialog and click ‘Apply’.

After completing the ‘Main’ tab dialog, click on the ‘Arguments’ tab and enter the path to the test file to be compiled and click ‘Apply’.

Next, click on the ‘Environment’ tab and create two variables: LD_LIBRARY_PATH and PATH.  For LD_LIBRARY_PATH, add the paths to the ‘lib’, ‘lib64’ and ‘lib64/debug’ directories for the GCC build to the front of the environment variable.  For PATH, add the path to the ‘bin’ directory to the front of the path as well.  For both environment variables, add the original paths to the end of the variable.  Make sure the ‘Replace native environment with specified environment’ radio button is selected.

Finally, click on the ‘Debugger’ tab. In that dialog, ensure the ‘Stop on startup at: main’ checkbox is checked. Enter the path to the .gdbinit file created above into the ‘GDB command file’ field. Also check the ‘Automatically debug forked processes’ checkbox. Since the gcc-4.7.2 application is just a driver for the actual compiler, unless this field is selected the debugger will not debug into the actual compiler when that process is forked by gcc. Click ‘Apply’ and the configuration is complete.

 

With the debug configuration finished, click ‘Debug’ and gdb should launch and stop at the ‘main’ function for gcc-4.7.

Step 4 : Making Debug GCC the version of GCC to use for Builds

This step is not *strictly* necessary, though building a plugin or modifying the gcc suite and compiling those modifications with a different version of gcc is ill-advised for all sorts of good reasons. Changing the compiler used by Eclipse is straightforward, though a multi-step process.

First, navigate to the Project->Properties->Settings dialog and select ‘GCC C++ Compiler’.  Set the ‘Command’ field to the debug version of g++.  I also set the CXX0X experimental symbol and the -std=c++0x option.

Set the ‘Command’ field for the ‘GCC C++ Linker’ as well.

After pressing ‘OK’, the newly built compiler will now be used by Eclipse for compiling this project. There doesn’t appear to be a way to set these options globally, so the same changes will have to be made for each project you wish to compile with the debug GCC suite.