Category Archives: C++

New Revision of Cork Computational Geometry Library – runs on Linux !

I have had enough free time lately to return to Cork and have made a couple key improvements to the build :

  • Builds and runs on Linux !
  • Generally 10% faster !
  • Moved to CMake build system
  • Script available to build 3rd Party dependencies
  • Updated to C++20
  • Updated to most recent Boost, TBB and MPIR libraries
  • Started vectorization with AVX2 SIMD instruction set
  • A few improvements to the regression test app
  • Added a few more unit tests
  • Faster OFF file output

Combined this makes for a much smoother ‘getting started’ experience. I will publish a Packer script that can be used to create a Ubuntu Mate 20.04 VM in VirtualBox or Proxmox for development.

The Github repository is here : https://github.com/stephanfr/Cork.git At present I am working in the v0.9.0 branch.

I plan to move forward and bring the 3rd party dependencies up to date and build out more unit tests while working on performance improvements. I believe there are a number of places in the code that will benefit from AVX2 vectorization.

Serial and SIMD implementation of the Xoshiro256+ random number generator – Part 1 Implementation and Usage

The Xoshiro256PlusSIMD project provides a C++ implementation of Xoshiro256+ random number generator that matches the performance of the reference C implementation of David Blackman and Sebastiano Vigna (https://prng.di.unimi.it/). Xoshiro256+ combines high speed, small memory space requirements for stored state and excellent statistical quality. For cryptographic use cases or use cases where absolutely the best statistical quality is required – maybe consider a different RNG like the Mersenne Twist. For any any other conventional simulation or testing use case, Xoshiro256+ should be perfectly fine statistically and better than a whole lot of other slower alternatives.

This implementation is a header-only library and provides the following capabilities:

  • Single 64 bit unsigned random value
  • Single 64 bit unsigned random value reduced to a [lower, upper) range
  • Four 64 bit unsigned random values
  • Four 64 bit unsigned random values reduced to a [lower, upper) range
  • Single double length real random value in a range of (0,1)
  • Single double length real random value in a (lower, upper) range
  • Four double length real random values in a range of (0,1)
  • Four double length real random values in a (lower, upper) range

Implementation Details

For platforms supporting the AVX2 instruction set, the RNG can be configured to use AVX2 instructions or not on an instance by instance basis. AVX2 instructions are only used for the four-wide operations, there is no advantage using them for single value generation.

The four-wide operations use a different random seed per value and the the seed for single value generation is distinct as well. The same stream of values will be returned by the serial and AVX2 implementations. It might be faster for the serial implementation to use only a single seed across all the four values – each increasing index being the next value in a single series, instead of each of the four values having its unique series. The downside of that approach is that the serial implementation would return different four wide values than the AVX2 implementation. The AVX2 implementation must use distinct seeds for each of the four values.

The random series for each of the four-wide values are separated by 2^192 values – i.e. a Xoshiro256+ ‘long jump’ separates the seed for each of the four values. For clarity, the Xoshiro256+ has a state space of 2^256.

The reduction of the uint64s to an integer range takes uint32 bounds. This is a significant reduction in the size of the random values but permits reduction while avoiding taking a modulus. If you have a need for random integer values beyond uint32 sizes, I’d suggest taking the full 64 bit values and applying your own reduction algorithm. The modulus approach to reduction is slower than the approach in the code which uses shifts and a multiply.

Finally, the AVX versions are coded explicitly with AVX intrinsics, there is no reliance on the vageries of compiler vectorization. The SIMD version could be written such that gcc should unroll loops and vectorize but others have reported that it is necessary to tweak optimization flags to get the unrolling to work. For these implementations, all that is needed is to have the -mavx2 compiler option and the AVX2_AVAILABLE symbol defined.

Usage

The class Xoshiro256Plus is a template class and takes an SIMDInstructionSet enumerated value as its only template parameter. SIMDInstructionSet may be ‘NONE’, ‘AVX’ or ‘AVX2’. The SIMD acceleration requires the AVX2 instruction set and uses ‘if contexpr’ to control code generation at compile time. There is also a preprocessor symbol AVX2_AVAILABLE which must be defined to permit AVX2 instances of the RNG to be created. It it completely reasonable to have the AVX2 instruction set available but still use an RNG instance with no SIMD acceleration.

#define __AVX2_AVAILABLE__

#include "Xoshiro256Plus.h"

constexpr size_t NUM_SAMPLES = 1000;
constexpr uint64_t SEED = 1;

typedef SEFUtility::RNG::Xoshiro256Plus Xoshiro256PlusSerial;
typedef SEFUtility::RNG::Xoshiro256Plus Xoshiro256PlusAVX2;

bool InsureFourWideRandomStreamsMatch()
{
    Xoshiro256PlusSerial serial_rng(SEED);
    Xoshiro256PlusAVX2 avx_rng(SEED);

    for (auto i = 0; i < NUM_SAMPLES; i++)
    {
        auto next_four_serial = serial_rng.next4( 200, 300 );
        auto next_four_avx = avx_rng.next4( 200, 300 );

        if(( next_four_serial[0] != next_four_avx[0] ) ||
           ( next_four_serial[1] != next_four_avx[1] ) ||
           ( next_four_serial[2] != next_four_avx[2] ) ||
           ( next_four_serial[3] != next_four_avx[3] ))
        { return false; }
    }

    return true;
}

Managing Privileges for Automated Raspberry Pi GPIO Testing

Many RPi libraries manipulate the GPIO pins by mapping various GPIO control memory blocks into the process address space. For GPIO input/output pins only, the Raspberry Pi OS kernel supports the /dev/gpiomem device which can be accessed from user space. For any other GPIO functions, such as setting up a PWM output, other memory blocks must be accessed through /dev/mem.

Typically, the base address of the desired control block is mapped into the process address space using the mmap command which takes a file descriptor for a /dev/mem device. User space processes cannot open the /dev/mem device, so a common workaround is to run the process as root using sudo. Additionally, GPIO ISR handling typically has much higher fidelity when the ISR dispatching thread runs at one of the ‘real-time’ thread schedules. This too requires elevated privileges.

For many use cases, like automated CI/CD running a test process under sudo is a less than optimal approach and certainly violates the ‘Principle of Least Privilege’. Typically, these kinds of impediments result in skipping automated testing -or- using workarounds like putting root passwords in response files.

Bluntly, there is no way to provide elevated privileges to a process without incurring some security risk and for clarity the approach described below is not strictly *secure* – I feel it is better and more constrained than most of the alternatives I have found. Certainly for hobbyists and individuals working on a benchtop, this is probably more than ‘good enough’.

Background

The RPi maps the GPIO controls into physical memory at 0x3F000000 for BCM2835 (Models 2 &3) and 0x7E200000 for BCM2711 (Model 4) based RPis. To do this, a snippet of code like the following is used :

constexpr uint32_t MAPPED_MEMORY_PROTECTION = PROT_READ | PROT_WRITE | PROT_EXEC;
constexpr uint32_t MAPPED_MEMORY_FLAGS = MAP_SHARED | MAP_LOCKED;

uint32_t peripheral_base = 0xFE000000;
uint32_t gpio_offset = 0x200000;
uint32_t gpio_block_size = 0xF4;

int dev_mem_fd = open("/dev/mem", O_RDWR | O_SYNC );
void*  gpios = mmap( nullptr, gpio_block_size, MAPPED_MEMORY_PROTECTION, MAPPED_MEMORY_FLAGS, dev_mem_fd, peripheral_base + gpio_offset );

The /dev/mem device cannot be opened if the process does not have the CAP_SYS_RAWIO capability. There are a lot of other operations that are permitted for processes with that capability – but an ability to map physical memory into a virtual address space opens up a whole plethora of potential compromises.

Unfortunately, without this privilege any app needing to mount /dev/mem will have to be run with sudo – which is difficult to manage in an automated pipeline or even when running unit tests within an IDE, like using Catch2 in VSCode.

Workaround

Linux permits capabilities to be assigned to files, so it is possible to provide the CAP_SYS_RAWIO capability to specific files – for instance the unit test app created by a makefile. To do this, the following will suffice:

sudo setcap cap_sys_rawio tests

However, every time the tests file is rebuilt, the capabilities must be re-assigned – so we have not really made much progress, there is still a need for the user to intervene and provide a root password after every build.

To workaround this, the interactive user could grant herself the CAP_SETFCAP capability and then the snippet above can be run without requiring sudo. Giving a process CAP_SETFCAP capability is just one small step away from simply running as root, so we should strive for something better.

It is possible to permit a user or group to execute commands with sudo but without requiring a password by adding entries to a sudoers file. In fact, this capability can be fairly tightly constrained to very specific command patterns. These files can be placed under /etc/sudoers.d/ and will be picked up by the sudo processor. An example appears below :

steve ALL=(root) NOPASSWD:  /sbin/setcap cap_sys_rawio+eip /home/sef/dev/unit_tests/tests
steve ALL=(root) NOPASSWD: !/sbin/setcap cap_sys_rawio+eip /home/sef/dev/unit_tests/tests*..*
steve ALL=(root) NOPASSWD: !/sbin/setcap cap_sys_rawio+eip /home/sef/dev/unit_tests/tests*[ ]*

In the example, the first line permits user steve to run /sbin/setcap cap_sys_rawio+eip /home/sef/dev/unit_tests/tests without having to supply a password. The next two lines have an exclamation point to negate the operation and effectively eliminate any permutations of the prior line which could be used to grant the capability to a different, unintended file.

This combination puts us in a place where any process running under steve can provide the CAP_SYS_RAWIO capability to only the /home/sef/dev/unit_tests/tests file without having to supply the root password. Clearly, if steve‘s account is compromised it would possible for someone to gain root – but the attacker would have to do a lot more work to get there if privileges had been provided indiscriminately or if the root password were placed in a response file.

Doing the above gets us close, but there is one more step needed. The /dev/mem file is owned by root and can only be accessed by root. Assigning capabilities granularly elevates privileges in the interactive process but that process is still not root. To resolve this final stumbling block, we can modify the ACL for /dev/mem to permit the interactive user to access it. An example of how to do this appears below :

sudo setfacl -m u:steve:rw /dev/mem

This command will not persist through reboots, but needs to be executed only once after a reboot. It would be possible to make this assignment persistent if desired.

Putting It All Together

The good news is that all of the above can be *mostly* automated as part of a CMake specification. Only two infrequent manual steps are required.

The following example uses a pair of template files and some CMake specifications to create a specialized sudoers file and a specialized shell script for setting the ACLs properly for /dev/mem. Peronally, I put the templates in the misc subdirectory of my unit tests folder and the CMakeLists.txt file is in the unit test folder itself. For the purposes of this example, the templates must simply be in the subdirectory misc of the directory holding the CMakeLists.txt file.

#
#   Allow setcap execute without a password only for the CAP_SYS_RAWIO capability on
#       the tests file.  The negative patterns are intended to reduce the risk of anything 
#       other than just 'tests' being modified
#
#   Copy the generated file with the variables replaced into the /etc/sudoers.d directory
#

$ENV{USER} ALL=(root) NOPASSWD:  /sbin/setcap cap_sys_rawio+eip ${CMAKE_CURRENT_BINARY_DIR}/tests
$ENV{USER} ALL=(root) NOPASSWD: !/sbin/setcap cap_sys_rawio+eip ${CMAKE_CURRENT_BINARY_DIR}/tests*..*
$ENV{USER} ALL=(root) NOPASSWD: !/sbin/setcap cap_sys_rawio+eip ${CMAKE_CURRENT_BINARY_DIR}/tests*[ ]*

Adding the following to the CMakeLists.txt file will generate the final sudoers file. The CMake file command copies the generated file back into the source directory next to the template whilst also setting the file permissions appropriately. After the file is generated, it must be manually copied with the right file permissions to the /etc/sudoers.d/ directory, as that operation requires root privilege.

configure_file( ./misc/020_setcap_rawio_on_test_app.in ${CMAKE_CURRENT_BINARY_DIR}/misc/020_setcap_rawio_on_test_app )
file( COPY ${CMAKE_CURRENT_BINARY_DIR}/misc/020_setcap_rawio_on_test_app
      DESTINATION ${CMAKE_CURRENT_SOURCE_DIR}/misc
      FILE_PERMISSIONS OWNER_READ GROUP_READ WORLD_READ )

Finally, adding the following to the CMakeLists.txt file will assign the CAP_SYS_RAWIO capability to the tests file every time it is generated.

add_custom_command(TARGET tests POST_BUILD
                   COMMAND sudo setcap cap_sys_rawio+eip ${CMAKE_CURRENT_BINARY_DIR}/tests)

To make the ACL assignment easier, a similar process is used. First, a template file which will be processed by CMake is needed :

#!/bin/bash
sudo setfacl -m u:$ENV{USER}:rw /dev/mem

Then, the right magic in CMakeLists.txt to process the template file :

configure_file( ./misc/set_devmem_acl.in ${CMAKE_CURRENT_BINARY_DIR}/misc/set_devmem_acl.sh )
file( COPY ${CMAKE_CURRENT_BINARY_DIR}/misc/set_devmem_acl.sh
      DESTINATION ${CMAKE_CURRENT_SOURCE_DIR}/misc
      FILE_PERMISSIONS OWNER_READ OWNER_WRITE OWNER_EXECUTE GROUP_READ GROUP_EXECUTE WORLD_READ WORLD_EXECUTE )

This will create a shell script with the proper substitutions for the interactive user. This script needs to be executed once per session which seems a reasonable compromise. Alternatively, the sudoers file could be enriched to permit the command to be executed without a password and then even the process of permitting the interactive user access to /dev/mem can be used in automated scripts.

Adding CAP_SYS_NICE

As mentioned in the introduction, ISRs servicing GPIO interrupts will typically need to run with realtime scheduling for reasonable performance. The main risk that concerns me is *missing interrupts* and beyond a couple kilohertz on an RPi4 it is easy to lose interrupts. Realtime scheduling in Linux can be applied at the thread level using pthread_setschedparam in something like the following:

if (realtime_scheduling_)
{
    struct sched_param thread_scheduling;

    thread_scheduling.sched_priority = 5;

    int result = pthread_setschedparam(pthread_self(), SCHED_FIFO, &thread_scheduling);

    if( result != 0 )
    {
        SPDLOG_ERROR( "Unable to set ISR Thread scheduling policy - file may need cap_sys_nice capability.  Error: {}", result );
    }
}

Using realtime scheduling is something of a risk as poorly designed code can starve the rest of the system, or perhaps more frequently a user looking for more responsivity can ‘nice’ their processes to the detriment of other processes. Therefore, the CAP_SYS_NICE capability is required to execute the above snippet.

The templates above can be enriched to include CAP_SYS_NICE, but there are a few details that *really matter*. The nastiest little complication is the difference in how the comma (i.e. ‘,’) is used by the setcap command and how the comma is interpreted in a sudoers file. In both cases it is a separator, to separate multiple capabilities for setcap and to separate different commands in the sudoers file. Therefore, within the setcap command in the sudoers file, the comma must be escaped with a backslash. The following is the template from above including the CAP_SYS_NICE capability.

#
#   Allow setcap execute without a password only for the CAP_SYS_RAWIO and CAP_SYS_NICE capabilities
#       on on the tests file.  The negative patterns are intended to reduce the risk of anything
#       other than just 'tests' being modified
#
#   Copy the generated file with the variables replaced into the /etc/sudoers.d directory
#

$ENV{USER} ALL=(root) NOPASSWD:  /sbin/setcap cap_sys_rawio\,cap_sys_nice+eip ${CMAKE_CURRENT_BINARY_DIR}/tests
$ENV{USER} ALL=(root) NOPASSWD: !/sbin/setcap cap_sys_rawio\,cap_sys_nice+eip ${CMAKE_CURRENT_BINARY_DIR}/tests*..*
$ENV{USER} ALL=(root) NOPASSWD: !/sbin/setcap cap_sys_rawio\,cap_sys_nice+eip ${CMAKE_CURRENT_BINARY_DIR}/tests*[ ]*

As shown above, the comma separating capabilities is escaped with a backslash. Similarly, the setcap command used to assign capabilities to the tests file will have to be modified in the CMakeLists.txt specification.

Conclusion

Hopefully the content in this post will help not only manage permissions necessary for developing GPIO applications on the RPi but also provide some insight into how CMake can be used to generate various kinds of files from templates for specific use cases. I use these in my CMake files in VSCode while developing remotely on RPi 3s and 4s and it is certainly a lot more fluid a development experience than having to enter the root password all the time.