Some examples of using containers (only 2D matrices for now)

Since BackpropTools is a header-only library, the compiler only needs to know where its include folder is located (cloned or mounted at /usr/local/include/backprop_tools in the docker image). This is a standard location for header files, and the Dockerfile sets C_INCLUDE_PATH to include it.

Most operations in BackpropTools are generic and work on any device for which a C++17 compiler is available (standard library support is not required). Some functions, such as random number generation, are device-specific and need implementations that often can only be included when compiling for that particular device (e.g. Intel CPU, CUDA GPU). Hence we include the CPU implementations in this example. The CPU implementations depend on a few standard library facilities (size_t, random number generation, logging, etc.). The same include also brings in all the basic generic functions, e.g. those that operate on containers.

#include <backprop_tools/operations/cpu.h>

All objects in BackpropTools are encapsulated in the backprop_tools namespace, and there is no global state (not even for logging). In programs using BackpropTools we usually abbreviate the namespace backprop_tools to bpt and define three shorthands for frequently used types: DEVICE is the selected device type, T is the floating point type (usually float or double, where float can be preferable because it is often much faster on accelerators), and TI is the index type, which should usually be the device’s size_t (to match the device’s hardware and provide the best performance).

All algorithms and data structures in BackpropTools are agnostic to these types through C++ template metaprogramming. In addition, the DEVICE type usually serves as a tag for a static, compile-time form of multiple dispatch that routes certain functions (e.g. the forward pass of a neural network layer) to code optimized for a particular device. With this design, the same higher-level algorithms can run on all sorts of devices, from HPC clusters, workstations, and laptops to smartphones, smartwatches, and microcontrollers, without sacrificing performance. Because of the template metaprogramming, quantities such as matrix dimensions and for-loop iteration counts are known at compile time, and the compiler can use them to optimize the code heavily through loop unrolling, inlining, etc.

namespace bpt = backprop_tools;
using DEVICE = bpt::devices::DefaultCPU;
using T = float;
using TI = typename DEVICE::index_t;
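
The compile-time dispatch on the device type described above can be illustrated with a minimal, self-contained sketch. Everything in the sketch namespace (GenericDevice, UnrolledDevice, Backend, sum, mean) is hypothetical and chosen for illustration; it is not the BackpropTools API, only an example of the general tag-dispatch technique.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {
    // Hypothetical device tags (not BackpropTools types). Both are empty structs,
    // so instances carry no data and exist only to select overloads.
    struct GenericDevice { using index_t = std::size_t; };
    struct UnrolledDevice { using index_t = std::size_t; };

    enum class Backend { Generic, Unrolled };

    // Generic implementation: a plain loop that compiles for any DEVICE.
    template <typename DEVICE, typename T, std::size_t N>
    T sum(const DEVICE&, const T (&x)[N], Backend& used) {
        used = Backend::Generic;
        T acc = 0;
        for (typename DEVICE::index_t i = 0; i < N; i++) { acc += x[i]; }
        return acc;
    }

    // Device-specific overload: more specialized than the template above, so
    // overload resolution picks it at compile time for UnrolledDevice.
    template <typename T, std::size_t N>
    T sum(const UnrolledDevice&, const T (&x)[N], Backend& used) {
        used = Backend::Unrolled;
        T even = 0, odd = 0; // two independent accumulators
        for (std::size_t i = 0; i + 1 < N; i += 2) { even += x[i]; odd += x[i + 1]; }
        if (N % 2 == 1) { even += x[N - 1]; } // leftover element for odd N
        return even + odd;
    }

    // Higher-level algorithm, written once against the DEVICE parameter.
    template <typename DEVICE, typename T, std::size_t N>
    T mean(const DEVICE& device, const T (&x)[N], Backend& used) {
        static_assert(N > 0, "mean of an empty array is undefined");
        return sum(device, x, used) / static_cast<T>(N);
    }

    // Usage: the call site is identical, only the device type differs.
    inline void example() {
        float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
        Backend used = Backend::Generic;
        GenericDevice generic;
        UnrolledDevice unrolled;
        assert(mean(generic, x, used) == 2.5f && used == Backend::Generic);
        assert(mean(unrolled, x, used) == 2.5f && used == Backend::Unrolled);
    }
}
```

The used output parameter exists only to make the selected overload observable in this sketch. Since the selection happens during overload resolution, there is no runtime branch on the device.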

Next we instantiate a device struct. The DEVICE struct can be empty, in which case it has no overhead but still facilitates tag dispatch. It can also carry additional context that would otherwise be implemented as global state (e.g. logging through a Tensorboard logger). In the first example we create a matrix and fill it with random numbers drawn from a standard normal distribution, so we define the initial seed for our random number generator, which is instantiated depending on the device type. This lets us change the DEVICE definition and have all downstream entities be appropriate for the particular device.

Finally, we create a matrix, specifically a dynamic (heap-allocated) 3x3 matrix. Its static, compile-time configuration is defined by a specification type (bpt::matrix::Specification<ELEMENT_TYPE, INDEX_TYPE, ROWS, COLS>) that carries the types and compile-time constants. Bundling these attributes into a separate specification, instead of having numerous template parameters on the bpt::MatrixDynamic type, makes it easier to write functions that take matrices as input: we just add a typename SPEC parameter to the template. We can still restrict a function to matrices with particular attributes through e.g. static_assert and SFINAE. Moreover, we can add attributes without breaking functions that are written this way.

DEVICE device;
TI seed = 1;
auto rng = bpt::random::default_engine(DEVICE::SPEC::RANDOM(), seed);
bpt::MatrixDynamic<bpt::matrix::Specification<T, TI, 3, 3>> m;
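
To make the benefit of a specification type concrete, here is a minimal sketch of the pattern. The Specification, Matrix, and trace names in the sketch namespace are hypothetical stand-ins written for this illustration; they only mimic the shape of the types described above and are not the library's definitions.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {
    // Hypothetical stand-in for a matrix specification: it bundles the element
    // type, index type, and dimensions into a single type.
    template <typename T_T, typename T_TI, T_TI T_ROWS, T_TI T_COLS>
    struct Specification {
        using T = T_T;
        using TI = T_TI;
        static constexpr TI ROWS = T_ROWS;
        static constexpr TI COLS = T_COLS;
    };

    // Hypothetical statically sized, row-major matrix parameterized by a spec.
    template <typename T_SPEC>
    struct Matrix {
        using SPEC = T_SPEC;
        typename SPEC::T data[SPEC::ROWS * SPEC::COLS];
    };

    // A function over matrices needs only one template parameter. Attributes
    // added to the specification later do not change this signature.
    template <typename SPEC>
    typename SPEC::T trace(const Matrix<SPEC>& m) {
        // Constrain usage to square matrices at compile time.
        static_assert(SPEC::ROWS == SPEC::COLS, "trace requires a square matrix");
        typename SPEC::T acc = 0;
        for (typename SPEC::TI i = 0; i < SPEC::ROWS; i++) {
            acc += m.data[i * SPEC::COLS + i];
        }
        return acc;
    }

    // Usage: a 2x2 matrix [[1, 2], [3, 4]] has trace 1 + 4 = 5.
    inline void example() {
        Matrix<Specification<float, std::size_t, 2, 2>> m = {{1.0f, 2.0f, 3.0f, 4.0f}};
        assert(trace(m) == 5.0f);
    }
}
```

Calling trace on a matrix whose specification has ROWS != COLS fails at compile time with the static_assert message rather than at run time.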

Since we created a dynamic matrix (which just consists of a pointer to the beginning of a memory space), we need to allocate it, which is done using bpt::malloc. Like most functions in BackpropTools, it takes the device as an input because the device provides the (global) context. In this case the device can be helpful to e.g. align the allocated memory space to certain boundaries to allow for maximum read-write performance on a particular device.

bpt::malloc(device, m);
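
As an illustration of how a device type could influence allocation, the following sketch aligns the matrix storage to a boundary declared by the device. It is an assumption-laden example, not a description of how bpt::malloc works internally: the Device struct, its ALIGNMENT constant, and the MatrixDynamic layout in the sketch namespace are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>

namespace sketch {
    // Hypothetical device that declares a preferred alignment in bytes.
    struct Device { static constexpr std::size_t ALIGNMENT = 64; };

    // Hypothetical dynamic matrix: holds only pointers, no element storage.
    template <typename T, std::size_t ROWS, std::size_t COLS>
    struct MatrixDynamic {
        void* raw = nullptr; // block returned by std::malloc (needed for std::free)
        T* data = nullptr;   // aligned start of the element storage
    };

    // Allocate ROWS * COLS elements, aligned as requested by the device.
    template <typename DEVICE, typename T, std::size_t ROWS, std::size_t COLS>
    void malloc(DEVICE&, MatrixDynamic<T, ROWS, COLS>& m) {
        assert(m.raw == nullptr); // guard against allocating twice
        std::size_t bytes = ROWS * COLS * sizeof(T);
        std::size_t space = bytes + DEVICE::ALIGNMENT; // room to shift the start
        m.raw = std::malloc(space);
        if (m.raw == nullptr) { return; } // allocation failed, data stays null
        void* p = m.raw;
        // std::align moves p forward to the next ALIGNMENT boundary.
        m.data = static_cast<T*>(std::align(DEVICE::ALIGNMENT, bytes, p, space));
    }

    // Release the storage and reset the pointers.
    template <typename DEVICE, typename T, std::size_t ROWS, std::size_t COLS>
    void free(DEVICE&, MatrixDynamic<T, ROWS, COLS>& m) {
        std::free(m.raw);
        m.raw = nullptr;
        m.data = nullptr;
    }
}
```

Because the alignment is a compile-time constant of the device type, switching to a device with different requirements needs no change at the call site.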

Freshly allocated memory is usually uninitialized, so we fill it with random numbers (from a standard normal distribution):

bpt::randn(device, m, rng);
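
Passing the random number generator explicitly, instead of relying on global state, makes the fill reproducible: the same seed yields the same values. The sketch below shows this with the C++ standard library only. The randn function in the sketch namespace is a hypothetical stand-in, and std::mt19937 is an assumption made for the example, so its values will not match the output printed in this tutorial and may differ between standard library implementations.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

namespace sketch {
    // Fill an array with samples from a standard normal distribution
    // (mean 0, standard deviation 1) using the generator that is passed in.
    template <typename T, std::size_t N, typename RNG>
    void randn(T (&x)[N], RNG& rng) {
        std::normal_distribution<T> standard_normal(0, 1);
        for (std::size_t i = 0; i < N; i++) { x[i] = standard_normal(rng); }
    }

    // Usage: two generators with the same seed produce identical fills.
    inline void example() {
        std::mt19937 first(1), second(1);
        float a[9], b[9];
        randn(a, first);
        randn(b, second);
        for (std::size_t i = 0; i < 9; i++) { assert(a[i] == b[i]); }
    }
}
```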

Now we can print the allocated and filled matrix:

bpt::print(device, m);
   -0.259093    -1.498961     0.119264
    0.458181     0.394975     0.044197
   -0.636256     1.731264     0.703151

We can access elements using the get and set commands:

bpt::get(m, 0, 0)
bpt::set(m, 0, 0, 1);
bpt::print(device, m);
    1.000000    -1.498961     0.119264
    0.458181     0.394975     0.044197
   -0.636256     1.731264     0.703151

get returns a reference, so we could technically also set or increment the element through that reference:

bpt::get(m, 0, 0) += 10;
bpt::print(device, m);
   11.000000    -1.498961     0.119264
    0.458181     0.394975     0.044197
   -0.636256     1.731264     0.703151

Writing through the reference is not very intuitive, so we prefer set and increment:

bpt::increment(m, 0, 0, -10);
bpt::print(device, m);
    1.000000    -1.498961     0.119264
    0.458181     0.394975     0.044197
   -0.636256     1.731264     0.703151
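
The relationship between get, set, and increment can be summarized in a small self-contained sketch. The Matrix type and the three functions in the sketch namespace are hypothetical re-implementations that assume a row-major layout; they mirror the call pattern used above but are not the BackpropTools definitions.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {
    // Hypothetical statically sized matrix stored in row-major order.
    template <typename T, std::size_t ROWS, std::size_t COLS>
    struct Matrix { T data[ROWS * COLS]; };

    // get returns a reference, so the element can be read or written.
    template <typename T, std::size_t ROWS, std::size_t COLS>
    T& get(Matrix<T, ROWS, COLS>& m, std::size_t row, std::size_t col) {
        assert(row < ROWS && col < COLS); // bounds check in debug builds
        return m.data[row * COLS + col];
    }

    // set states the intent to overwrite an element.
    template <typename T, std::size_t ROWS, std::size_t COLS, typename V>
    void set(Matrix<T, ROWS, COLS>& m, std::size_t row, std::size_t col, V value) {
        get(m, row, col) = static_cast<T>(value);
    }

    // increment states the intent to add to an element in place.
    template <typename T, std::size_t ROWS, std::size_t COLS, typename V>
    void increment(Matrix<T, ROWS, COLS>& m, std::size_t row, std::size_t col, V value) {
        get(m, row, col) += static_cast<T>(value);
    }
}
```

With a sketch::Matrix<float, 3, 3> m, the sequence set(m, 0, 0, 1), get(m, 0, 0) += 10, increment(m, 0, 0, -10) takes the top-left element through 1, 11, and back to 1, matching the printed matrices above.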