Generic Interfaces with Generic Lambdas with C++ and SYCL

Posted on June 11, 2019 by Georgi Mirazchiyski.

Lambdas in SYCL

C++ lambdas, first introduced in C++11, are an important part of the way that the SYCL standard is defined and implemented. SYCL has to handle different types and pass functions around, so lambdas are a good fit: they allow anonymous function objects to be passed to SYCL kernels. We use lambdas throughout our guides and documentation, but we have never covered how lambdas work or how to use them in SYCL, so this blog post examines both.

The C++ standard library allows us to pass lambdas to functions, which gives the algorithms in the library a generic interface and lets users define the behavior they want. In this sense, lambdas allow us to program high-level APIs for generic use. I found a great language feature resource from the Standard C++ Foundation which showcases such uses of lambdas, and describes the general differences between C++11 lambdas and C++14 generic lambdas.

C++ templates and lambda functions are really helpful tools for writing high-level application software. SYCL, on the other hand, is an interface that aims to enable developers to write clean code for optimized acceleration of software applications across an extensive range of hardware. To achieve that, SYCL interacts almost seamlessly with modern C++ features such as template code and lambda functions, aiming to make developing complex high-level and high-performance applications easier than it has ever been.

A good example of how both can be used together to great effect can be seen in the Khronos Parallel STL implementation of the Technical Specification for C++ Extensions for Parallelism. It enables acceleration of standard C++ algorithms where the operation carried out by the SYCL kernel can be a user-defined lambda function (or a class functor) that is provided by the developer calling the algorithm.

Introducing generic lambdas

C++14 improves on C++11 lambdas by introducing generic lambda expressions: developers can now use the auto type-specifier to declare a generic lambda parameter.

Note: The auto type deduction follows the rules of template argument deduction.

Declaring a generic lambda is as easy as this:

auto lambda = [](auto a, auto b, auto c) { return a + b + c; };

and it can then be used with code like this:

auto res = lambda(1, 2, 3); 

or another way to do it would be:

auto res = lambda(1, 2.2f, 3.3);
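To make the effect of the two calls above concrete, here is a minimal sketch showing that the return type is deduced separately for each combination of argument types. The names make_sum3 and sum3_demo are introduced here only for illustration and are not part of the original example.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical helper: returns the same generic lambda as in the text.
inline auto make_sum3() {
  return [](auto a, auto b, auto c) { return a + b + c; };
}

// The return type follows the usual arithmetic conversions of the arguments.
static_assert(std::is_same<decltype(make_sum3()(1, 2, 3)), int>::value,
              "int + int + int yields int");
static_assert(std::is_same<decltype(make_sum3()(1, 2.2f, 3.3)), double>::value,
              "int + float + double yields double");

inline void sum3_demo() {
  auto sum3 = make_sum3();
  assert(sum3(1, 2, 3) == 6);
  // Any types that support operator+ work, not only arithmetic ones.
  assert(sum3(std::string("S"), std::string("YC"), std::string("L")) == "SYCL");
}
```

The same lambda object serves all three calls; the compiler instantiates a separate call operator for each set of argument types.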

This definition gives a feeling of having a lot of “sugar sprinkled over the underlying C++ syntax”. Under the hood, the compiler generates a closure type whose operator() is a member function template, with one invented template parameter for each auto parameter.
Now here's some code that defines a type with equivalent behavior to the generic lambda defined above:

struct lambda {
  template<typename T1, typename T2, typename T3>
  auto operator()(T1 a, T2 b, T3 c) const { return a + b + c; }
};

This can be used like this:

auto res = lambda{}(1, 2.2f, 3.3);

Both give the same results. However, a generic lambda is generally the easier and preferred choice wherever C++14 is available.

What makes generic lambdas such an interesting feature of C++ is that they are a pure extension to the language: compilers are expected to generate the same code for a generic lambda as for the equivalent template functor, so there should be no overhead. You can verify this by exploring the examples on Compiler Explorer.
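As a quick sanity check of the equivalence described above, the following sketch compares the generic lambda with a hand-written functor. The functor is renamed Sum3Functor here (an illustrative name) so it does not clash with a variable called lambda. It only checks that both produce the same type and value; comparing the generated binaries still requires inspecting compiler output.

```cpp
#include <cassert>
#include <type_traits>

// Hand-written counterpart of the generic lambda (illustrative name).
struct Sum3Functor {
  template <typename T1, typename T2, typename T3>
  auto operator()(T1 a, T2 b, T3 c) const { return a + b + c; }
};

inline void compare_lambda_and_functor() {
  auto sum3 = [](auto a, auto b, auto c) { return a + b + c; };

  // Both callables deduce the same return type for the same arguments.
  static_assert(std::is_same<decltype(sum3(1, 2.2f, 3.3)),
                             decltype(Sum3Functor{}(1, 2.2f, 3.3))>::value,
                "lambda and functor deduce the same return type");

  // Both perform the same arithmetic, so the values are identical.
  assert(sum3(1, 2.2f, 3.3) == Sum3Functor{}(1, 2.2f, 3.3));
  assert(sum3(1, 2, 3) == Sum3Functor{}(1, 2, 3));
}
```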

Note: As you may have already guessed, not all parameters in a generic lambda need to be declared with auto, but in most cases you would declare them all that way to simplify type deduction.
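A small sketch of the note above, mixing a concrete parameter with a generic one. The lambda repeat_sum and the function mixed_parameter_demo are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

inline void mixed_parameter_demo() {
  // count always has type std::size_t; only value is generic.
  // For count >= 1 the result is value added to itself count times.
  auto repeat_sum = [](std::size_t count, auto value) {
    auto total = value;
    for (std::size_t i = 1; i < count; ++i) {
      total = total + value;
    }
    return total;
  };

  assert(repeat_sum(3, 2) == 6);
  assert(repeat_sum(4, 0.5) == 2.0);
  assert(repeat_sum(2, std::string("ab")) == "abab");
}
```

Fixing the type of count documents the intent of that parameter, while value stays open to any type that supports operator+.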

Furthermore, lambdas have a capture clause, where we can specify which outside variables are available for the lambda function and whether they should be captured by value or by reference.

Capture by value

int scalar = 10; 
auto lambda = [scalar](auto a, auto b, auto c) { return (a + b + c) * scalar; };

Capture by reference

int result;
auto lambda = [&result](auto a, auto b, auto c) {
  int scalar = 10;
  return result = (a + b + c) * scalar;
};

Capture by value and reference

int result, scalar = 10;
auto lambda = [&result, scalar](auto a, auto b, auto c) {
  return result = (a + b + c) * scalar;
};

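You can also use the capture-default [=], which captures by value every local variable from the enclosing scope that the lambda body uses. Similarly, [&] captures those variables by reference. Global and static variables are not captured; the lambda can use them directly. These are C++11 captures.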
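The difference between the two default captures shows up when the captured variable changes after the lambda has been created. A minimal sketch, with illustrative names:

```cpp
#include <cassert>

inline void default_capture_demo() {
  int scalar = 10;

  // [=] copies scalar into the closure at the point of creation.
  auto by_value = [=](auto x) { return x * scalar; };
  // [&] stores a reference, so later changes to scalar are visible.
  auto by_reference = [&](auto x) { return x * scalar; };

  scalar = 20;

  assert(by_value(2) == 20);      // still uses the copied value 10
  assert(by_reference(2) == 40);  // sees the updated value 20
}
```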

C++14 introduces generalized capture that allows the initialization of new variables in the capture clause, without the need to have those variables exist in the lambda function’s enclosing scope.
An example of initialization inside the capture block can be seen below.

auto lambda = [scalar = 5](auto a, auto b, auto c) { return (a + b + c) * scalar; };
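Beyond initializing a constant, a generalized capture can also move an object into the closure, which C++11 captures could not express directly. The sketch below assumes a hypothetical weights vector and is meant as host-side C++ only.

```cpp
#include <cassert>
#include <utility>
#include <vector>

inline void init_capture_demo() {
  std::vector<int> weights{1, 2, 3};

  // w is a new closure member initialized by moving from weights.
  auto weighted_sum = [w = std::move(weights)](auto x) {
    decltype(x) total{};  // zero of the deduced parameter type
    for (int weight : w) {
      total += weight * x;
    }
    return total;
  };

  assert(weighted_sum(2) == 12);     // (1 + 2 + 3) * 2
  assert(weighted_sum(0.5) == 3.0);  // (1 + 2 + 3) * 0.5
}
```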

In summary, we can now use C++14 to express something we couldn’t express before - a generic lambda that will work with any suitable type and just do the right thing.

Using generic lambdas in SYCL

Now let’s talk about how generic lambdas can be used in SYCL kernel code, and more specifically, within the context of Codeplay’s SYCL implementation - ComputeCpp.

ComputeCpp has been developed purely in C++ with full support for C++11 and partial support for C++14, with a view to supporting newer features of C++. As of now, generic lambdas can be part of SYCL kernel code.

Note that there is no guarantee that C++14 language features which are supported by ComputeCpp will also work across other SYCL implementations because the minimum requirement of SYCL is a C++11 compiler.

Defining kernels in SYCL through generic lambdas can come in handy every so often, depending on how one would like to structure the application being developed.

This is how a standard SYCL kernel functor would look, defined as a C++ class.

template <typename T> class MyFunctor {
 public:
  // accessor_in<T> and accessor_out<T> stand in for concrete SYCL accessor types
  MyFunctor(accessor_in<T> in, accessor_out<T> out)
      : _accessor_in(in), _accessor_out(out) {}

  void operator()(item<1> item) const {
    auto id = item.get_id(0);
    _accessor_out[id] = foo(_accessor_in[id]);
  }

  T foo(T value) const { return T{ <<< compute a result >>> }; }

 private:
  accessor_in<T> _accessor_in;
  accessor_out<T> _accessor_out;
};

Here T foo(T value) is part of the SYCL kernel functor MyFunctor and does some computation, the result of which is assigned to the corresponding element of the global output accessor.

When developing high-level APIs or libraries, we may sometimes be required to allow the users to define a certain behavior of the feature provided by these APIs or libraries. For example, this is a common approach in the implementation of the C++ STL.
This is where generic lambdas are so handy to use and thankfully the ComputeCpp implementation supports them. Taking the above code block as an example, instead of implementing foo() inside the MyFunctor class, we can declare, by using templates, that this function will be provided by the user or the caller of the said class functor.
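The idea of moving foo() out of the functor can be sketched on the host without any SYCL types. In the sketch below, HostFunctor, make_functor and square_all are hypothetical names; std::vector stands in for the accessors and a plain index stands in for item<1>, so this is an analogue of the pattern rather than actual kernel code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only stand-in for the kernel functor: the operation is a template
// parameter supplied by the caller instead of a member function foo().
template <typename T, typename Operation>
class HostFunctor {
 public:
  HostFunctor(const std::vector<T> &in, std::vector<T> &out, Operation op)
      : in_(in), out_(out), op_(op) {}

  // Applies the user-provided operation to one element.
  void operator()(std::size_t id) const { out_[id] = op_(in_[id]); }

 private:
  const std::vector<T> &in_;
  std::vector<T> &out_;
  Operation op_;
};

// Helper so the lambda's unnamed type can be deduced at the call site.
template <typename T, typename Operation>
HostFunctor<T, Operation> make_functor(const std::vector<T> &in,
                                       std::vector<T> &out, Operation op) {
  return HostFunctor<T, Operation>(in, out, op);
}

inline std::vector<int> square_all(const std::vector<int> &in) {
  std::vector<int> out(in.size());
  auto functor = make_functor(in, out, [](auto x) { return x * x; });
  for (std::size_t id = 0; id < in.size(); ++id) {
    functor(id);
  }
  return out;
}
```

Calling square_all({1, 2, 3}) fills the output with 1, 4 and 9; swapping in a different lambda changes the behavior without touching the functor.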

Enough abstract talk, let's dive into a real SYCL example program that defines a behavior that can be implemented using generic lambdas.

SYCL application example

To introduce how generic lambdas can be used with SYCL, we will build a simple application.
Let's write a sequence generator that produces a desired sequence through a user-defined lambda, which is passed to the SYCL kernel that carries out the computation.

Before we begin, let's clearly define the goals and requirements for building this application feature:


Goal: A sequence generation function that works with C++ vectors and is accelerated using SYCL.
Requirement: User-defined operations within the SYCL kernel code via the use of generic lambda functions.

To begin with, set up the host containers that will hold the sequence. These consist of a vector holding the positions of the elements for the sequence and a vector for the numbers in each position.

constexpr size_t num_elements = SOME_BIG_VALUE; 
std::vector<int> input(num_elements);
std::iota(input.begin(), input.end(), 0);
std::vector<int> output(num_elements);
std::fill(output.begin(), output.end(), 0);

Before we get into implementation, we have to define our kernel type.

namespace kernels { 
class generator_kernel {};
}

Now we want a generic function that will generate a sequence of elements based on what we request. It could be factorial, Fibonacci, incremental, or whatever we need, and it doesn't need to be strictly numeric. To provide this behavior, we define a function template parameterized on the data type and on the operation, which we will supply as a lambda. The following code isn't "production ready" but gives an idea of what could be done using SYCL and generic lambdas together.

template <typename T, class Operation>
void generate_seq(const std::vector<T> &input,
                  std::vector<T> &output,
                  size_t num_elements, Operation op) {
  buffer<T, 1> input_buf(input.data(), range<1>(num_elements));
  buffer<T, 1> output_buf(output.data(), range<1>(num_elements));

  // define and submit the SYCL queue
  queue queue(default_selector{});
  queue.submit([&](handler &cgh) {
    auto input_acc = input_buf.template get_access<access::mode::read>(cgh);
    auto output_acc = output_buf.template get_access<access::mode::write>(cgh);

    << Execute the kernel here >>
  });
}

Here, Operation is a template parameter that is deduced as the type of the lambda passed to generate_seq. As currently designed, the operation is required to take an index that indicates how many iterations are run to generate the sequence element. A smarter approach could be designed, but this is enough to showcase the generic behavior via lambda functions that we aim to achieve with generate_seq.

Next up, the use of our generic lambda operation can be seen in the SYCL kernel execution part.

cgh.parallel_for<kernels::generator_kernel>(
    range<1>(num_elements), [=](id<1> idx) mutable {
      output_acc[idx] = op(input_acc[idx]);
    });

The kernel becomes rather simple, as all of the computational logic is now defined by the lambda passed as op.
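To experiment with operations before running them on a device, the same structure can be mimicked on the host. The sketch below is a plain C++ analogue, not SYCL code: generate_seq_host and scaled_indices are hypothetical names, and copying op once per element is an assumption made here so that every element starts from the lambda's initial captured state.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-only helper (not a SYCL API): applies op to each element.
template <typename T, typename Operation>
void generate_seq_host(const std::vector<T> &input, std::vector<T> &output,
                       Operation op) {
  for (std::size_t i = 0; i < input.size(); ++i) {
    // Assumption: each element works on its own copy of the operation.
    Operation per_item_op = op;
    output[i] = per_item_op(input[i]);
  }
}

inline std::vector<int> scaled_indices(std::size_t num_elements, int scalar) {
  std::vector<int> input(num_elements);
  std::iota(input.begin(), input.end(), 0);  // 0, 1, 2, ...
  std::vector<int> output(num_elements, 0);
  generate_seq_host(input, output, [scalar](auto idx) { return idx * scalar; });
  return output;
}

inline void generate_seq_host_demo() {
  assert((scaled_indices(4, 10) == std::vector<int>{0, 10, 20, 30}));
}
```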

Let's now define a generic lambda function that generates a sequence. Here is a simple one that increments (by 1) every number in the sequence, starting from 0.

auto increment = [res = 0](auto idx) mutable {
  for (size_t i = 0; i < idx; i++) {
    res++;
  }
  return res;
};

res is an int variable initialized to 0 in the lambda capture list. As already mentioned in the introduction, C++14 allows us not only to capture variables from the outer scope, but also to initialize new variables from expressions inside the lambda capture. The mutable keyword is needed because the lambda body modifies res.
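Because res lives inside the closure, it persists between calls to the same lambda object, while a copy of the lambda carries its own independent state. The host-side sketch below illustrates this standard C++ behavior; the static_cast is added here only to avoid a signed/unsigned comparison, and mutable_state_demo is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>

inline void mutable_state_demo() {
  auto increment = [res = 0](auto idx) mutable {
    for (std::size_t i = 0; i < static_cast<std::size_t>(idx); i++) {
      res++;
    }
    return res;
  };

  // Copying the closure also copies its captured state (res == 0 here).
  auto fresh_copy = increment;

  assert(increment(3) == 3);
  assert(increment(3) == 6);   // res kept its value from the previous call
  assert(fresh_copy(3) == 3);  // the copy started again from res == 0
}
```

This is worth keeping in mind when reusing such a lambda on the host: the sequence values shown in this post assume each invocation starts from the initial captured state.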

As another example of what we can do with our generic sequence generator, we can provide a lambda called increment_and_multiply_by_scalar that demonstrates how we use our input indexes to generate more specific sequences. Furthermore, we can also capture the previously created increment generic lambda and use it to do the increment, while also avoiding copy-pasting the same code. Then, the incremented value is multiplied by the captured scalar.

auto increment_and_multiply_by_scalar = [increment, res = 0,
                                         scalar = 10](auto idx) mutable {
  res = increment(idx);
  res *= scalar;
  return res;
};

auto idx receives the value read from the input array of index values, which in this example matches the work-item id.

Finally, to show how flexible our SYCL generator is, and to make things a little bit more interesting, we can even provide a lambda that gives us the Fibonacci sequence.

auto fibonacci = [res = 0, a = 1, b = 1](auto idx) mutable {
  if (idx <= 0) {
    return 0;
  }
  if (idx > 0 && idx < 3) {
    return 1;
  }
  for (size_t i = 2; i < idx; i++) {
    res = a + b;
    a = b;
    b = res;
  }
  return res;
};
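The Fibonacci lambda can be checked on the host before it is handed to a kernel. The sketch below is plain C++ rather than SYCL code; fibonacci_on_host is a hypothetical helper, the loop counter uses decltype(idx) to avoid a signed/unsigned comparison, and each index is evaluated on a fresh copy of the lambda so that a and b start from their initial values.

```cpp
#include <cassert>
#include <vector>

inline std::vector<int> fibonacci_on_host(int count) {
  auto fibonacci = [res = 0, a = 1, b = 1](auto idx) mutable {
    if (idx <= 0) {
      return 0;
    }
    if (idx < 3) {
      return 1;
    }
    for (decltype(idx) i = 2; i < idx; i++) {
      res = a + b;
      a = b;
      b = res;
    }
    return res;
  };

  std::vector<int> sequence;
  for (int idx = 0; idx < count; ++idx) {
    auto per_item = fibonacci;  // fresh state for every index
    sequence.push_back(per_item(idx));
  }
  return sequence;
}

inline void fibonacci_demo() {
  assert((fibonacci_on_host(8) ==
          std::vector<int>{0, 1, 1, 2, 3, 5, 8, 13}));
}
```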

And we could keep going with all sorts of sequence types. As we have already mentioned, the SYCL implementation, ComputeCpp, is written in modern C++ and enables application developers to create high-level software by utilizing templates and features such as generic lambdas that can be invoked directly inside the SYCL kernel code.

We have introduced lambdas, and more importantly generic lambdas, and how they can interact with SYCL kernel code to aid the design and implementation of generic interfaces. What the "generic" extension of C++ lambda functions offers is more flexible and easier use in complex code bases. With C++14 support we get simple type deduction using auto in the lambda parameter list, and we finally have a way of expressing specific functionality when interacting with high-level application interfaces.
Thanks to SYCL's vision for the future, we can get the best of both worlds: developing flexible, generic applications with no trade-offs in performance and, very importantly, minimizing development time and complexity.

If you are eager to read more, here is an extensive blog post by Bartlomiej Filipek explaining what can be done with lambdas from C++11 to C++20, providing good examples and explanations of features you may want to keep in mind for future language support in releases of ComputeCpp.

You can download ComputeCpp, Codeplay's implementation of SYCL, on our developer website.