An anecdote from benchmarking the compile time of mpark/variant, an implementation of
std::variant
for C++11/14/17.
Introduction
std::variant
has a converting constructor that looks like this:
template <typename... Ts>
class variant {
template <typename T>
constexpr variant(T &&);
};
For this constructor, we need to determine which of the Ts...
will be
initialized with T
. For std::variant
, the behavior is defined as follows:
template <class T> constexpr variant(T&& t) noexcept(see below);
Let
Tj
be a type that is determined as follows: build an imaginary functionFUN(Ti)
for each alternative typeTi
. The overloadFUN(Tj)
selected by overload resolution for the expressionFUN(std::forward<T>(t))
defines the alternativeTj
which is the type of the contained value after construction.
For example, given variant<int, std::string> v("hello");
, we want to create
an overload set FUN
with int
and std::string
then manually perform
overload resolution with an argument of type const char (&)[6]
.
The following is the initial approach I took for FUN
:
template <typename Head, typename... Tail>
struct FUN;
// Base case.
template <typename Head>
struct FUN<Head> {
Head operator()(Head) const;
};
// Recursive case.
template <typename Head, typename... Tail>
struct FUN : FUN<Tail...> {
using FUN<Tail...>::operator();
Head operator()(Head) const;
};
Then the winner of overload resolution can be determined using FUN
like this:
using best_overload = std::result_of_t<FUN<int, std::string>(const char (&)[6])>;
The compile-time benchmark of variant
’s converting constructor using this
implementation looked like this on GCC 5:
A minimal example on Wandbox shows (via -ftime-report
) that the cost of
using this particular implementation of FUN
at N = 350
is ~20s!
What’s so expensive…?
Determining the Source
The first thing I found is that commenting out the using
declaration from
the above example takes the compile time down to ~0.15s. So the recursive
inheritance seems fine, and the using
declaration could be the issue,
but by not pulling in the parent operator()
we’re also no longer performing
overload resolution. Could it be the cost of overload resolution?
Well, the non-recursive, hand-rolled version of FUN
which does perform
overload resolution takes ~0.1s. Okay, it must be the using
declaration.
Now what…?
Surrogate Call Function
When considering the overload set for a call to object of a class type,
it’s not just the operator()
s that get involved in the action.
For example, an implicit conversion operator to R (*)(P1, ..., PN)
is also
considered. They participate in overload resolution as surrogate call function
with the unique name call-function and has the form:
using F = R (*)(P1, ..., PN);
R call-function(F f, P1 a1, ..., PN aN) { return f(a1, ..., aN); }
This is an example taken basically verbatim from [over.call.object]:
int f1(int);
int f2(float);
typedef int (*fp1)(int);
typedef int (*fp2)(float);
struct A {
operator fp1() { return f1; }
operator fp2() { return f2; }
} a;
int i = a(1); // calls f1 via pointer returned from conversion function
How does this help us?
What’s in a Name?
Why did we need the using
declaration in the initial approach, anyway?
It’s due to the fact that the name of the overloaded function in
that case was operator()
, and FUN<T, Ts...>::operator()
shadowed
FUN<Ts...>::operator()
. So we needed to pull the
FUN<Ts...>::operator()
name explicitly into FUN<T, Ts...>
’s scope.
With surrogate call functions, the “pulling in” happens implicitly:
Similarly, surrogate call functions are added to the set of candidate functions for each non-explicit conversion function declared in a base class of
T
provided the function is not hidden withinT
by another intervening declaration.
As such, the “name” is essentially the type of the pointer to function.
Now, we can implement FUN
like this:
template <typename Head, typename... Tail>
struct FUN;
// Base case.
template <typename Head>
struct FUN<Head> {
using F = Head (*)(Head);
operator F() const;
};
// Recursive case.
template <typename Head, typename... Tail>
struct FUN : FUN<Tail...> {
using F = Head (*)(Head);
operator F() const;
};
With this implementation,
at N = 350
we get ~0.6s! MUCH better than ~20s.
Final Touch with Multiple Inheritance
Since the surrogate call functions are implicitly pulled in from the base classes, we don’t even need to keep the recursive inheritance anymore.
We can flatten, simplify, and improve the implementation with multiple inheritance:
template <typename T>
struct FUN_leaf {
using F = T (*)(T);
operator F() const;
};
template <typename... Ts>
struct FUN : FUN_leaf<Ts>... {};
Now, with this implementation,
at N = 350
we get about ~0.2s! Not as drastic, but still a ~3x improvement.
Final Remarks
This issue was discovered while I was setting up MPark.Variant/Benchmark to
compare the many variant
implementations out there.
My hope is that this website will help to improve the community-wide
implementation quality of variant
by comparing existing implementations.
Take a look at the latest GCC 5 results to see the improvements that have been made to mpark/variant!
Thanks to Agustín “K-ballo” Bergé for introducing / explaining surrogate call functions to me.
Thanks to Louis Dionne and Bruno Dutra for developing the metabench framework and setting a precedent with metaben.ch!