An anecdote from benchmarking the compile time of mpark/variant, an implementation of
std::variantfor C++11/14/17.
Introduction
std::variant has a converting constructor that looks like this:
template <typename... Ts>
class variant {
template <typename T>
constexpr variant(T &&);
};
For this constructor, we need to determine which of the Ts... will be
initialized with T. For std::variant, the behavior is defined as follows:
template <class T> constexpr variant(T&& t) noexcept(see below);
Let
Tjbe a type that is determined as follows: build an imaginary functionFUN(Ti)for each alternative typeTi. The overloadFUN(Tj)selected by overload resolution for the expressionFUN(std::forward<T>(t))defines the alternativeTjwhich is the type of the contained value after construction.
For example, given variant<int, std::string> v("hello");, we want to create
an overload set FUN with int and std::string then manually perform
overload resolution with an argument of type const char (&)[6].
The following is the initial approach I took for FUN:
template <typename Head, typename... Tail>
struct FUN;
// Base case.
template <typename Head>
struct FUN<Head> {
Head operator()(Head) const;
};
// Recursive case.
template <typename Head, typename... Tail>
struct FUN : FUN<Tail...> {
using FUN<Tail...>::operator();
Head operator()(Head) const;
};
Then the winner of overload resolution can be determined using FUN like this:
using best_overload = std::result_of_t<FUN<int, std::string>(const char (&)[6])>;
The compile-time benchmark of variant’s converting constructor using this
implementation looked like this on GCC 5:
A minimal example on Wandbox shows (via -ftime-report) that the cost of
using this particular implementation of FUN at N = 350 is ~20s!
What’s so expensive…?
Determining the Source
The first thing I found is that commenting out the using declaration from
the above example takes the compile time down to ~0.15s. So the recursive
inheritance seems fine, and the using declaration could be the issue,
but by not pulling in the parent operator() we’re also no longer performing
overload resolution. Could it be the cost of overload resolution?
Well, the non-recursive, hand-rolled version of FUN which does perform
overload resolution takes ~0.1s. Okay, it must be the using
declaration.
Now what…?
Surrogate Call Function
When considering the overload set for a call to object of a class type,
it’s not just the operator()s that get involved in the action.
For example, an implicit conversion operator to R (*)(P1, ..., PN) is also
considered. They participate in overload resolution as surrogate call function
with the unique name call-function and has the form:
using F = R (*)(P1, ..., PN);
R call-function(F f, P1 a1, ..., PN aN) { return f(a1, ..., aN); }
This is an example taken basically verbatim from [over.call.object]:
int f1(int);
int f2(float);
typedef int (*fp1)(int);
typedef int (*fp2)(float);
struct A {
operator fp1() { return f1; }
operator fp2() { return f2; }
} a;
int i = a(1); // calls f1 via pointer returned from conversion function
How does this help us?
What’s in a Name?
Why did we need the using declaration in the initial approach, anyway?
It’s due to the fact that the name of the overloaded function in
that case was operator(), and FUN<T, Ts...>::operator() shadowed
FUN<Ts...>::operator(). So we needed to pull the
FUN<Ts...>::operator() name explicitly into FUN<T, Ts...>’s scope.
With surrogate call functions, the “pulling in” happens implicitly:
Similarly, surrogate call functions are added to the set of candidate functions for each non-explicit conversion function declared in a base class of
Tprovided the function is not hidden withinTby another intervening declaration.
As such, the “name” is essentially the type of the pointer to function.
Now, we can implement FUN like this:
template <typename Head, typename... Tail>
struct FUN;
// Base case.
template <typename Head>
struct FUN<Head> {
using F = Head (*)(Head);
operator F() const;
};
// Recursive case.
template <typename Head, typename... Tail>
struct FUN : FUN<Tail...> {
using F = Head (*)(Head);
operator F() const;
};
With this implementation,
at N = 350 we get ~0.6s! MUCH better than ~20s.
Final Touch with Multiple Inheritance
Since the surrogate call functions are implicitly pulled in from the base classes, we don’t even need to keep the recursive inheritance anymore.
We can flatten, simplify, and improve the implementation with multiple inheritance:
template <typename T>
struct FUN_leaf {
using F = T (*)(T);
operator F() const;
};
template <typename... Ts>
struct FUN : FUN_leaf<Ts>... {};
Now, with this implementation,
at N = 350 we get about ~0.2s! Not as drastic, but still a ~3x improvement.
Final Remarks
This issue was discovered while I was setting up MPark.Variant/Benchmark to
compare the many variant implementations out there.
My hope is that this website will help to improve the community-wide
implementation quality of variant by comparing existing implementations.
Take a look at the latest GCC 5 results to see the improvements that have been made to mpark/variant!
Thanks to Agustín “K-ballo” Bergé for introducing / explaining surrogate call functions to me.
Thanks to Louis Dionne and Bruno Dutra for developing the metabench framework and setting a precedent with metaben.ch!
