How we used C++20 to eliminate an entire class of runtime bugs

Cameron DaCamara

January 13th, 2022

C++20 is here and has been supported in MSVC since 16.11. Today’s post is not about how you can use it, but rather how we used it to effectively eliminate an entire class of runtime bugs by hoisting a check into compile-time. Let’s get right into it!

Humble beginnings

In compiler design one of the very first things you need is a way to convey to the programmer that their source code has an error or warn them if their code might not behave as expected. In MSVC our error infrastructure looks something like this:

enum ErrorNumber {
    C2000,
    C2001,
    C2002,
    ...
};
void error(ErrorNumber, ...);

The way error works is that each ErrorNumber has a corresponding string entry which represents the text we want to display to the user. These text strings can be anything from: C2056 -> "illegal expression" to: C7627 -> "'%1$T': is not a valid template argument for '%2$S'", but what are these %1$T and %2$S things? These are some of the compiler’s format-specifiers to display certain types of structures in the compiler to the user in a readable way.
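
To make the shape of these specifiers concrete, here is a minimal sketch of a parser for them. The grammar it assumes ('%', a 1-based argument index, '$', then a single kind letter) is inferred only from the two examples above, and specifier_kind is a hypothetical helper, not part of MSVC.

```cpp
#include <cstddef>
#include <string_view>

// Assumed grammar (inferred from the examples, not the real MSVC one):
// '%' <1-based argument index> '$' <single kind letter>
//
// Returns the kind letter of the specifier that refers to argument `wanted`,
// or '\0' if the message contains no such specifier.
constexpr char specifier_kind(std::string_view message, std::size_t wanted) {
    for (std::size_t i = 0; i < message.size(); ++i) {
        if (message[i] != '%') continue;
        std::size_t j = i + 1;
        std::size_t index = 0;
        // Read the decimal argument index.
        while (j < message.size() && message[j] >= '0' && message[j] <= '9') {
            index = index * 10 + static_cast<std::size_t>(message[j] - '0');
            ++j;
        }
        const bool has_digits = j > i + 1;
        const bool well_formed = has_digits && j + 1 < message.size() && message[j] == '$';
        if (well_formed && index == wanted) return message[j + 1];
    }
    return '\0';
}

static_assert(specifier_kind("'%1$T': is not a valid template argument for '%2$S'", 1) == 'T');
static_assert(specifier_kind("'%1$T': is not a valid template argument for '%2$S'", 2) == 'S');
static_assert(specifier_kind("illegal expression", 1) == '\0');
```

Because the function is constexpr, the same parsing logic can run at compile-time, which is what the rest of this post builds on.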

The double-edged sword of format-specifiers

Format-specifiers provide a lot of flexibility and power to us as compiler developers. They can more clearly illustrate why a diagnostic was issued and give the user more context about the problem. The problem with format-specifiers is that they are not type checked in the call to error, so if we happen to get an argument type wrong or do not pass an argument at all, it will almost certainly end up as a runtime error later for the user. Other problems arise when you want to refactor a diagnostic message into something clearer: to do that you need to audit every caller of that diagnostic message and ensure that the refactor agrees with the arguments being passed to error.

We have three high-level goals when designing a system that can check our format-specifiers:

Validate the argument types passed into our diagnostic APIs at compile-time so that an authoring mistake is caught as early as possible.
Minimize changes made to callers of diagnostic APIs. This is to ensure well-formed calls retain their original structure (no disruption to future calls as well).
Minimize changes made to implementation details of the callee. We should not change the behavior of the diagnostic routines at runtime.

There are, of course, some solutions introduced with later C++ standards which could aid in trying to remedy this problem. For one, once variadic templates were introduced into the language we could have tried some template metaprogramming to try and type check the calls to error, but that would require a separate lookup table since constexpr and templates were limited in what they could do. C++14/17 both introduced a lot of improvements to constexpr and non-type template arguments. Something like this would work great:

constexpr ErrorToMessage error_to_message[] = {
    { C2000, fetch_message(C2000) },
    { C2001, fetch_message(C2001) },
    ...
};

template <typename... Ts>
constexpr bool are_arguments_valid(ErrorNumber n) {
    /* 1. fetch message
       2. parse specifiers
       3. check each specifier against the parameter pack Ts... */
    return result;
}
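
Filled in, the outline above might look like the following C++17 sketch. Several pieces are assumptions made for illustration: Type and Symbol stand in for compiler structures, kind_for maps them to the 'T' and 'S' letters, the table stores the message text directly, and the check requires each argument to be referenced exactly once.

```cpp
#include <cstddef>
#include <string_view>
#include <type_traits>

enum ErrorNumber { C2056, C7627 };

// Hypothetical stand-ins for compiler structures.
struct Type {};
struct Symbol {};

struct ErrorToMessage {
    ErrorNumber number;
    std::string_view message;
};

constexpr ErrorToMessage error_to_message[] = {
    { C2056, "illegal expression" },
    { C7627, "'%1$T': is not a valid template argument for '%2$S'" },
};

constexpr std::string_view fetch_message(ErrorNumber n) {
    for (const auto& entry : error_to_message)
        if (entry.number == n) return entry.message;
    return {};
}

// Assumed mapping from argument type to specifier kind letter.
template <typename T>
constexpr char kind_for() {
    using U = std::remove_cv_t<std::remove_reference_t<T>>;
    if constexpr (std::is_same_v<U, Type>) return 'T';
    else if constexpr (std::is_same_v<U, Symbol>) return 'S';
    else return '?';
}

template <typename... Ts>
constexpr bool are_arguments_valid(ErrorNumber n) {
    // The trailing '\0' keeps the array non-empty when Ts is empty.
    constexpr char kinds[] = { kind_for<Ts>()..., '\0' };
    const std::string_view message = fetch_message(n);   // 1. fetch message
    std::size_t count = 0;
    for (std::size_t i = 0; i < message.size(); ++i) {    // 2. parse specifiers
        if (message[i] != '%') continue;
        std::size_t j = i + 1, index = 0;
        while (j < message.size() && message[j] >= '0' && message[j] <= '9')
            index = index * 10 + static_cast<std::size_t>(message[j++] - '0');
        if (j == i + 1 || j + 1 >= message.size() || message[j] != '$') continue;
        // 3. check the specifier against the matching element of Ts...
        if (index == 0 || index > sizeof...(Ts) || kinds[index - 1] != message[j + 1])
            return false;
        ++count;
    }
    return count == sizeof...(Ts);
}

static_assert(are_arguments_valid<>(C2056));
static_assert(are_arguments_valid<Type, Symbol>(C7627));
static_assert(!are_arguments_valid<Symbol, Type>(C7627)); // wrong types
static_assert(!are_arguments_valid<Type>(C7627));         // missing argument
```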

So we finally had the tools to try and check the format-specifiers at compile-time. But there was still a problem: we did not have a way to silently check all the existing calls to error, meaning that we would have to add an extra layer of indirection at the call sites of error to ensure that the ErrorNumber could fetch the string at compile-time and check the argument types against it. In C++17 this will not work, because the function parameter n is not a constant expression inside error, so the check can only ever run at runtime:

template <typename... Ts>
void error(ErrorNumber n, Ts&&... ts) {
    assert(are_arguments_valid<Ts...>(n));
    /* do error stuff */
}

And we cannot make error itself constexpr because it does a lot of constexpr-unfriendly things. Additionally, adjusting all the call sites to something like: error<C2000>(a, b, c) so that we can check the error number as a compile-time expression is unsavory and would cause a lot of unnecessary churn in the compiler.
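
For comparison, the rejected error<C2000>(a, b, c) shape can be sketched in C++17 as below. The rule in expected_argument_count is invented purely for illustration, and arguments_match is a hypothetical helper. The point is only that a template argument, unlike a function parameter, is usable in a static_assert.

```cpp
#include <cstddef>

enum ErrorNumber { C2000, C2001 };

// Hypothetical rule for illustration: each error expects a fixed argument count.
constexpr std::size_t expected_argument_count(ErrorNumber n) {
    return n == C2000 ? 3 : 0;
}

template <ErrorNumber N, typename... Ts>
constexpr bool arguments_match() {
    return sizeof...(Ts) == expected_argument_count(N);
}

template <ErrorNumber N, typename... Ts>
void error(Ts&&...) {
    // N is a template argument, so it is a constant expression here.
    // With `void error(ErrorNumber n, Ts&&...)`, `n` would not be.
    static_assert(arguments_match<N, Ts...>(), "wrong number of arguments for this error");
    /* do error stuff */
}

inline void example(int a, int b, int c) {
    error<C2000>(a, b, c);   // OK, but every call site had to change shape
    // error<C2000>(a, b);   // would fail the static_assert
}
```

The check works, but only because every caller now spells the error number as a template argument, which is exactly the churn described above.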

C++20 to the rescue!

C++20 introduced an important tool for us to enable compile-time checking, consteval. consteval is in the family of constexpr but the language guarantees that a function adorned with consteval will be evaluated at compile-time. A well-known library by the name of fmtlib introduced compile-time checking as part of the core API and it did so without changing any call sites, assuming the call site was well-formed according to the library. Imagine a simplified version of fmt:

template <typename T>
void fmt(const char* format, T);

int main() {
    fmt("valid", 10);    // compiles
    fmt("oops", 10);     // compiles?
    fmt("valid", "foo"); // compiles?
}

Here the intent is that format should always be equal to "valid" and T should always be an int. The code in main is ill-formed according to the library in this case, but nothing validates that at compile-time. fmtlib accomplished compile-time checking using a little trick with user-defined types:

#include <string_view>
#include <type_traits>

// Exposition only
#define FAIL_CONSTEVAL throw

template <typename T>
struct Checker {
    consteval Checker(const char* fmt) {
        if (fmt != std::string_view{ "valid" }) // #1
            FAIL_CONSTEVAL;
        // T must be an int
        if (!std::is_same_v<T, int>)            // #2
            FAIL_CONSTEVAL;
    }
};

template <typename T>
void fmt(std::type_identity_t<Checker<T>> checked, T);

int main() {
    fmt("valid", 10);    // compiles
    fmt("oops", 10);     // fails at #1
    fmt("valid", "foo"); // fails at #2
}

Note: you need to use the std::type_identity_t trick to keep checked from participating in type deduction. We only want it to deduce the rest of the arguments and use their deduced types as template arguments to Checker.

You can fiddle with the example for yourself using Compiler Explorer.

Tying it all together

The code above is powerful in that it gives us a tool which can perform additional safety checking without changing any caller which is well-formed. Using the technique above we applied compile-time checking to all our error, warning, and note message routines. The code used in the compiler is nearly identical to the fmt above except that the argument to Checker is an ErrorNumber.

In total we identified ~120 instances where we were either passing the incorrect number of arguments to a diagnostic API or where we passed the wrong type for a particular format-specifier. Over the years we have received bugs regarding strange compiler behavior when emitting a diagnostic or a straight-up ICE (Internal Compiler Error) because the format-specifiers were looking for arguments which were incorrect or did not exist. Using C++20 we have largely eliminated the possibility of such bugs happening in the future while also gaining the ability to safely refactor diagnostic messages, all made possible by one little keyword: consteval.

Closing

As always, we welcome your feedback. Feel free to send any comments through e-mail at visualcpp@microsoft.com or through Twitter @visualc. Also, feel free to follow me on Twitter @starfreakclone.

If you encounter other problems with MSVC in VS 2019/2022 please let us know via the Report a Problem option, either from the installer or the Visual Studio IDE itself. For suggestions or bug reports, let us know through DevComm.

Cameron DaCamara Senior Software Engineer, Visual C++

2 comments

Roman Dremov January 14, 2022 2:50 pm

If you are set on C++20, I suggest using concepts to constrain template types. Much cleaner than other type tricks and traits.

Also, I am surprised you uncovered 120 format errors in your error reporting code. Did you not have a negative test for every error? This is a common practice, at least in my company.

Cameron DaCamara January 14, 2022 6:09 pm

Hi Roman,

Yes, we use concepts in the compiler religiously. Concepts help you better reason about the semantics of the types being passed into a function, but in the case of our diagnostic message APIs a concept doesn’t make a lot of sense because the semantics of how to interact with the type is dictated by the format string itself, so you can’t reasonably create a concept to constrain the parameter pack. The compile-time format-specifier checking constrains them for you.

> Did you not have a negative test for every error?

We do. We have around 250,000 tests which run on each commit and a significant portion of them are negative tests looking for a specific error. Given the age of the compiler though, it is hard to say that we test every possible code path to an individual call to `error`, because there may be multiple places where the same error, warning, or note is emitted. The solution presented in the article helps us reason about every code path no matter how the compiler gets there.
