2 major reasons why modern C++ is a performance beast

Use smart pointers and move semantics to supercharge your C++ code base.

By Leor Zolman
September 23, 2016
Arrows on road (source: marsblac via Pixabay)

The age of “modern” C++ was heralded by the ambitious C++11 standard, the first major update in the 13 years since C++98. Three years later, C++14 emerged to complete the overall feature set the committee had been aiming for during that original 13-year gestation period.

One only needs to do a bit of Googling to see that there are a lot of new features in modern C++. In this article, I’ll focus on two key features that represent major milestones in C++’s performance evolution: smart pointers and move semantics.

Smart pointers

The Prime Directive in the C/C++ continuum has always been performance. As I often tell groups when teaching C++, when I ask a question beginning with “Why” concerning the rationale for a particular C++ language or library feature, they have a 90% chance of getting the answer right by replying with a single word: “Performance.”

Raw pointers may be fragile and prone to errors, but they’re so close to the machine that code using them runs like a bat out of hell. For decades, there was no better way to satisfy the need for speed demanded by a large class of applications. Memory leaks, segfaults, and torturous debugging sessions were simply the price of doing business if you needed the level of performance raw pointers uniquely provided.

The trouble with raw pointers is that there are too many ways to misuse them, including: forgetting to initialize them, forgetting to release dynamic memory, and releasing dynamic memory too many times. Many such problems can be mitigated or even completely eliminated through the use of smart pointers—class templates designed to encapsulate raw pointers and greatly improve their overall reliability. C++98 provided the auto_ptr template that did part of the job, but there just wasn’t enough language support to do that tricky job completely.

As of C++11, that language support is there, and as of C++14 not only is there little remaining need for raw pointers that own memory, but there’s rarely even any need for the raw new and delete operators. The reliability (including exception safety) of code performing dynamic memory allocation goes way up in modern C++, and with unique_ptr that reliability typically comes at no measurable cost in performance. This represents perhaps the most visible paradigm shift in the transition to modern C++: the replacement of owning raw pointers with smart pointers just about everywhere imaginable. The code below illustrates the fundamental difference between using a raw pointer and a unique_ptr to manage dynamic memory:

// ----------------------------
// Using old-style raw pointers:
// ----------------------------

Widget *getWidget();

void work()
{
    Widget *wptr = getWidget();

    // Exception or return here: Widget never released!

    delete wptr; // manual release required
}

//-----------------------------
// Using Modern C++ unique_ptr:
//-----------------------------

unique_ptr<Widget> getWidget();

void work()
{
    unique_ptr<Widget> upw = getWidget();

    // Exception or return here: no problem!

}   // Widget released automatically
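
The same guarantee can be checked rather than taken on faith. The following is a minimal sketch, not the article's own test code: TrackedWidget, makeTrackedWidget, and the helper functions are hypothetical names introduced here, and the instance counter exists only to make a leak observable. It assumes a C++14 compiler for std::make_unique.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for Widget; counts live instances so that a leak
// would show up as a non-zero count.
struct TrackedWidget {
    static int live;
    TrackedWidget()  { ++live; }
    ~TrackedWidget() { --live; }
    TrackedWidget(const TrackedWidget &) = delete;
    TrackedWidget &operator=(const TrackedWidget &) = delete;
};
int TrackedWidget::live = 0;

// Factory in the style of getWidget(): no raw new, and the return type
// states that the caller owns the result.
std::unique_ptr<TrackedWidget> makeTrackedWidget()
{
    return std::make_unique<TrackedWidget>();   // C++14
}

// Throws while a widget is alive; stack unwinding destroys upw.
void workThatThrows()
{
    std::unique_ptr<TrackedWidget> upw = makeTrackedWidget();
    throw std::runtime_error("simulated failure");
}

// Number of live widgets once the exception has been handled.
int liveAfterFailure()
{
    try {
        workThatThrows();
    } catch (const std::runtime_error &) {
        // nothing to clean up by hand
    }
    return TrackedWidget::live;
}

// Ownership is transferred by move, never copied; the source becomes null.
bool transferLeavesSourceNull()
{
    std::unique_ptr<TrackedWidget> a = makeTrackedWidget();
    std::unique_ptr<TrackedWidget> b = std::move(a);
    return a == nullptr && b != nullptr;
}
```

Calling liveAfterFailure() should report zero live widgets, because the unique_ptr destructor runs during unwinding. Since a unique_ptr cannot be copied, two owners can never both try to delete the same object.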

Move semantics

Before C++11, there was still one fundamental area where performance was throttled: C++’s value-based semantics incurred the cost of unnecessarily copying resource-intensive objects. In C++98, a function declared something like this:

vector<Widget> makeWidgetVec(creation-parameters);

struck fear into any cycle-counter’s heart, due to the potential expense of returning a vector of Widgets by value (let’s assume that Widget is some sort of resource-hungry type). The rules of the C++98 language require that a vector be constructed within the function and then copied upon return or, at the very least, that the program behave as if that were the case. When individual Widgets are expensive to copy, then the cost of copying an entire vector full of them becomes prohibitive. Compilers have historically applied what’s known as the return value optimization (RVO) to elide the cost of copying such a vector, but the RVO is only an optimization and circumstances do not always allow compilers to apply it. Hence, the cost of such code could not be reliably predicted.

The introduction of move semantics in modern C++ completely removes that uncertainty. Even if the Widgets are not “move-enabled,” returning a temporary container of Widgets from a function by value becomes a very efficient operation because the vector template itself is move-enabled.
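
A small sketch can make this concrete. The element type below, CopyOnly, is a hypothetical type invented for this illustration: it has no move operations at all and counts every copy. The helper names are likewise assumptions. The sketch assumes C++11 or later, where constructing a vector with a size default-constructs its elements.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical element type that is not "move-enabled": it declares only
// copy operations and counts each copy made.
struct CopyOnly {
    static int copies;
    CopyOnly() {}
    CopyOnly(const CopyOnly &) { ++copies; }
    CopyOnly &operator=(const CopyOnly &) { ++copies; return *this; }
};
int CopyOnly::copies = 0;

std::vector<CopyOnly> makeVec(std::size_t n)
{
    std::vector<CopyOnly> v(n);   // n default-constructed elements, no copies
    return v;                     // elided or moved as a whole vector
}

// True if returning the vector by value copied none of its elements.
bool returnByValueCopiesNothing()
{
    CopyOnly::copies = 0;
    std::vector<CopyOnly> result = makeVec(1000);
    return result.size() == 1000 && CopyOnly::copies == 0;
}

// True if moving a vector hands over its buffer instead of copying elements.
bool moveTransfersBuffer()
{
    std::vector<CopyOnly> source = makeVec(1000);
    const CopyOnly *buffer = source.data();
    CopyOnly::copies = 0;
    std::vector<CopyOnly> target = std::move(source);
    return target.data() == buffer && CopyOnly::copies == 0;
}
```

Both checks should pass even though CopyOnly itself cannot be moved: the vector's move constructor takes over the existing buffer, so no individual element is touched.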

Additionally, if the Widget class itself is move-enabled (the rules for doing so are straightforward and apply consistently across platforms; they’re not just a platform-dependent optimization), then the vector’s overhead is dramatically reduced as well. One example is reallocation: when a push_back exceeds the vector’s capacity, the existing elements must be transferred to a larger buffer, and they can then be moved instead of copied. The code below shows two versions of a class named Widget. First, a “conventional” version:

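#include <algorithm>   // std::swap (its C++98 home)
#include <cstddef>     // size_t
#include <cstring>     // std::memcpy

class Widget
{
    public:
            // static, so that the initializer is also valid C++98
        static const size_t TEST_SIZE = 10000;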
        Widget() : ptr(new char[TEST_SIZE]), size(TEST_SIZE)
        {}
        ~Widget() { delete[] ptr; }
            // Copy constructor:
        Widget(const Widget &rhs) :
                ptr(new char[rhs.size]), size(rhs.size) {
            std::memcpy(ptr, rhs.ptr, size);
        }
            // Copy assignment operator:
        Widget& operator=(const Widget &rhs) {
            Widget tmp(rhs);
            swap(tmp);
            return *this;
        }
        void swap(Widget &rhs) {
            std::swap(size, rhs.size);
            std::swap(ptr, rhs.ptr);
        }

    private:
        char *ptr;
        size_t size;
};

// Output of test program:
//
// Size of vw: 500000
// Time for one push_back on full vector: 53.668

And one enhanced to support move semantics:

#include <cstring>
#include <utility>

class Widget
{
    public:
            // static: one shared constant rather than a copy per Widget
        static const size_t TEST_SIZE = 10000;
        Widget() : ptr(new char[TEST_SIZE]), size(TEST_SIZE)
        {}
        ~Widget() { delete[] ptr; }
            // Copy constructor:
        Widget(const Widget &rhs) :
                ptr(new char[rhs.size]), size(rhs.size) {
            std::memcpy(ptr, rhs.ptr, size);
        }
            // Move constructor:
        Widget(Widget &&rhs) noexcept : ptr(rhs.ptr), size(rhs.size) {
            rhs.ptr = nullptr; rhs.size = 0;
        }
            // Copy assignment operator:
        Widget& operator=(const Widget &rhs) {
            Widget tmp(rhs);
            swap(tmp);
            return *this;
        }
        void swap(Widget &rhs) noexcept {
            std::swap(size, rhs.size);
            std::swap(ptr, rhs.ptr);
        }
            // Move assignment operator
        Widget &operator=(Widget &&rhs) noexcept {
            Widget tmp(std::move(rhs));
            swap(tmp);
            return *this;
        }

    private:
        char *ptr;
        size_t size;
};

// Output:
//
// Size of vw: 500000
// Time for one push_back on full vector: 0.032

Using a simple timer class, the test program populates a vector with half a million instances of Widget and, after making sure the vector is at its capacity, reports the time for a single additional push_back of a Widget onto the vector. In C++98, the push_back operation takes almost a minute (on my ancient Dell Latitude E6500). In modern C++, using the move-enabled version of Widget, the same push_back operation takes 0.032 seconds. Here’s a simple timer class:

#include <ctime>

class Timer {
    public:
        Timer(): start(std::clock()) {}

        operator double() const 
                { return (std::clock() - start) / static_cast<double>(CLOCKS_PER_SEC); }

        void reset() { start = std::clock(); }

    private:
        std::clock_t start;
};
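
Note that std::clock reports processor time used by the program, not elapsed wall-clock time. If wall-clock time is what you want to measure, a variant built on the C++11 <chrono> library is one option. This is a sketch only; WallTimer is a hypothetical name and is not the class used to produce the timings quoted in this article.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical variant of Timer that measures elapsed wall-clock time
// with a monotonic clock.
class WallTimer {
    public:
        WallTimer() : start(std::chrono::steady_clock::now()) {}

        // Seconds elapsed since construction or the last reset().
        operator double() const {
            return std::chrono::duration<double>(
                std::chrono::steady_clock::now() - start).count();
        }

        void reset() { start = std::chrono::steady_clock::now(); }

    private:
        std::chrono::steady_clock::time_point start;
};
```

The interface matches Timer, so it could be swapped into the same kind of test program without other changes.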

And here’s a test program to time one push_back call on a large, full vector of Widgets (a memory-hogging class):

#include <iostream>
#include <exception>
#include <vector>
#include "Widget2.h"
#include "Timer.h"   // for Timer class

int main()
{
    using namespace std;

    vector<Widget> vw(500000);
    while (vw.size() < vw.capacity())
        vw.push_back(Widget());

    cout << "Size of vw: " << vw.size() << endl;

    Timer t;    // initialize timer to current clock time
    vw.push_back(Widget());
    cout << "Time for one push_back on full vector: " << t << endl;
}
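
One detail of the move-enabled Widget matters a great deal for this test: its move constructor is declared noexcept. When a vector reallocates, it moves its elements only if doing so cannot compromise its exception-safety guarantee; for a copyable type whose move constructor might throw, it falls back to copying. The sketch below illustrates that behavior with two hypothetical element types, NoexceptMove and ThrowingMove, which are not part of the article's test code; the counts shown in the usage note are what mainstream standard library implementations are expected to produce.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type with a noexcept move constructor.
struct NoexceptMove {
    static int copies;
    static int moves;
    NoexceptMove() {}
    NoexceptMove(const NoexceptMove &) { ++copies; }
    NoexceptMove(NoexceptMove &&) noexcept { ++moves; }
};
int NoexceptMove::copies = 0;
int NoexceptMove::moves = 0;

// Same type, except that the move constructor is not declared noexcept.
struct ThrowingMove {
    static int copies;
    static int moves;
    ThrowingMove() {}
    ThrowingMove(const ThrowingMove &) { ++copies; }
    ThrowingMove(ThrowingMove &&) { ++moves; }
};
int ThrowingMove::copies = 0;
int ThrowingMove::moves = 0;

struct Counts {
    int copies;
    int moves;
};

// Fills a vector with n elements, forces one reallocation, and reports how
// the existing elements were transferred to the new buffer.
template <typename T>
Counts countsAfterReallocation(std::size_t n)
{
    std::vector<T> v(n);
    T::copies = 0;
    T::moves = 0;
    v.reserve(v.capacity() + 1);   // needs a larger buffer
    Counts c = { T::copies, T::moves };
    return c;
}
```

With 100 elements, countsAfterReallocation<NoexceptMove>(100) should report 100 moves and no copies, while countsAfterReallocation<ThrowingMove>(100) should report 100 copies and no moves. Leaving noexcept off Widget’s move constructor would therefore forfeit most of the speedup measured above.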

The effects of modern C++’s move semantics reveal themselves through overall improved performance, but they are not always very apparent at the user source code level. Hence, I think of move semantics as a “stealth” feature. In contrast, smart pointers make themselves very apparent at the source code level because developers must make the conscious decision to use them.

It’s difficult to say much more about these new features without drilling down into implementation techniques. If you’d like to learn more, register for my in-person training course, Transitioning to Modern C++, on October 26-28, where these features and much more will be explored in greater detail.
