Chapter 4. The Standard Template Library Part I: Containers and Iterators
The Standard Template Library, also commonly referred to as the STL, is a subset of C++’s Standard Library that houses a set of container classes, including std::vector
, that will be discussed in this chapter. The STL also provides a group of algorithms applicable to those containers and other containers—including your own—that follow the same coding conventions.
The STL is a revolutionary design brought to the world by Alexander Stepanov, David Musser, and Meng Lee. It was officially integrated into the C++ Standard library in the 1990s, as described in “C++ in 2005” by Bjarne Stroustrup. This “library within a library” combines algorithms and containers to form a whole that is significantly larger than the sum of its parts. Technically speaking, as Scott Meyers points out in the introduction to Effective STL, there is “no official definition of the STL, and different people mean different things when they use the term.”1 However, it is used throughout the C++ vernacular to represent the container classes, iterators, and algorithms.
You have already seen that a vector
on its own is quite useful and versatile. It is the workhorse of the STL and the container of choice for most financial modeling applications (as well as for many other application domains). For this reason alone, it is worth getting familiar with the vector
class in more detail, but in doing so, you will also find it easier to then understand how other STL containers work.
What makes STL containers so powerful is their relationship with STL algorithms. Algorithms work on an abstraction called iterators, rather than on containers themselves:
-
Containers define how data is organized.
-
Iterators describe how organized data can be traversed.
-
Algorithms describe what you can do on the data.
At a high level, algorithms let you traverse an STL container, and they can apply a function to each member, or a subset thereof, efficiently replacing iterative for
and while
statements in a single statement. Since algorithms are expressed in terms of iterators, not in terms of containers, you can write a single algorithm and leverage its impact on a whole set of containers at once.
STL containers such as a vector
are generic in the sense that they can hold a homogeneous set of (almost) any arbitrary type—indicated as a template parameter—be it a plain numerical type such as double
or a class type such as BlackScholes
(from Chapter 2), as long as these types conform to the requirements of the functions we seek to apply to them. Such containers can also handle polymorphic objects as pointers to a common interface base class, in which case three programming paradigms converge: functional, with STL algorithms; generic, using STL container class templates; and object-oriented, at least under some definitions of that term, whereby the elements are pointers to polymorphic objects. In modern C++, as you saw in Chapter 3, unique pointers provide a safe yet efficient alternative to the raw pointers of yore, and these can also be held as elements in an STL container.
In this chapter, we will first work through an overview/review of templates, as these are the T in STL. STL containers and iterators will follow. Chapter 5 demonstrates how iterative tasks over container elements can be expressed more efficiently and safely with STL algorithms, compared to loops. Most of the material presented in these two chapters will focus primarily on C++ itself as opposed to applications in finance, but it will set us up for more applied examples in the remaining chapters.
Templates
Templates in C++ are like blueprints for functions and classes. They facilitate generic programming in C++, enabling a function or class to be designed for arbitrary types, as opposed to individual versions written for each type that can end up duplicating code to a large degree. The following examples of a generic function and of a generic class will give you insight on how templates work in C++.
To start, let’s go back to the case of a vector
, which is a common example of a class template. Like other containers, it can hold a homogeneous collection of an arbitrary type by specifying the type as a template parameter inside the angle brackets:
// A vector of real numbers: vector<double> v{1.0, 2.0, 3.0}; // A vector of BlackScholes objects: vector<BlackScholes> opts { {100.0, 106.0, 0.04, 0.5, PayoffType::Call}, {110.0, 106.0, 0.04, 0.5, PayoffType::Put}, {95.0, 106.0, 0.032, 0.25, PayoffType::Call}, {105.0, 106.0, 0.032, 0.02, PayoffType::Put}, {115.0, 106.0, 0.045, 1.5, PayoffType::Put} };
The vector
class is generic in the sense that it doesn’t care what type of object it is holding.
Going a step further, to better understand how templates work, we will next look at how to implement user-defined function and class templates.
Using Function Templates
Recalling the Fraction
class from Chapter 2, we can add a definition of the *
operator, and accessors for the numerator and denominator, in order to help demonstrate function templates. The previous inclusions of the <
and spaceship operators will also be used shortly in an example involving class templates. The updated header file could be written as follows:
class Fraction { public: Fraction(unsigned n, unsigned d); bool operator === (const Fraction& rhs) const = default; std::strong_ordering operator <=> (const Fraction& rhs) const; // operator * has been added for current chapter: Fraction operator *(const Fraction& rhs) const; unsigned n() const; // return n_ (numerator) unsigned d() const; // return d_ (denominator) private: unsigned n_, d_; };
The two accessors returning the numerator and denominator are trivial, and for the *
operator, we can return the fraction product:
Fraction Fraction::operator *(const Fraction& rhs) const { return {n_ * rhs.n_, d_ * rhs.d_}; }
Suppose now you need to square integer, real, and Fraction
types. Proceeding naively, as C++ is a strongly typed language, you could write three separate functions:
// integers: int int_square(int k) { return k * k; } // real numbers (double): double dbl_square(double x) { return x * x; } // Fraction: Fraction frac_square(const Fraction& f) { return f * f; // operator* on Fraction is defined }
However, all three can be replaced by a single function template:
template <typename T> T tmpl_square(const T& t) { return t * t; }
Squaring an integer or double-precision value is trivial, as the numerical arguments imply their types, int
and double
, respectively:
int sq_int = tmpl_square(4); // tmpl_square(const int&) double sq_real = tmpl_square(4.2); // tmpl_square(const double&)
Applying the function to a Fraction
is more interesting, because we no longer have the built-in arithmetic operators, as we do with numerical types. The object type needs to be specified somehow, and this can be done in a couple of ways. One is to construct a Fraction
object and pass it into the function:
Fraction frac{2, 3}; Fraction sq_frac = tmpl_square(frac); // tmpl_square(const Fraction&)
The second option is to use uniform initialization, where the constructor arguments alone suffice. But here, we need to specify the template parameter; otherwise, the tmpl_square(.)
function will have no way of knowing the type being passed:
auto sq_frac_unif = tmpl_square<Fraction>({2, 3});
In each case, will be returned:
// Use the accessors to get the numerator and denominator in each case: cout << format("Square of fraction {}/{} is {}/{}\n", frac.numer(), frac.denom(), sq_frac.numer(), sq_frac.denom()); cout << format("Square of fractions using unif initialization = {}/{}\n", sq_frac_unif.numer(), sq_frac_unif.denom());
The output is:
Square of fraction 2/3 is 4/9 Square of fraction using unif initialization = 4/9
The function template does not care what the specific type T
is, as long as the multiplication operator is defined. If it is not, such as in attempting to square a Circle
object (Chapter 3), the code would not compile:
// operator* not defined for Circle. // Will not compile! Circle circ{1.0}; double area = tmpl_square(circ);
Note
Although it is generally good when errors are detected by the compiler, with template code, compiler error output can get long and somewhat cryptic, making it more painful to locate the source of the problem in the code, particularly in more realistic settings with expanded code complexity. Fortunately, we have new ways of alleviating this, particularly C++20 concepts, which are presented in Chapter 10.
Using Class Templates
A class template is syntactically similar to a function template except that a template parameter applies to the entire class. For example, here is a class template that will hold two private members of the same type, and provide a public member function that computes the minimum of the two:
// Class Declaration: template <typename T> export class MyPair { public: MyPair(const T &first, const T &second) :a_(first), b_(second) {} T get_min() const; private: T a_, b_; }; // Class implementation: template <typename T> MyPair<T>::MyPair(T first, T second) :a_(first), b_(second) {} template <typename T> T MyPair<T>::get_min() const { return a_ < b_ ? a_ : b_; }
As long as the less-than operator <
is defined for type T
, the code will compile. For two int
types, the inequality definition is provided by the language; no problem here:
MyPair<int> mp_int{10, 26}; int min_int = mp_int.get_min(); // OK, returns 10
Using the simple Fraction
class defined in Chapter 2, the inequality was defined within the class, so we are OK here as well:
MyPair<Fraction> mp_frac{{3, 2}, {5, 11}}; Fraction min_frac = mp_frac.get_min(); // Returns 5/11
Attempting to call the get_min()
member function when the template parameter type does not support the inequality, such as again with the Circle
class, the result will be a compile-time error:
// Will not compile! MyPair<Circle> fail{{1.5}, {3.0}}; fail.get_min(); // no operator< for Circle
Warning
We could have expressed the body of MyPair<T>::get_min()
as
T retval; // retval = a < b? a : b; // return retval;
However, this would have required that T
expressed a copy assignment operator as well as a default constructor , whereas our suggested implementation does not impose these requirements. When writing generic code, it is often useful to consider what we ask of the types for which our code will be instantiated and strive to ask only for what we really require, nothing more. This widens the set of types for which our code will be applicable.
Compiling Template Code
Templates are primarily written in header files, because templates guide the compiler as to how code should be generated. You can think about templates as being like a pattern for a suit. Nothing will happen until the tailor selects the material to be used (analogous to the type), and then cuts it and stitches it together. A tailor could even say, “In general, this is how I do things, but for you, I have something special in mind.” For templates, this is called a specialization.
For each type that a function or class template uses in its template parameter, a specialization is generated by the compiler, resulting in additional binary code for each. Suppose we write the same three lines of code as in the earlier example:
int sq_int = tmpl_square(4); double sq_real = tmpl_square(4.2); auto sq_frac_unif = tmpl_square<Fraction>({2, 3});
Putting these three lines in your code will cause the compiler to generate three versions (specializations) of tmpl_square(.)
, one for type int
, one for type double
, and one for Fraction
.
Similarly, creating instances of MyPair
objects as before will result in compiled code for each of the int
, double
, and Fraction
specializations:
MyPair<int> mp_int{10, 6}; MyPair<double> mp_dbl{4.22, 55.1115}; MyPair<Fraction> mp_frac{{3, 2}, {5, 10}};
Of course, had we written these respective functions and classes individually with parameter type overloading, we would have arrived at the same solution. Templates save us the trouble of manually writing that code.
Additional benefits of templates include facilitating generic programming, as well as potentially improving runtime performance. The downside is that templates can increase build times quite significantly with template-heavy codebases, although this situation has improved since C++11. The introduction of modules in C++20 in some respects has also helped reduce build times, discussed in Chapter 10.
We can also use default template parameters to make user code simpler to express. Returning to the MyPair
class that holds two elements of the same type T
, we could default this template parameter to a double
type:
template <typename T = double> class MyPair { public: MyPair(T first, T second); T get_min() const; private: T a_, b_; };
Then, if we want a MyPair
object to hold two double
types, we can omit the explicit type name from inside the angle brackets:
MyPair<> real_pair{19.73, 10.26}; double min_val = real_pair.get_min(); // 19.73
Class templates can take in more than one parameter type. For example, the MyPair
class template could have taken in two template parameters not necessarily of the same type:
template <typename T, typename U> class MyPair { public: MyPair(T first, U second); // . . . private: T a_; U b_; };
Another example is the std::map<K, V>
class template, an associative STL container covered in “Associative Containers”. Templates can also involve compile-time integral values, such as with the fixed-size sequential STL container std::array<T, N>
, where N
is a positive integer, which will also be discussed in this chapter (you might also recall that you got a sneak preview of this container type in Chapter 2). Templates can even accept templates as parameters—which eventually can become challenging—as you will see with expression templates introduced in Chapter 8. However, this book cannot cover all possible permutations, and thus we will mostly limit ourselves to providing a surface view of how templates can be written and how they can be used.
STL Containers
We have already used std::vector
in previous chapters, but other container classes exist, each of them allowing us to hold and organize data in different ways. Containers can be divided into at least two categories:
- Sequential containers
-
These emphasize sequential traversal of the elements. These containers include
std::vector
, along withstd::deque
andstd::list
. - Associative containers
-
These emphasize organizing the underlying storage in ways that make it easy to retrieve values after the data elements have been inserted. Examples include
std::set
andstd::map
.
We will start our overview with sequential containers.
Sequential Containers
Sequential containers hold elements that can be accessed sequentially. The following are sequential STL containers:
std::vector<T>
-
A dynamic array container guaranteed to be allocated in contiguous memory; this container is optimized for insertions and removals at its end. Insertions and removal in other locations in the container are possible but less efficient. It can be considered a no-cost and safe abstraction of a C-style dynamic array—safe in the sense that all heap memory allocation, management, and replacement is handled internally by the class. In practice, if used appropriately, a
vector
can be faster than a manually managed dynamic array since avector
does careful and efficient resource management. This is the best-known example of a sequential container in C++, but it is not the only container in this family. std::deque<T>
-
This container offers functionality similar to a
vector
, but it also provides efficient appending and removing of data elements to and from the front of the container. Unlike avector
, however, its storage is not guaranteed to be in contiguous memory, and as a result traversal of that container is generally less efficient as the same operation with avector
. std::list<T>
-
Data elements are organized as a doubly linked list of nodes containing the value as well as pointers to neighboring (both previous and next) nodes. This makes it more efficient for insertions and deletions at arbitrary locations within the container. Unlike a
vector
,deque
, or anarray
(introduced next), however, it does not provide random access via the[.]
operator or anat(.)
member function. The only way to reach elementi
in alist
is to start from the beginning and iterate through the firsti
elements. std::array<T, N>
-
A fixed-size array of
N
elements; the value ofN
must be explicitly known at compile time. Its contents are stored contiguously and are not dynamically allocated. As a result, anarray
can be the most efficient type of sequential container, but an explicit fixed-size declaration requirement at compile time makes its practical use in financial applications highly limited.
In addition to the member functions on the vector
class that you are now already familiar with, quite a few more can be important in practice. We will start our coverage of sequential STL containers with a vector
, but you will eventually see that much of what you learn here will be relevant for other containers.
The vector sequential container
A vector
will almost surely be the STL container you will use the most as a financial developer. However, many of the member functions described for this container will carry over to other standard containers, so by learning about the vector
container well, you should be able to easily pick up what you need when using other containers. For good reasons, a vector
is seen by most as the default container in C++.
In previous chapters, in examples using a vector
, the focus was primarily on storing plain numeric types as elements, but as a generic container, objects of any valid type (essentially, any type that is at least copyable or movable) can be used as its template parameter.
Storage of objects in a vector
A common device used in financial modeling and actuarial science is a mythical but theoretically useful continuously compounded bank account. Given an initial deposit and a fixed interest rate , after a period of time measured in units of years (or alternatively a year fraction), the account will grow to an amount
A simple class that represents this can be written as follows (the reason for the default constructor and in-class initialization will become apparent shortly):
#include <cmath> class BankAccount { public: BankAccount(double init_value, double continuous_rate) : init_value_{init_value}, continuous_rate_{continuous_rate} {} BankAccount() = default; double value(double time) const { return init_value_ * std::exp(continuous_rate_ * time); } private: double init_value_{1.0}, continuous_rate_{0.0}; };
Then, suppose we have three competing accounts with interest rates of 2.1%, 2.2%, and 2.3%, respectively, with initial deposits of $1,000, $2,000, and $3,000 (pretend you get a higher interest rate with more money deposited), and we need to place these objects in a vector
container. This may seem trivial, but some important consequences depend on how this placement is done, as we are now dealing with objects rather than simple numerical types.
A naive approach would be to create three BankAccount
instances and push each back onto the end of a vector
—say, ba_push
. When ba_push
is created, it will not hold any elements, which can be verified by calling its size()
method:
#include <iomanip> // For std::setprecision and std::fixed, // to fix account value output at two decimal places. // . . . vector<BankAccount> ba_push; cout << format("ba_push contains {} elements.\n", ba_push.size()) // size = 0
Now, create each bank account object and push it back on the vector:
BankAccount ba01{1000.00, 0.021}; BankAccount ba02{2000.00, 0.022}; BankAccount ba03{3000.00, 0.023}; ba_push.push_back(ba01); ba_push.push_back(ba02); ba_push.push_back(ba03);
Then, you can check the number of elements again with the size()
method, and you will find three elements. You can also loop through to get the account values after one year on each stored object:
for (const auto& ba : ba_push) { // Round to two decimal places: cout << "Accumulated amount after 1 year = " << std::setprecision(2) << std::fixed << ba.value(1.0) << "\n"; }
Rounded to two decimal places, the results are shown here:
Accumulated amount after 1 year = 1021.22 Accumulated amount after 1 year = 2044.49 Accumulated amount after 1 year = 3069.80
The point of this exercise is that creating the BankAccount
objects and using push_back(.)
requires inefficient object copy. With push_back
, an element in the vector will take in a BankAccount
object with its copy constructor. This can be verified by setting the BankAccount
copy constructor to delete
and noticing the code will not compile. In a more realistic situation with a vector
containing thousands of larger object elements, the performance hit could add up significantly.
C++11 resolved this problem with the introduction of the emplace_back(.)
member function. If the calling code knows which arguments to pass in order to construct an object but does not have the object on hand, calling emplace_back(.)
with the constructor arguments will let the container create the object for us, saving one construction (and one destruction) with every call. For example, no BankAccount
object copies are generated in the following code:
vector<BankAccount> ba_emplace; ba_emplace.emplace_back(1000.00, 0.021); ba_emplace.emplace_back(2000.00, 0.022); ba_emplace.emplace_back(3000.00, 0.023);
Any performance gains to be had in this toy example would be minimal, since BankAccount
is a tiny type. But when pushing “heavier” objects—and a lot more of them—onto the end of a vector
, the cost savings can become significant.
In this emplace_back(.)
example, you can also verify by using console output that the vector
size is three, and that the valuation results are the same as in the previous push_back(.)
example. Note that emplace_back(.)
is mostly useful if the object to add to the container has not been constructed yet. If you have a full object on hand, simply use push_back(obj)
, or alternatively you can use push_back(std::move(obj))
to avoid the costs of object copying, if obj
is no longer needed outside the container.
Polymorphic objects
In financial C++ programming, you might sometimes need to handle a collection of derived objects, determined dynamically, in a vector
. Using modern C++, and in particular the related discussion in Chapter 3, this can be accomplished by defining a vector
with a std::unique_ptr
template parameter pointing to objects of derived classes via the (often abstract) base class type.
Suppose we might want to value a book of options on the same underlying equity. Each call and put payoff could be managed by a vector
containing unique pointers to abstract Payoff
types discussed in Chapter 3:
vector<std::unique_ptr<Payoff>> payoffs;
Then, we can push back each pointer onto the vector
as exemplified here:
payoffs.push_back(std::make_unique<CallPayoff>(90.0)); payoffs.push_back(std::make_unique<CallPayoff>(95.0)); payoffs.push_back(std::make_unique<CallPayoff>(100.0)); payoffs.push_back(std::make_unique<PutPayoff>(100.0)); payoffs.push_back(std::make_unique<PutPayoff>(105.0)); payoffs.push_back(std::make_unique<PutPayoff>(110.0));
When the payoffs
container goes out of scope, each unique_ptr
cleans up after itself. In contrast, doing this with raw pointers, we would have this:
std::vector<Payoff*> raw_ptr_payoffs; raw_ptr_payoffs.push_back(new CallPayoff{90.0}); raw_ptr_payoffs.push_back(new PutPayoff{105.0}); // etc . . .
But now, once we are done with using raw_ptr_payoffs
, we would have to iterate through the vector
again to manually delete each element before exiting the active scope. This means the following:
-
Risk of forgetting this step, leading to memory leaks
-
More code to manage
-
More that can go wrong
Each unique_ptr
object automates the destruction of its pointed-to object, doing implicitly what we would otherwise have been forced to do manually. As such, using unique pointers should again be preferred over raw pointers when possible. Using this feature of modern C++ makes our lives so much easier, and at essentially no performance cost.
This example could also have used other sequential containers (to be discussed in due course), but it is shown here within the context of our discussion on vector
containers, as these are again the most common container types you are likely to find in practice.
Before proceeding to the other STL sequential containers, let’s cover various characteristics and useful member functions related to vector
containers.
Allocation and contiguous memory
A vector
models a dynamic array that can grow when it is full. When an insertion (e.g., a call to push_back()
) occurs, insertion into a vector can incur the costs of object copy, but that cost can be higher when the container’s capacity has been reached and the objects therein have to be copied or moved to a new and bigger storage location.
We need to distinguish the size of a vector
, exposed by its member function size()
—which represents the number of elements stored in the container—from its capacity, exposed by the member function capacity()
. The latter represents the number of objects that can be stored in the container, including those that are already present. At any time, for a vector
named v
, we can add up to v.capacity() – v.size()
objects before v
needs to increase its capacity.
Indeed, conceptually, appending an object to a vector
can be illustrated as follows:
// Note: this is an oversimplified illustration // . . . void push_back(const T &obj) { // if full() // full() means size() === capacity() // grow() // add obj at the end of the array // increment size } void grow() { // determine a new capacity, bigger than the current one // allocate a new array of that new capacity // copy or move the elements from the old array to the new one // dispose of the old array // update capacity to the new capacity } // . . .
This process is usually quite fast, as a vector
instance strives to keep its capacity bigger than its size in order to avoid reallocations. However, when a reallocation needs to be done, a number of copies (or a number of moves) will need to be performed.
We can reduce the costs of these potentially costly moments. The vector
class offers member functions that let the programmer’s code decide when to change the size or the capacity of the underlying storage. One is reserve(.)
, which changes its capacity but not its size (number of elements). Another is resize(.)
, which changes both the size (number of elements) and the capacity by conceptually adding default values at the end (e.g., 0 for fundamental types such as int
or double
, nullptr
for pointers, or default-constructed objects for classes with such a constructor).
Thus, if you know at least approximately how many objects will be stored in a container, you can call these functions at selected (noncritical) moments in a program to ensure that enough space will be available when the time comes to actually add the objects. Because of the possibility that default-constructed objects may be created with resize(.)
, reserve(.)
will usually be the more efficient choice between these two functions.
A vector
, when resizing or reallocating space, will need to copy or move the objects from the old storage to the new storage. To do so, it will call each object’s move constructor if these constructors have been marked as noexcept
; otherwise, it will call each object’s copy constructor. For a given object type, move operations are faster (often significantly faster) than copy operations.
Recall from Chapter 3, the noexcept
specification, added to the Standard in C++11, prevents a function from throwing an exception. Default move operations on an object are noexcept
if its data members’ move operations are noexcept
. In this book, as remarked in the previous chapter, we will be concerned with using only the default move operations going forward, and thus we will assume this condition is true.
When a vector
is created with its default constructor
std::vector<MyClass> v;
it contains no elements. This can be verified with the empty()
member function on the vector
cout << std::boolalpha << v.empty() << "\n"; // v.empty() === true
and/or with the size()
method introduced earlier:
auto s = v.size(); // s = 0
A vector
is guaranteed to store its data in contiguous memory, typically on the heap. With repeated emplace_back(.)
or push_back(.)
calls, the vector
will attempt to place the next object in the memory block adjacent to the previous element. However, if this block is already allocated to another resource, the vector
will need to reallocate its memory to a different location and either move or copy all the existing data into this location. This can become expensive, especially if the data is copied.
We can control these costs if we know the number of elements we will need (or a reasonable upper estimate). One way to achieve this is to proceed naively and place this value in the constructor of the vector:
vector<MyClass> w(100'000); // The apostrophe used as a proxy for a comma // was added to the Standard in C++14
This is not very appealing, however, because now the MyClass
default constructor has been called 100,000 times, creating a default object in each location, plus we might very well need to copy a new nondefault constructed object into each element. This option is reasonable if we do need the default MyClass
objects and are expecting to perhaps replace only a portion of them later.
Fortunately, a viable and efficient alternative exists—specifying the number of elements to reserve in heap memory, using reserve(.)
on the vector
container class:
std::vector <MyClass> u; u.reserve(100'000);
Now, u
has zero objects (u.size() == 0
) but has room to add 100,000 through push_back()
or emplace_back()
without u
ever having to reallocate memory.
The upshot here is if you have an estimate of how many objects will be stored in a vector
, you can call the reserve(.)
function after creating a vector
container object but before appending additional objects with emplace_back(.)
or push_back(.)
. This will avoid forcing a (potentially costly) reallocation of contiguous memory.
Let’s look at a simple demonstration by reprising the BankAccount
examples. The capacity is set to 5
, so this will accommodate three elements and leave room for a couple more just in case:
// Before applying push_back(.): vector<BankAccount> ba_push; ba_push.reserve(5); // Space is reserved for 5 elements in memory, // but the size at this point is still zero. // Same as before but included here for // easier reference: BankAccount ba_01{1000.00, 0.021}; BankAccount ba_02{2000.00, 0.022}; BankAccount ba_03{3000.00, 0.023}; ba_push.push_back(ba_01); ba_push.push_back(ba_02); ba_push.push_back(ba_03); // ba_push.size() now = 3, // but ba_push.capacity() still = 5. // Similarly for emplace_back(): vector<BankAccount> ba_emplace; ba_emplace.reserve(5); ba_emplace.emplace_back(1000.00, 0.021); ba_emplace.emplace_back(2000.00, 0.022); ba_emplace.emplace_back(3000.00, 0.023); // ba_emplace.size() = 3, // ba_emplace.capacity() = 5.
In real-world financial systems programming, portfolio valuation and risk calculations could apply to thousands of positions in multiple asset categories, and this could require loading each position object into a vector
. To prevent a series of incremental memory reallocations for large vector
containers, setting a sufficient capacity with the reserve(.)
member function before processing the position data can help ensure better runtime performance. Also, the actual integral argument for the reserve(.)
function will more likely be determined dynamically (based on external input) rather than by a hardcoded value (e.g., reserve(5)
or reserve(100'000)
) as in the introductory examples here.
The clear() method
The clear()
member function on the vector
container class will destroy each object stored in the container and reset its size to 0. Define a trivial class as follows with the destructor implemented:
class MakeItClear { public: ~MakeItClear() { std::cout << "MakeItClear destructor called" << "\n"; } };
Then, define a vector
with three MakeItClear
objects:
vector<MakeItClear> mic; mic.reserve(3); mic.emplace_back(); // Emplace MakeItClear object mic.emplace_back(); // using default constructor – no argument. mic.emplace_back(); cout << format("Size of vector<MakeItClear> = {}, capacity = {}\n", mic.size(), mic.capacity());
With the output to the screen, you can verify the three elements, and the capacity is also three, per the setting by the reserve(.)
member function:
Size of vector<MakeItClear> = 3, capacity = 3
Now, clear these elements, and output the updated size and capacity of the vector
object mic
:
mic.clear(); cout << format("Size of vector<MakeItClear> now = {}, capacity = {}\n", mic.size(), mic.capacity());
Then, the destructor is called three times, as indicated by the output stream in its body:
MakeItClear destructor called MakeItClear destructor called MakeItClear destructor called Size of vector<MakeItClear> now = 0, capacity = 3
Now, the vector
size is 0, as its contents have been cleared. The vector
capacity, however, has not changed. Since it can be costly to increase the capacity of a vector
, the memory already held by that container remains held until explicit actions to the contrary are taken (or until the container is destroyed).
The clear()
method is also defined for all the other STL containers covered in this chapter, except for std::array()
.
The front(), back(), and pop_back() methods
The first two of these member functions do what they say: return (a reference to) the first and the last element in a vector
, respectively. As an example, let’s go back to our MyPair
class template and create a vector
of MyPair<int>
elements. As an aside, note that a template argument can be a template itself:
vector<MyPair<int>> pairs; pairs.reserve(4); pairs.emplace_back(1, 2); pairs.emplace_back(1, -2); pairs.emplace_back(3, 4); pairs.emplace_back(3, -4);
We can then access the minimum element of the pair in the front (first) and back (last) elements of the vector
container:
int front_min = pairs.front().get_min(); // front_min = 1 int back_min = pairs.back().get_min(); // back_min = -4
This will result in 1
and -4
, respectively.
The pop_back()
function will then remove the last element:
pairs.pop_back(); back_min = pairs.back().get_min(); // back_min = 3
The value of back_min
is now 3
, the minimum value of the third (and now last) MyPair
element.
The front()
and back()
functions can also be used as mutators:
pairs.front() = {10, 6}; // pairs[0] now = (10, 6) front_min = pairs.front().get_min(); // front_min = 6
Utilizing back()
for this purpose would be similar.
The pop_back()
function is essentially the opposite of push_back()
. For a vector
, there is neither a push_front()
nor a pop_front()
, but these do exist for other STL container classes, because the Standard Library tends to offer only those member functions that can be implemented efficiently. A function like push_back()
takes constant time on a vector
(unless a call leads to a reallocation, which in practice should be infrequent by using its reserve(.)
function).
Having a push_front(.)
member function on the vector
container class would be a costly operation since a vector
models a dynamic array, and inserting at the beginning would mean copying or moving every existing element “one position to the right,” a significantly costly operation if the number of elements is large. Containers that have a push_front(.)
member function (e.g., std::deque
or std::list
) are those for which that function can be reasonably efficient.
Random access in a vector: at(.) versus [.]
A vector
of size n
allows random access of an element by its index 0, 1,…, n-1
, using either the [.]
operator or the at(.)
member function. The difference between these two is that at(.)
provides bounds checking by throwing exceptions when an attempt is made to access an invalid index. Should we seek to obtain an error message, it is made available by the overridden what()
function on this standard exception class, std::out_of_range
:
#include <stdexcept> // for exception class std::out_of_range // . . . vector<int> v {0, 1, 2, 3, 4, 5}; // v.size() === 5 try { int n = v.at(10); // at(.) throws out_of_range exception } catch (const std::out_of_range& e) { std::cout << e.what() << "\n\n"; }
Attempting to access an element at a negative index will also throw an out_of_range
exception.
The square bracket operator provides no such protection. Replacing at(.)
with [.]
would result in a runtime error, resulting possibly in a program crash or, even worse, the error being silently ignored and propagating itself where it could do even more harm. Technically, anything can happen if we access objects out of bounds since this leads to undefined behavior (recall from Chapter 1 this means that the code is not “playing by the rules” and that the program is broken). For this reason, if you believe an index could end up out of bounds, you can consider using at(.)
instead of [.]
to identify the bugs in your code, and fix them. Of course, checking the bounds on every access has a cost, so use at(.)
only when there is a doubt with respect to the validity of the indices.
A potential pitfall with the meaning of () and {} with a vector
As you now know, in modern C++, constructor arguments—including the default constructor, which has no arguments—are often preferred in modern codebases by using uniform initializations indicated by braces ({. . .}
), as opposed to the still-legal “traditional” definition using parentheses (round brackets). For example, reprising the Fraction
class that has two unsigned
integer parameters in its constructor, you could write the following two definitions to get the same results:
Fraction braces{1, 2}; Fraction round_brackets(1, 2);
In both cases, the numerator value will be 1
, and the denominator value will be 2
.
However, as first noted in Chapter 1, using a vector
(and other STL containers) can sometimes lead to different results. For example, say we attempt to use braces for the constructor argument for vector
length, specifically for a vector of int
values:
std::vector<int> v{10}; // size() === 1
This would initialize a vector
of size 1 with a single element equal to 10. To specify a size of 10, we would need to use the parentheses version:
vector<int> v_round_brackets(10); // size() === 10
This might seem like an oddity, although this situation occurs for a reason. However, the details of this more advanced topic are deferred to Appendix D.
To avoid this potential pitfall and related complications, coding style requirements are in wide use that explicitly require adhering to the C++03 method of parentheses for vector
constructor arguments. You can find a well-known example in the Google C++ Style Guide.
This book follows the same guidelines by using braces to indicate initialization of a vector
and using the parentheses version to dictate the number of elements. This way, we avoid any ambiguity. Similar behavior also applies to other STL containers, so we use the same style guideline in these cases as well.
C-arrays and the data() member function
A vector
is essentially a no-cost abstraction of an encapsulated dynamic C array; in fact, it can be considered a negative cost abstraction for this very idea as it performs extremely efficient memory management, probably much better than any casual programmer would be able to implement in a reasonable amount of time. Its data is stored in contiguous memory, usually on the heap just like its C counterpart, but like a unique pointer, it cleans up after itself, plus it carries a robust set of member functions, many of which have been discussed previously.
The data()
method on vector
, introduced in C++11, returns the memory address of its first element. Using data()
should be rarely, if ever, be necessary in a modern C++ context. However, legacy numerical libraries written in C (such as the GNU Scientific Library for C) and plenty of other legacy codebases require interfacing with a raw pointer, and the data()
member function makes matters more convenient. The data()
member function returns in Chapter 8 in practical linear algebra examples.
As mentioned initially, applications requiring a container typically use a vector
, but no container is good at everything, and knowing the strengths and weaknesses of each standard container will help you make informed choices. So now, let’s briefly look at the remaining sequential STL containers.
The std::deque sequential container
A std::deque
is functionally similar to a vector
, with the added feature of being able to push elements onto, and pop elements from, the beginning of the container as well as the end. These member functions are not surprisingly named as follows:
push_front()
-
Appends an element to the front of a
deque
pop_front()
-
Removes the first element from a
deque
As a first example, create a deque
containing integers, and apply the member functions at both the front and the back of the container:
std::deque<int> on_deque{0, 1, 2, 3}; // Push new elements onto the front: on_deque.push_front(-1); on_deque.push_front(-2); // Can also push onto the back like a vector; on_deque.push_back(4); for (int k : on_deque) { cout << k << " "; // on_deque now contains: -2 -1 0 1 2 3 4 } // Remove both first and last element: on_deque.pop_front(); on_deque.pop_back(); for (int k : on_deque) { cout << k << " "; // on_deque now contains: -1 0 1 2 3 }
In addition to emplace_back(.)
, a similar emplace_front(.)
function obviates copying when appending an object to the front of a deque
. Suppose we have a Rectangle
class with a constructor that accepts two double
arguments and exposes an area()
member function:
std::deque<Rectangle> recs; recs.emplace_front(3.0, 2.0); recs.emplace_front(4.0, 3.0); recs.emplace_front(5.0, 4.0);
Then, iterate through the container to get the area values:
for (const auto& elem : recs) { cout << elem.area()) << " "; }
Note that because we performed the insertions at the beginning, not at the end, the values displayed will be the corresponding area values from the last rectangle entered to the first:
20 12 6
Like a vector
, a deque
also supports random access using the at(.)
function and the [.]
operator, with at(.)
again checking the bounds and throwing exceptions in cases of invalid index values. In fact, with only a few differences, almost all the member functions defined on a vector will also work for a deque
. The reason a vector
is usually preferred over a deque
is that the common case of iterating through the elements in order is faster on a vector
, which models an array of contiguous values, than on a deque
, which models a sequence of contiguous chunks. Modern computers are equipped with different levels of cache memory and benefit quite a lot from linear traversals of objects arranged in contiguous memory.
In certain financial applications, however, such as storing values from oldest to newest in a time series, or computing returns from market prices, appending to the front of a container is convenient.
The fragmented memory storage also has one potential advantage over a vector
in that new elements can either be appended into a new heap memory block or they can be placed into a reallocated portion of the memory. This contrasts with reallocating an entire contiguous block of memory, as in the case of a vector
. However, as you saw with a vector
, the capacity can be set at the outset with the reserve(.)
member function to prevent reallocation altogether. This is not an option with a deque
.
The std::list sequential container
As mentioned previously, the vector
and deque
containers share much of the same functionality, and for appending new data at the end of either, or at the front of a deque
, these operations are efficient. They fall short, however, when data needs to be inserted somewhere in between.
The internal structure of a list
is very different from that of a vector
or deque
. Without going into a lot of detail, an element in a list
is stored in a node, which is an object that contains the element and knows the address of its predecessor and succeeding nodes. Thus, technically speaking, the container is a doubly linked list.
Note that this structure means that for containers holding n
elements, the overall memory consumption associated with a list
will be higher than with a vector
. For example, a vector<MyClass>
container object will contain some data to represent the size, capacity, and location of the underlying storage, but its data will be a contiguous sequence of MyClass
objects, nothing else. For that reason, the space consumed by the objects in a vector
is the number of elements multiplied by the size of an element. A list<MyClass>
container object, on the other hand, will contain a sequence of nodes, each containing a MyClass
object and two pointers—so each node occupies space that is greater than the space occupied by its element—and each node will be allocated separately, leading to slower traversals in practice.
When a new element needs to be inserted, rather than shifting elements in memory to provide space, a new node is created, the links between two existing elements are broken, and the new node is inserted in the list by establishing new links between the two that were previously connected. Therefore, the effect of an insertion (or of a removal) in a list is “local” in the sense that it impacts only the immediate neighboring nodes.
Inserting a new element in a sequential STL container requires a discussion of STL iterators, which will follow in “STL Iterators”. For now, know that insertion at an arbitrary location in a list
will be faster than in a vector
or a deque
, but accessing a particular element will be less efficient. Unlike a vector
or a deque
, a list
does not support random access by index using the at(.)
function or the [.]
operator, since there would be no way to implement these operations efficiently (see the Note that follows), but it does have many of the same member functions such as push_back(.)
, emplace_back(.)
, front()
, and back()
, plus it can be used in a range-based for
loop like other standard containers:
std::list<int> franz{0, 1, 2, 3}; for (int elem : franz) { cout << elem << "\n"; } franz.push_front(-1); franz.push_back(4); franz.push_back(5); franz.pop_back(); for (int elem : franz) { cout << elem << "\n"; } // Compiler error! // Neither at(.) nor [.] is defined for a list // int sum = franz.at(0) + franz[1];
The results are again not surprisingly as follows:
0 1 2 3 -1 0 1 2 3 4
Note
Since nodes in a list
know only their immediate neighbors, the only way to reach the nth element in a list
is to start from the beginning and iterate n
times. With a vector
, accessing the nth element is a simple matter of constant-time arithmetic, computing the address of the nth element past the beginning of the underlying storage.
Because most data in finance is temporal, you probably won’t need to insert data within the interior of a container very often—if at all—so the advantages of a list
will probably not apply frequently in quantitative financial applications. With contemporary hardware, a list
is mostly used when many node insertions and removals have to be performed at arbitrary locations in the container, or when the elements stored in the list
can be costly to copy or move (which can hurt in a vector
when either a reallocation or an insertion—or a removal—occurs somewhere else than at the end).
Fixed-length std::array
Fixed-length arrays were introduced in C++11 formally as an STL container class under the name of std::array
. Contrary to a vector
, which models a dynamic array and allocates on the heap, an array
does not allocate in and of itself, and it has a fixed size (and capacity, which amounts to the same thing with this container). Because its size must be provided at compile time, and because the amount of financial data needed for a typical task will not be known a priori, you will also probably find it of limited use—although it did come in handy, for example, in the BlackScholes
class in Chapter 2 as a temporary container for the and values. Furthermore, if your code uses raw arrays of fixed size on occasion, an array
will provide an exact equivalent (same memory consumption, same speed characteristics) but with additional services.
Indeed, an array
supports the member functions size()
, at(.)
, front()
, back()
, and empty()
, and the [.]
operator, but not push_back(.)
, push_front(.)
, pop_front(.)
, or pop_back(.)
, as the size of an array
cannot be modified after it is defined.
In the following code, the uses of arr_01
and arr_02
are strictly equivalent:
int arr_01[]{0, 1, 2, 3}; int sum_01 = 0; for (int x : arr_01) { sum_01 += x; } // sum_01 = 6 std::array<int, 4> arr_02{0, 1, 2, 3}; int sum_02 = 0; for (int elem : arr_02) { sum_02 += elem; } // sum_02 = 6
Note that we could also have defined arr_02
using CTAD, from C++17, introduced in Chapter 1:
std::array arr_02{0, 1, 2, 3};
When in doubt, use a vector
An obvious question at this point is, “Which sequential container class should I use?” As mentioned, the use of array
is quite limited, so that essentially leaves you with a vector
, deque
, or list
. The answer is that unless there is a compelling reason to use one of the latter two, you should choose a vector
by default.
This is mainly because a vector
is, as described in the book C++ Coding Standards, guaranteed to have the following:2
-
The lowest space overhead of any container
-
The fastest access speed to contained elements of any (dynamic) container
Furthermore, as also alluded to in C++ Coding Standards, the storage in a vector
is guaranteed to be in contiguous memory, providing it with the efficiency of a dynamic C-style array. It is also equipped with the same convenience of random access as in a C array. The storage for the other two dynamically sized sequential containers may be fragmented, and list
does not support random access.
The vector
container truly is the workhorse of the standard C++ library. Getting to know its capabilities and details will make you a more effective quantitative developer. Understanding its use within STL algorithms (to follow in Chapter 5) will also transfer to cases utilizing any of the other STL container classes.
Associative Containers
C++ has two primary associative containers that automatically keep their contents sorted, irrespective of the order the data is inserted. These containers also enforce the uniqueness of their elements. These two containers are as follows:
std::set<T>
-
Stores unique elements of type
T
, and reorders them whenever a new element is inserted. std::map<K, T>
-
Stores pairs of elements of types
K
andT
, whereK
is a key value used to sort the container and to obtain its associatedT
value. This container ensures that each key is unique but allows more than one key to have the same value.
For example:
#include <set> #include <map> // . . . std::set<int> some_set{5, 1, 2, 3, 4, 3}; for (int n : some_set) { cout << n << " "; } cout << "\n\n"; std::map<std::string, int> some_map { {"five", 5}, {"one", 1}, {"two", 2}, {"three", 3}, {"four", 4}, {"three", -3} }; for (auto [k, v] : some_map) { cout << k << ": " << v << "; "; }
If you run this code, you will see the output
1 2 3 4 5
displayed on the first line (note that the number 3 occurs only once). Then, recalling that std::string
has inequality operators defined based on lexicographic ordering such as
you will see this in the second line:
five:5; four:4; one:1; three:3; two:2;
Note that there are no duplicate keys, because for key three
only the first insertion has worked, and the keys are sorted in alphabetical order.
Individual values can be obtained by using the key value in the square bracket operator:
int four = some_map["four"]; // four = 4 (value)
Note, however, we will run into a problem with a set
of Circle
objects, for example (from Chapter 3). The following will not compile, because no ordering is defined on the class:
// Compiler error! std::set<Circle> circles{{5.5}, {3.2}, {8.4}};
The same issue would arise if attempting to use a type not supporting inequality operators as the key in a map
.
Two additional versions of these containers do allow duplicates:
std::multiset<T>
std::multimap<K, T>
For example:
// std::multiset and std::multimap are defined // in the Standard Library <set> and <map> headers // . . . std::multiset<int> some_multiset{5, 1, 2, 3, 4, 3}; for (int n : some_multiset) { cout << n << " "; } cout << "\n\n"; std::multimap<std::string, int> some_multimap { {"five", 5}, {"one", 1}, {"two", 2}, {"three", 3}, {"four", 4}, {"three", -3} }; for (auto [k, v] : some_multimap) { cout << k << ": " << v << "; "; } // ...
If you run this code, you will see the following output displayed on the first line (note the two occurrences of the number 3):
1 2 3 3 4 5
From the map
example, you will see this on the second line:
five:5; four:4; one:1; three:3; three:-3; two:2;
Note the two entries with the key three
, even though their corresponding values are different, and that the keys are sorted in alphabetical order.
Finally, starting in C++11, unordered (aka hashed) versions of each of these associative containers were added to the Standard Library. They have the capability of making searches faster than their ordered counterparts (but often consume more space in memory), although this is not guaranteed. These are as follows:
-
std::unordered_set<T>
-
std::unordered_multiset<T>
-
std::unordered_map<K, T>
-
std::unordered_multimap<K, T>
Technically speaking, the unordered versions are no longer associative containers, but they are variations on the themes of set
and map
. For this reason, as here in the present discussion, they are often combined into the same category,
although there is also an argument for a separate “unordered containers” classification. In any event, as their names show, these containers do not keep their data or keys sorted but can make search operations very fast.
Out of these eight associative containers, one that is particularly useful in financial modeling applications is a map
, for managing model input data and output results. Its unordered counterpart is seen more often in applications that operate under very low latency constraints, such as in high-frequency trading systems. In addition, by learning how to use the map
container well, using the others will become straightforward, as they work similarly.
std::map as a model data container
Enum classes can be useful as key values in a map
for inputs to and outputs from a financial model. To see an example, consider an option deal, where our model computes an option price and associated Greek (risk) values: delta (Δ), gamma (Γ), vega, rho (ρ), and theta (θ). Let’s define an enum class called RiskValues
:
enum class RiskValues { Delta, Gamma, Vega, Rho, Theta };
Returning to our BlackScholes
class in Chapter 2, we could include RiskValues
and add a private member function—say, risk_values_(.)
—to compute each Greek value. In addition, as we would need to compute the standard normal cumulative distribution function (CDF) for both the option price and risk values, it is also refactored into a private member function, norm_cdf_(.)
:
#include <map> // . . . class BlackScholes { public: BlackScholes(double strike, double spot, double time_to_exp, PayoffType payoff_type, double rate, double div = 0.0); double operator()(double vol) const; // Added to Ch 4 version: std::map<RiskValues, double> risk_values(double vol); private: std::array<double, 2> compute_norm_args_(double vol) const; // d1 and d2; double norm_cdf_(double x) const; // Added to Ch 4 version double strike_, spot_, time_to_exp_; PayoffType payoff_type_; double rate_, div_; };
We can compute the risk values by using the closed-form formulas derived from Black-Scholes theory, using the same notation adopted in Chapter 2:
Δ =
Γ =
ρ =
θ =
Here, is the standard normal probability density function (PDF):
The results are computed and then placed in the map
by using its insert(.)
member function. This is similar to push_back(.)
for a sequential STL container, except that each pair is ordered automatically by its key value. A lambda expression has been added for the standard normal PDF:
std::map<RiskValues, double> BlackScholes::risk_values(double vol) { using std::exp, std::sqrt; std::map<RiskValues, double> results; compute_norm_args_(vol); int phi = static_cast<int>(payoff_type_); auto norm_args = compute_norm_args_(vol); double d1 = norm_args[0]; double d2 = norm_args[1]; double nd_1 = norm_cdf_(phi * d1); // N(d1) double nd_2 = norm_cdf_(phi * d2); // N(d2) double disc_fctr = exp(-rate_ * time_to_exp_); // N'(x): Standard Normal pdf: auto norm_pdf = [](double x) { return (1.0 / std::numbers::sqrt2) * exp(-x); }; double delta = phi * exp(-div_ * time_to_exp_) * nd_1; double gamma = exp(-div_ * time_to_exp_) * norm_pdf(d1) / (spot_ * vol * sqrt(time_to_exp_)); double vega = spot_ * spot_ * gamma * vol * time_to_exp_; double rho = phi * time_to_exp_ * strike_ * disc_fctr * nd_2; double theta = phi * div_ * spot_ * exp(-div_ * time_to_exp_) * nd_1 - phi * rate_ * strike_ * exp(-rate_ * time_to_exp_) * nd_2 - spot_ * exp(-div_ * time_to_exp_) * norm_pdf(d1) * vol / (2.0 * sqrt(time_to_exp_)); // DELTA, GAMMA, VEGA, RHO, THETA results.insert({RiskValues::Delta, delta}); results.insert({RiskValues::Gamma, gamma}); results.insert({RiskValues::Vega, vega}); results.insert({RiskValues::Rho, rho}); results.insert({RiskValues::Theta, theta}); return results; }
This makes storing the option value and risk calculations to a database more foolproof. The following pseudocode shows an example. Suppose a database of positions is represented by an interface object database
, and that it contains a table called Options
, with column names PRICE
for the option valuation, and the respective Greek values in all caps. These align with the respective results from the C++ code:
double strike = 75.0; auto corp = corp = PayoffType::Put; double spot = 100.0; double rate = 0.05; double vol = 0.25; double time_to_exp = 0.3; BlackScholes bsp_otm_tv{strike, spot, rate, time_to_exp, corp}; // (This is pseudocode, NOT real {cpp} code!) DBRecord record = database.table("Options").next_record(); record.column("PRICE") = bsp_otm_tv(vol); // 0.0846 record.column("DELTA") = risk_values[RiskValues::Delta]; // -0.0164 record.column("GAMMA") = risk_values[RiskValues::Gamma]; // 0.0030 record.column("VEGA") = risk_values[RiskValues::Vega]; // 2.2349 record.column("RHO") = risk_values[RiskValues::Rho]; // -0.5180 record.column("THETA") = risk_values[RiskValues::Theta]; // -0.9598
An unordered_map
could be substituted for the map
in this example if desired, although for this small number of elements, any performance gain would probably be minimal.
STL Iterators
Iterators are objects that can iterate over elements of a sequence. Writing algorithms on iterators rather than writing them on containers makes algorithms more general and typically more useful.
Syntactically, iterators expose their services “via a common interface that is adapted from ordinary pointers.”3 This has the nice side effect of making ordinary pointers to sequences of objects work with Standard Library algorithms, as you will see in the next chapter. For now, given an iterator iter
, we move to the next element in the sequence by (preferably) applying the pre-increment operator: ++iter
(the post-increment operator may also be used if desired).
The most commonly used C++ textbooks describe six categories of iterator types: output, input, forward, bidirectional, random-access, and contiguous. These categories are not really classes; they are more like families conceptually grouped by the operations they allow:
- Output iterator
-
A single-pass tool that can be used for writing data to an output stream, such as with
cout
, or for inserting data into a new container, such as at the end of avector
. - Input iterator
-
A single-pass tool that can be used for such tasks as consuming data from a stream (including such fleeting things as keyboard input). An iterator can often both write and read data, which would make it both an output and input iterator. This will be the case for the examples in this book.
- Forward iterator
-
Can do anything an input iterator can do, but allows you to make more than one pass over the same sequence (if you don’t alter the elements along the way), but it lets you go forward only one element at a time.
- Bidirectional iterator
-
Can do anything a forward iterator can do, but also lets you go backward one element at a time.
- Random-access iterator
-
Can do anything a bidirectional iterator can do, but lets you go forward or backward n elements at a time efficiently.
- Contiguous iterator
-
A random-access iterator with the added requirement that the sequence is placed contiguously in memory (e.g., iterators on a
vector
or anarray
, but not adeque
).
This section primarily focuses on iterators for a vector
, which fall into the contiguous category. Obtaining an appropriate iterator for other STL containers is similar, but the category to which an iterator belongs depends on the actual container. For example, in a std::list
—which models a doubly linked list—we can efficiently move an iterator to only the next or previous element of the sequence, so an iterator for a list
is a bidirectional iterator.
We could define an iterator (contiguous) on a vector<int>
container as follows:
// Create the iterator: vector<int>::iterator pos; // pos = "position" // Also create a vector of integer values: vector<int> v = {17, 28, 12, 33, 13, 10};
Next, we set the iterator to the position of the first element, using the begin()
member function on a vector
. The word “position” here is not used in the way we would talk about an index in the same array. Because an iterator acts syntactically like a pointer, the actual element in the first position is accessed by dereferencing the iterator.
Suppose we have a print_this(.)
function template:
#include <iostream> template<typename T> void print_this(T t) { std::cout << t << " "; }
The begin()
member function returns an iterator to the first position of v
:
pos = v.begin(); // Sets iterator to point at 1st position // of the vector v int first_elem = *pos; // Dereferencing pos returns 17 print_this(first_elem);
Dereferencing the iterator pos
at this position, first_elem
would then hold a value of 17
:
To advance the iterator to the next position, just apply the increment operator:
++pos; print_this(*pos); // *pos = 28
We can also reassign the value to the element found at this position just as we would with a pointer:
*pos = 19; // Now, *pos = 19, but the iterator print_this(*pos); // still points to the 2nd position
In cases where the container supports random access, such as a vector
or a deque
, an iterator can also be moved ahead by adding to it the number of positions to advance:
pos += 3; print_this(*pos); // *pos = 13
Because in this example pos
is a random-access iterator associated with a vector
container, it is also a bidirectional iterator and is able to move backward. Again, as with pointers, this can be done with the decrement operator:
// Move back two positions: --pos; --pos; print_this(*pos); // *pos = 12
Subtraction assignment also works in the case of a random-access iterator (-=
):
pos -= 2; // *pos = 17 print_this(*pos); // Back to initial position
STL iterators are made to be fast and generally will not validate whether you go out of bounds, so you need to be careful. For example, the preceding code that moves backward two positions in a sequence will lead the iterator out of bounds if pos
initially points to the first element of the sequence. This behavior in turn leads the call to print(*pos)
to perform what the C++ Standard calls undefined behavior—recall again that this means behavior for which no guarantees are provided, and as a result means your code is broken (so do not do that).
The end()
member function of a container returns an iterator that conceptually points to “one past the last element.” The metaphor expressed by C++ Standard Library iterators is that the beginning of the sequence points to the first element, and the end of the sequence points to where an iterator will go if advanced past the last element. More succinctly, C++ iterators on a container define a half-open element range of the form [begin, end),
and understanding this is essential to using containers correctly (and coming soon, algorithms). Technically, a container v
is said to be empty if v.begin() == v.end()
. To access the last element of a non-empty vector, subtract 1 from the latter result and dereference the iterator again:
pos = v.end() - 1; print_this(*pos); // *pos = 10
Using auto to Reduce Verbosity
Both the begin()
and end()
functions on a Standard Library container will return an iterator for the sequence defined therein, so we can make our code less verbose while saving ourselves some typing by using the auto
keyword:
auto auto_pos = v.begin(); // vector v
Here, if v
is of type std::vector<int>
, then auto
replaces having to type
std::vector<int>::iterator
, which greatly reduces the noise in this declaration.
Using Constant Iterators
Much like preferring const
member functions on a class by default, it is often desirable to prevent elements of a container from being modified. We can do this by defining a const
iterator, using the cbegin()
member function rather than begin()
:
vector<int> w{17, 28, 12, 33, 13, 10}; auto cpos = w.cbegin(); // cpos = const iterator
A const
iterator provides only nonmutating operations on the elements. On a const
container, the begin()
and end()
member functions will yield const
iterators, whereas on a non-const
container, the begin()
and end()
member functions will yield non-const
iterators. The cbegin()
and cend()
member functions are useful if you need a const
iterator over a non-const
container.
Applying this to the previous example, everything will compile and run just fine except for the reassignment. With a const
iterator, this will result in a compile-time error:
++cpos; *cpos = 19; // Compiler error!
This is precisely the motivation for using a const
iterator and helps prevent undefined behavior.
Iterators or Indices?
You have probably noticed by now we used neither the operator [.]
nor member function at(.)
within this broader section on STL iterators, as both require indices as arguments, not iterators. In previous examples, traversing a vector
was performed by using iterators alone. This is convenient because iterators can be used on any STL container, but indices cannot, something you can verify yourself by trying to use indices on a container such as a list
. In fact, since we used iterators, we could replace the vector
with either a list
or a deque
in the previous examples, and they would compile and run successfully, yielding the same output.
Another useful application of iterators is the range-based for
loop, first introduced in Chapter 1, which will also work with any STL container, and even user-defined or third-party containers, as long as they offer the appropriate begin()
and end()
member functions. This is again because a range-based for
loop is built on iterators and not on indices. Indices make sense only when we can advance an iterator efficiently n
positions at once, or access the n
th element of a container just as efficiently. Through iterators, we have one consistent and reliable way to iterate over any STL container with the familiarity of a for
loop.
Iterators on Associative Containers
As alluded to earlier in this chapter, associative containers also expose iterators that let us traverse the elements of that container. In particular, in map
-type containers, the elements are key/value pairs. We will illustrate this with the following toy_map
container object. First, we create a simple int
/double
map object:
std::map<int, double> toy_map; toy_map = {{5, 92.6}, {2, 42.4}, {1, 10.6}, {4, 3.58}, {3, 33.3}};
Initialize a new iterator pos
by pointing it at the first element:
auto pos = toy_map.begin();
Display the first pair by dereferencing the iterator:
// Display 1st pair: print_this(pos->first); cout << ": "; print_this(pos->second);
The first pair is displayed by accessing its key (first
) and value (second
) by dereferencing the iterator:
1 : 10.6
Note that in a std::map
, keys are const
and cannot be modified, but values are not (unless the container itself is const
, of course). Suppose we advance one position and modify the value:
// Advance to next position and modify: ++pos; pos->second = 100.0; // In a map, keys are const but values // (pos->second) can be modified
A range-based for
loop will display each key/value pair pr
, including the updated value in the second position:
// Display modified map: for (const auto& pr : toy_map) { cout << format("{} : {}\n", pr.first, pr.second); }
However, as seen earlier, this can be written more succinctly in terms of the actual [k, v]
pairs:
for (const auto& [k, v] : toy_map) { cout << format("{} : {}\n", k, v); }
The range-based for
loop in each case displays a sequence of key/value pairs, ordered by the key value:
1 : 10.6 2 : 100 3 : 33.3 4 : 3.58 5 : 92.6
Summary
STL container classes are broadly classified as sequential or associative containers, although in reality containers such as unordered_map
are not actually associative and could arguably be categorized separately as “unordered containers.” Still, even among all the containers available, a set of common member functions allow for a degree of interchangeability, as well as when used as arguments in STL algorithms.
The member functions shown in Table 4-1 are common to all STL containers. This is not an exhaustive list, but it shows methods that you will likely encounter and find useful in practice.
Member function | Description |
---|---|
|
Returns an iterator that points to the position of the first element |
|
Returns an iterator that points to the first position after the last element |
|
Returns a |
|
Returns a |
|
Returns the number of elements in a container |
|
Returns |
|
Clears all elements from a container |
Note
The begin()
and end()
member functions will each return a non-const
iterator if the container is non-const
, and a const
iterator if the container itself is const
. The cbegin()
and cend()
functions will each return a const
iterator if the container is non-const
, and (trivially) as well for a const
container. It is not the case that cbegin()
and cend()
are required for a const
container.
Table 4-2 lists member functions that apply to all sequential containers (vector
, deque
, list
, array
).
Member function | Description |
---|---|
|
Accesses a reference to the first element of a non-empty container |
|
Accesses a reference to the last element of a non-empty container |
Table 4-3 lists member functions of the vector
, deque
, and array
containers only. These cannot be used with a list
, as it does not support random access.
Member function/operator | Description |
---|---|
|
Bounds-checked random accessor of the element at a specific position |
|
Same as |
Table 4-4 lists member functions that add elements to and remove elements from the end of a sequential container.
Member function | Description |
---|---|
|
Appends an element to the back of a container |
|
Removes the last element of a container |
Table 4-5 lists member functions that add elements to and remove elements from the front of a sequential container.
Member function | Description |
---|---|
|
Adds an element to the front of a container |
|
Removes the first element of a container |
Recalling that a vector
is guaranteed to hold its elements in contiguous memory, Table 4-6 shows relevant member functions that can help you avoid reallocation of memory.
Member function | Description |
---|---|
|
Reserves enough contiguous memory to hold |
|
Returns the number of elements that may be stored in contiguous memory |
STL iterators provide the means to traverse the elements of an STL container. The STL has a variety of iterator types, as described in “STL Iterators”, but usually in practice, the easiest way to access an iterator is to just call the generic begin()
or cbegin()
method on the container. Then, you can iterate across the container in either direction, and access the actual data in a particular position by using the dereferencing *
operator similarly to a pointer:
std::vector<int> v{1, . . ., 10}; std::map<unsigned, int> m{{1, 100}, {2, 200}, . . ., {10, 1000}}; auto v_iter = v.begin(); auto m_iter = m.cbegin(); int v_elem = *v_iter; // v_elem = 1 ++v_iter; ++v_iter; *v_iter = -3; // Third element is now -3 --v_iter; v_elem = *v_iter; // v_elem = 2 ++m_iter; ++m_iter; unsigned key = (*m_iter).first; // key = 3 int val = (*m_iter).second; // val = 300 // Alternatively can access as, say, [key_alt, value_alt] pair: ++m_iter; auto [key_alt, value_alt] = *m_iter; cout << format("key (alt) at 4th position = {}, ", key_alt); cout << format("val (alt) at 4th position = {}", value_alt);
STL containers and iterators, as you have seen, have useful properties, but their power is taken yet another level higher within the context of STL algorithms, coming up in the next chapter.
Further Resources
Recommended CppCon presentation on STL iterators: Nicolai Josuttis, “Back to Basics: Iterators in C++”, 2023.
1 Scott Meyers, Effective STL (Addison-Wesley, 2001).
2 Herb Sutter and Andrei Alexandrescu, C++ Coding Standards, (Addison-Wesley, 2004), Chapter 76.
3 Nicolai M. Josuttis, The C++ Standard Library: A Tutorial and Reference, 2nd edition (Addison-Wesley, 2012), section 9.2.
Get Learning Modern C++ for Finance now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.