Chapter 4. Index Out of Bounds

There are several ways in C++ to create an array of objects of some type T. Three common methods are:

#define N 10  // array size N is known at compile time
  T static_array[N];

  int n = 20; // array size n is calculated at runtime
  T* dynamic_array = new T[n];

  std::vector<T> vector_array; // array size can be changed at runtime

Of course, you can still use the calloc() and malloc() functions and your program will compile and run, but it’s not a good idea to mix C and C++ unless you have to because you’re relying on legacy C libraries. However you allocate the array, you can access an element in it using an unsigned integer index:

const T& element_of_static_array  = static_array[index];
const T& element_of_dynamic_array = dynamic_array[index];
const T& element_of_vector_array  = vector_array[index];

Let’s deal with dynamic arrays and vectors first, and return to the static array later in this chapter.

Dynamic Arrays

What would happen if we provide an index value that is larger than or equal to the array size? In all three of the preceding examples, the code will silently return garbage. (The exception to this rule for Microsoft Visual Studio 2010 is discussed later.) The situation is even worse if you decide to use the operator [] in the left-hand side of an assignment:

some_array[index] = x;

Depending on your luck (or lack of thereof) you might overwrite some other unrelated variable, an element of another array, or even a program instruction, and in the latter case your program will most likely crash. Each of these errors also provides opportunities for malicious intruders to take over your program and turn it to bad ends. However, the std::vector provides an at(index) function, which does bounds checking by throwing an out_of_range exception. The problem with this is that if you want to do this sanity check, you have to rigorously use the at() function everywhere for accessing an array element. And naturally, this slows your code down, so once you are done testing, you’ll want to replace it everywhere with the [] operator, which is faster. But doing that replacement requires massive editing of your code, which is a lot of work, followed by a need to retest it, because during that tedious process you could accidentally mistype something.

So instead of the at() function, I suggest the following. Although a dynamic array leaves the [] operator totally out of your control, the STL vector implements it as a C++ function that we can rewrite according to our bug-hunting goals. And that’s what we’ll do here. In the file scpp_vector.hpp we redefine the [] operators as follows:

T& operator [] (size_type index) {
  SCPP_TEST_ASSERT(index < std::vector<T>::size(),
    "Index " << index << " must be less than "
             << std::vector<T>::size());
  return std::vector<T>:: operator[](index);
}

const T& operator [] (size_type index) const {
  SCPP_TEST_ASSERT(index < std::vector<T>::size(),
    "Index " << index << " must be less than "
             << std::vector<T>::size());
  return std::vector<T>::operator[](index);
}

Let’s see how this works. Here is an example of how to use it (including—intentionally—how not to use it):

#include <iostream>
#include "scpp_vector.hpp"

using namespace std;

int main() {
  scpp::vector<int> vect;
  for(int i=0; i<3; ++i)
    vect.push_back(i);

  cout << "My vector = " << vect << endl;

  for(int i=0; i<=vect.size(); ++i)
    cout << "Value of vector at " << i << " is " << vect[i] << endl;

  return 0;
}

First, note that instead of writing std::vector<int> or just vector<int> we wrote scpp::vector<int>. This is to distinguish our vector from the STL’s vector. By using our scpp::vector we replace the standard implementation—in this case, the implementation of operator []—by our own safe implementation, and you will see the same approach to preventing other bugs later in this book. scpp::vector also gives you a << operator for free, so you can print your vector as long as it is not too big, and as long as the type T defines the << operator.

The next thing to notice is that in the second loop, instead of writing i<vect.size() we wrote i<=vect.size(). This is a very common programming error, and we did it just to see what happens when the index is out of bounds. Indeed, the program produces the following output:

My vector = 0 1 2
  
  Value of vector at 0 is 0
  
  Value of vector at 1 is 1 Value of vector at 2 is 2
           
  Index 3 must be less than 3 in file scpp_vector.hpp
        #17

This sanity check works as long as the symbol SCPP_TEST_ASSERT_ON is defined, and is easy to switch on and off at compile time when necessary. The problem with this approach is that the vector’s [] operator is very often used inside loops, so this sanity check is used a lot and therefore slows the program down significantly just as using at() would. If you feel that this is becoming a problem in your program, feel free to define a new macro, such as SCPP_TEST_ASSERT_INDEX_OUT_OF_BOUNDS, which would work exactly the same way as SCPP_TEST_ASSERT but would be used only inside scpp::vector::operator[]. SCPP_TEST_ASSERT_INDEX_OUT_OF_BOUNDS should differ from SCPP_TEST_ASSERT only by the fact that it can be switched on and off independently of the SCPP_TEST_ASSERT macro, so you can deactivate it after you are sure that your code does not have this bug while keeping the rest of your sanity checks active.

In addition to allowing you to catch this index-out-of-bounds error, the template vector has one advantage over statically and dynamically allocated arrays: its size grows as needed (as long as you don’t run out of memory). However, this advantage comes at a cost. The vector, if not told in advance how much memory will be needed, allocates some default amount (called its “capacity”). When the actual size reaches this capacity, the vector will allocate a bigger chunk of memory, copy old data into the new memory area, and release the old chunk of memory. So from time to time, adding a new element to a template vector could suddenly become slow. Therefore, if you know in advance what number of elements you will need, as with both static and dynamically allocated arrays, tell the vector up front, for instance, in the constructor:

scpp::vector<int> vect(n);

This creates a vector with a specified number of elements in it. You could also write:

scpp::vector<int> vect(n, 0);

which would also initialize all elements to a specified value (in this case zero, but any other value will work too).

An alternative is to create a vector with zero elements in it but to specify the desired capacity:

scpp::vector<int> vect;
vect.reserve(n);

The difference between this example and the previous one is that in this case the vector is empty (i.e., vect.size() returns 0), but when you start adding elements to it, you will not run into the incrementing capacity procedure with the corresponding slowdown until you reach the size of n.

Static Arrays

Now, as promised, let’s deal with the static array:

#define N 10  // array size N is known at compile time
  T static_array[N];

Here, the size is known at compile time and will not change. Of course, to use the safe array with its boundary check, you can use a template vector with the size specified in a constructor:

scpp::vector vect(N);

This will work exactly the same as the static array, but the problem here is efficiency. While the static array allocates its memory on stack, the template vector allocates memory inside the constructor using the new operator, and this is a relatively slow operation. If runtime efficiency is important in your case, it’s better to use a template array, defined as follows in the scpp_array.hpp file:

namespace scpp {

// Fixed-size array
template <typename T, unsigned N>
class array {
 public:
  typedef unsigned size_type;

  // Most commonly used constructors:
  array() {}

  explicit array(const T& initial_value) {
    for(size_type i=0; i<size(); ++i)
      data_[i] = initial_value;
  }

  size_type size() const { return N; }

  // Note: we do not provide a copy constructor and assignment operator.
  // We rely on the default versions of these methods generated by the compiler.

  T& operator[] (size_type index) {
    SCPP_TEST_ASSERT(index < N,
      "Index " << index << " must be less than " << N);
    return data_[index];
  }

  const T& operator [] (size_type index) const {
    SCPP_TEST_ASSERT(index < N,
      "Index " << index << " must be less than " << N);
    return data_[index];
  }

  // Accessors emulating iterators:
  T* begin() { return &data_[0]; }
  const T* begin()const { return &data_[0]; }

  // Returns an iterator PAST the end of the array.
  T* end() { return &data_[N]; }
  const T* end()const { return &data_[N]; }

 private:
  T data_[N];
};
} // namespace scpp

This array behaves exactly like a static C array. However, when compiled with the sanity check macro SCPP_TEST_ASSERT active, it provides an index-out-of-bounds check. The begin() and end() methods are provided to simulate iterators, so that you can use this array in some of the situations where you would have used the template vectors—for example, to sort numbers. The following code demonstrates how to sort this array using STL’s sort algorithm:

#include <algorithm>

  scpp::array<int, 5> a(0);
  a[0] = 7;
  a[1] = 2;
  a[2] = 3;
  a[3] = 9;
  a[4] = 0;

  cout << "Array before sort: " << a << endl;
  sort(a.begin(), a.end());
  cout << "Array after sort: " << a <<
  endl;

This produces the following output:

Array before sort: 7 2 3 9 0
      
Array after sort: 0 2 3 7 9

As a side benefit, you also get a << operator, which allows you to stream an array as shown in the previous example, assuming it is not too large and the template type T has a << operator. Of course, the use of this fixed-sized array must be limited to cases where the array size N is not too large. Otherwise, you’ll be spending your stack memory, a limited resource, on this array.

So the advice in this section is not to use static or dynamically allocated arrays, but a template vector or array instead. This solves another problem described in Chapter 1: when you use the new operator with brackets, you need to use the delete operator with brackets as well. If you cross-use these operators (new with brackets and delete without or vice versa) you will corrupt the memory heap, which generally leads to bad consequences. Once we decide not to use dynamically allocated arrays, which are created through the new operator with brackets, we’ve killed two birds with one stone: the problem of an index out of bounds, and the problem of mixing operators with and without brackets. In short, do not use the new operator (and the corresponding delete operator) with brackets. Your life will be easier.

Note

At the time of this writing, the newest version of Microsoft Visual Studio 2010 Ultimate diagnoses the index-out-of-bounds error in std::vector when compiled in a Debug mode, and pops up a dialog box (Figure 4-1).

This dialog offers you the choice to Ignore, Abort, or Retry (in which case you can debug the application). While “Ignore” seems appropriate only if you are extremely adventurous, one can hope that the rest of the compilers working under Unix, Linux, and Mac OS will catch up to the trend.

Microsoft Visual Studio “Index out of bounds” dialog box

Figure 4-1. Microsoft Visual Studio “Index out of bounds” dialog box

Multidimensional Arrays

Now that we’ve settled on the use of a template vector or array as an implementation of a linear array, let’s consider what to do if you need a two-dimensional matrix, a three-dimensional array, or generally speaking, an n-dimensional array. Because all the issues in the general case of n-dimensional arrays can be illustrated using a two-dimensional matrix, we will limit our discussion to this case and call it simply a matrix, with the understanding that the same principles apply to three or more dimensions.

If the size of the matrix is known at compile time, you can easily implement it as an array of arrays, and be done with it. Therefore, we’ll concentrate on the more complex case of a matrix whose size is calculated at run time. Such a matrix can easily be created as a vector of vectors, and in fact this approach is the only one possible if different rows must be of different lengths. However, most of the time all rows should be of the same length (e.g., the matrix is rectangular or even quadratic), and in this case the approach of using a vector of vectors is inefficient: it requires multiple memory allocations, which is a relatively slow operation (and the same can be said about deallocation). Because our goal in using C++ is efficiency, we’ll try a different approach and create a rectangular matrix using only one memory allocation, as shown in the class matrix in the scpp_matrix.hpp file:

// Two-dimensional rectangular matrix.
template <typename T>
class matrix {
 public:
  typedef unsigned size_type;

  matrix(size_type num_rows, size_type num_cols)
    : rows_(num_rows), cols_(num_cols), data_(num_rows * num_cols)
  {
    SCPP_TEST_ASSERT(num_rows > 0,
      "Number of rows in a matrix must be positive");
    SCPP_TEST_ASSERT(num_cols > 0,
      "Number of columns in a matrix must be positive");
  }

  matrix(size_type num_rows, size_type num_cols, const T& init_value)
    : rows_(num_rows), cols_(num_cols), data_(num_rows * num_cols, init_value)
  {
    SCPP_TEST_ASSERT(num_rows > 0,
      "Number of rows in a matrix must be positive");
    SCPP_TEST_ASSERT(num_cols > 0,
      "Number of columns in a matrix must be positive");
  }

  size_type num_rows() const { return rows_; }
  size_type num_cols() const { return cols_; }

  // Accessors: return element by row and column.
  T& operator() (size_type row, size_type col) {
    return data_[ index(row, col) ];
  }

  const T& operator() (size_type row, size_type col) const {
    return data_[ index(row, col) ];
  }

 private:
  size_type rows_, cols_;
  std::vector<T> data_;

  size_type index(size_type row, size_type col) const {
    SCPP_TEST_ASSERT(row < rows_, "Row " << row
      << " must be less than " << rows_);
    SCPP_TEST_ASSERT(col < cols_, "Column " << col
      << " must be less than " << cols_);
    return cols_ * row + col;
  }
};

First of all, there are two constructors. The first allows you to create a matrix with a specified number of rows and columns. The second, with the additional init_value argument, allows you also to initialize each element to a specified value (e.g., to set each element of a matrix<double> to 0.0). Note that access to elements is provided via the () operator, not []. This is because the [] operator in C++ takes only one argument, never two or more. So to access a multidimensional array, we either need to use multiple [] operators, such as my_matrix[i][j], or a single () operator, such as my_matrix(i,j).

The first approach could be achieved if we had the [] operator return a T* pointer to the zeroth element of the i-th row. However, this denies us the diagnosis of a column index out of bounds, which defeats the purpose of catching bugs at runtime. We could, of course, create some template class that would include a smart reference to a row, return an instance of it using the first operator ([i]), and then use the bounds check in the second operator ([j]). To some degree, it is a matter of taste. I did not see the value of resorting to this complex design just to preserve the my_matrix[i][j] syntax, and the () operator with multiple arguments seems just fine.

The checks for an index out of bounds are performed inside the index(row, col) function, separately for row and column numbers, and in the case of a runtime error lead to calls to an error handler that are familiar by now. Finally, at the end of the same file, you will discover a << operator for a template matrix<T>. They are provided so you can output your matrix like this:

cout << "my matrix =\n" << my_matrix << endl;

as long as the matrix is not too large and the type T defines the << operator.

Rules for this chapter to avoid “index out of bounds” errors:

  • Do not use static or dynamically allocated arrays; use a template array or vector instead.

  • Do not use new and delete operators with brackets—leave it up to the template vector to allocate multiple elements.

  • Use scpp:vector instead of std::vector and scpp::array consistently instead of a static array, and switch the sanity checks on.

  • For a multidimensional array, use scpp::matrix and access elements through the () operator to provide index-out-of-bounds checks.

Get Safe C++ now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.