Search

7.10 — std::vector capacity and stack behavior

In lesson 6.16 -- An introduction to std::vector, we introduced std::vector and talked about how std::vector can be used as a dynamic array that both remembers its length and can be dynamically resized as required.

Although this is the most useful and commonly used part of std::vector, std::vector has some additional attributes and capabilities that make it useful in some other capacities as well.

Length vs capacity

Consider the following example:

We would say that this array has a length of 10, even though we’re only using 5 of the elements that we allocated.

However, what if we only wanted to iterate over the elements we’ve initialized, reserving the unused ones for future expansion? In that case, we’d need to separately track how many elements were “used” from how many elements were allocated. Unlike a built-in array or a std::array, which only remembers its length, std::vector contains two separate attributes: length and capacity. In the context of a std::vector, length is how many elements are being used in the array, whereas capacity is how many elements were allocated in memory.

Taking a look at an example from the previous lesson on std::vector:

The length is: 5
0 1 2 0 0

In the above example, we’ve used the resize() function to set the vector’s length to 5. This tells variable array that we’re intending to use the first 5 elements of the array, so it should consider those in active use. However, that leaves an interesting question: what is the capacity of this array?

We can ask the std::vector what its capacity is via the capacity() function:

On the authors machine, this printed:

The length is: 5
The capacity is: 5

In this case, the resize() function caused the std::vector to change both its length and capacity. Note that the capacity is guaranteed to be at least as large as the array length (but could be larger), otherwise accessing the elements at the end of the array would be outside of the allocated memory!

More length vs. capacity

Why differentiate between length and capacity? std::vector will reallocate its memory if needed, but like Melville’s Bartleby, it would prefer not to, because resizing an array is computationally expensive. Consider the following:

This produces the following:

length: 5  capacity: 5
length: 3  capacity: 5

Note that although we assigned a smaller array to our vector, it did not reallocate its memory (the capacity is still 5). It simply changed its length, so it knows that only the first 3 elements are valid at this time.

Array subscripts and at() are based on length, not capacity

The range for the subscript operator ([]) and at() function is based on the the vector’s length, not the capacity. Consider the array in the previous example, which has length 3 and capacity 5. What happens if we try to access the array element with index 4? The answer is that it fails, since 4 is greater than the length of the array.

Note that a vector will not resize itself based on a call to the subscript operator or at() function!

Stack behavior with std::vector

If the subscript operator and at() function are based on the array length, and the capacity is always at least as large as the array length, why even worry about the capacity at all? Although std::vector can be used as a dynamic array, it can also be used as a stack. To do this, we can use 3 functions that match our key stack operations:

  • push_back() pushes an element on the stack.
  • back() returns the value of the top element on the stack.
  • pop_back() pops an element off the stack.

This prints:

(cap 0 length 0)
5 (cap 1 length 1)
5 3 (cap 2 length 2)
5 3 2 (cap 3 length 3)
top: 2
5 3 (cap 3 length 2)
5 (cap 3 length 1)
(cap 3 length 0)

Unlike array subscripts or at(), the stack-based functions will resize the std::vector if necessary. In the example above, the vector gets resized 3 times (from a capacity of 0 to 1, 1 to 2, and 2 to 3).

Because resizing the vector is expensive, we can tell the vector to allocate a certain amount of capacity up front using the reserve() function:

This program prints:

(cap 5 length 0)
5 (cap 5 length 1)
5 3 (cap 5 length 2)
5 3 2 (cap 5 length 3)
top: 2
5 3 (cap 5 length 2)
5 (cap 5 length 1)
(cap 5 length 0)

We can see that the capacity was preset to 5 and didn’t change over the lifetime of the program.

Vectors may allocate extra capacity

When a vector is resized, the vector may allocate more capacity than is needed. This is done to provide some “breathing room” for additional elements, to minimize the number of resize operations needed. Let’s take a look at this:

On the author’s machine, this prints:

size: 5  cap: 5
size: 6  cap: 7

When we used push_back() to add a new element, our vector only needed room for 6 elements, but allocated room for 7. This was done so that if we were to push_back() another element, it wouldn’t need to resize immediately.

If, when, and how much additional capacity is allocated is left up to the compiler implementer.

7.11 -- Recursion
Index
7.9 -- The stack and the heap

87 comments to 7.10 — std::vector capacity and stack behavior

  • Peter Baum

    It was only later in the lessons that I realized that there is something fundamental that I didn't understand at this point.  std::array was introduced and that used the stack and then "new" was used and that used the heap.  But suppose I want to use std::array and make sure it uses the heap.  How would I do that?  An example that does this and allows access from a function would be very helpful.

    Also I saw a comment at stackoverflow awhile back about the stack and heap in which someone said that from the point of view of CPP there is no stack or heap.  In trying to find that discussion (which I didn't) I ran across https://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap which is very good, and learned that individual threads get their own stack.  In that discussion (down a ways) I found "you are free to implement a compiler that doesn't even use a stack or a heap, but instead some other storage mechanisms." and that implies the disconnect between CPP and these storage areas/techniques.  In any case, as a practical matter, IMHO in these lessons, you really can't easily and should not in any case, separate CPP from important OS considerations like this.

    • nascardriver

      Hi Peter!

      > But suppose I want to use std::array and make sure it uses the heap.  How would I do that?
      You pretty much answered your own question in your quote "you are free to implement a compiler that doesn't even use a stack or a heap, but instead some other storage mechanisms.".
      The standard doesn't require compiler developers to use the stack for X and the heap for Y, they're free to do whatever they want.
      If you need control over heap and stack you'll have to use compiler/OS/CPU specific solutions.

    • Alex

      If you want std::array to use the heap, then you need to dynamically allocate it on the heap in the first place:

      However, I'd advise against this, as using std::vector is a better choice in such a case (as std::vector will clean itself up, whereas std::array on the heap relies on you to remember to do so).

      • Peter Baum

        Hi Alex,
        I tried something like that and ran into a problem with Visual Studio Community 2017.  Maybe I'm just doing something stupid, but this was the code:

        The at() function worked but not the usual array element access method with square brackets.

        Also, I was avoiding vector because of the behind the scenes garbage collection and I was, after all, trying to do timing.

        • nascardriver

          @array is a pointer, you can't use pointer operators like that.

          • Peter Baum

            Hi nascardriver,

            I certainly hadn't tried that!!! Where do you come up with this stuff!?!?  Did I miss a lesson along the way?

            I am still confused though, because we saw in a previous lesson that we could do

            and here "array" is also a pointer.  So what is the difference?

  • Mireska

    When I try to access a vector array with an index smaller than it's capacity but larger than its length, I get a runtime error. I know it's bad procedure, but I'd like to understand what's going on; what happens to those indices when the length is shortened? They're in its capacity, so I would assume the vector still has 'control' over them'; I expected undefined behaviour, or maybe it those indices returning the value they once had.

    So my question: Why exactly does trying that make it crash? Does the compiler 'limit/restrain' itself from accessing that, even though it technically could, because it knows it's unintended (that's my guess as to why it gives an 'assertion' error)

    Many thanks!

    • nascardriver

      Hi Mireska!

      > Why exactly does trying that make it crash?
      It doesn't crash, it's a controlled termination to let you know you're doing something wrong.

      > Does the compiler 'limit/restrain' itself from accessing that, even though it technically could, because it knows it's unintended
      Yes

      Assertions are covered in Lesson 7.12a

Leave a Comment

Put all code inside code tags: [code]your code here[/code]