Search

6.17 — Introduction to iterators

Iterating through an array (or other structure) of data is quite a common thing to do in programming. And so far, we’ve covered many different ways to do so: with loops and an index (for-loops and while loops), with pointers and pointer arithmetic, and with range-based for-loops:

Looping using indexes is more typing than needed if we only use the index to access elements. It also only works if the container (e.g. the array) provides direct access to elements (which arrays do, but some other types of containers, such as lists, do not).

Looping with pointers and pointer arithmetic is verbose, and can be confusing to readers who don’t know the rules of pointer arithmetic. Pointer arithmetic also only works if elements are consecutive in memory (which is true for arrays, but not true for other types of containers, such as lists, trees, and maps).

For advanced readers

Pointers (without pointer arithmetic) can also be used to iterate through some non-sequential structures. In a linked list, each element is connected to the prior element by a pointer. We can iterate through the list by following the chain of pointers.

Range-based for-loops are a little more interesting, as the mechanism for iterating through our container is hidden -- and yet, they still work for all kinds of different structures (arrays, lists, trees, maps, etc…). How do these work? They use iterators.

Iterators

An iterator is an object designed to traverse through a container (e.g. the values in an array, or the characters in a string), providing access to each element along the way.

A container may provide different kinds of iterators. For example, an array container might offer a forwards iterator that walks through the array in forward order, and a reverse iterator that walks through the array in reverse order.

Once the appropriate type of iterator is created, the programmer can then use the interface provided by the iterator to traverse and access elements without having to worry about what kind of traversal is being done or how the data is being stored in the container. And because C++ iterators typically use the same interface for traversal (operator++ to move to the next element) and access (operator* to access the current element), we can iterate through a wide variety of different container types using a consistent method.

Pointers as an iterator

The simplest kind of iterator is a pointer, which (using pointer arithmetic) works for data store sequentially in memory. Let’s revisit a simple array traversal using a pointer and pointer arithmetic:

Output:

0 1 2 3 4 5 6

In the above, we defined two variables: begin (which points to the beginning of our container), and end (which marks an end point). For arrays, the end marker is typically the place in memory where the last element would be if the container contained one more element.

The pointer then iterates between begin and end, and the current element can be accessed by dereferencing the pointer.

Warning

You might be tempted to calculate the end marker using the address-of operator and array syntax like so:

But this causes undefined behavior, because array[std::size(array)] accesses an element that is off the end of the array.

Instead, use:

Standard library iterators

Iterating is such a common operation that all standard library containers offer direct support for iteration. Instead of calculating our own begin and end points, we can simply ask the container for the begin and end points via functions conveniently named begin() and end():

This prints:

1 2 3

The iterator header also contains two generic functions (std::begin and std::end) that can be used:

This also prints:

1 2 3

Don’t worry about the types of the iterators for now, we’ll re-visit iterators in a later chapter. The important thing is that the iterator takes care of the details of iterating through the container. All we need are four things: the begin point, the end point, operator++ to move the iterator to the next element (or the end), and operator* to get the value of the current element.

Back to range-based for loops

All types that have begin and end member functions or can be used with std::begin and std::end are usable in range-based for-loops.

Behind the scenes, the range-based for-loop calls begin() and end() of the type to iterate over. std::array has begin and end member functions, so we can use it in a range-based loop. C-style fixed arrays can be used with std::begin and std::end functions, so we can loop through them with a range-based loop as well. Dynamic arrays don’t work though, because there is no std::end function for them (because the type information doesn’t contain the array’s length).

You’ll learn how to add functions to your types later, so that they can be used with range-based for-loops too.

Range-based for-loops aren’t the only thing that makes use of iterators. They’re also used in std::sort and other algorithms. Now that you know what they are, you’ll notice they’re used quite a bit in the standard library.

Iterator invalidation (dangling iterators)

Much like pointers and references, iterators can be left “dangling” if the elements being iterated over change address or are destroyed. When this happens, we say the iterator has been invalidated. Accessing an invalidated iterator produces undefined behavior.

Some operations that modify containers (such as adding an element to a std::vector) can have the side of effect of causing the elements in the container to change addresses. When this happens, existing iterators to those elements will be invalidated. Good C++ reference documentation should note which container operations may or will invalidate iterators. As an example, see the “Iterator invalidation” section of std::vector on cppreference.

Here’s an example of this:


6.18 -- Introduction to standard library algorithms
Index
6.16 -- An introduction to std::vector

(h/t to nascardriver for significant contributions to this lesson)

16 comments to 6.17 — Introduction to iterators

  • hellmet

    I think it's important to note that since we cannot use 'const auto&' in the range-based-for loop [not the 'for (int i : array)'; since we need to do ++iter], any modification to the underlying datastructure invalidates the iterator (I hope I got that right, from what I read online).

    • nascardriver

      I don't understand what you mean, can you provide an example of a range-based for-loop in which we can't use `const auto&`?

      Whether or not a modification to the data invalidates iterators depends on the container. In an `std::array`, you can modify all you want without invalidating iterators. `std::vector` on the other hand might perform reallocation, which invalidates iterators.

      • hellmet

        I meant to say, one can't do this.
        Also, mistake on my part! This is not a 'range-based for-loop'.

        Oh right, the invalidation depends on the container! I meant perhaps adding the point here is invaluable information!

        • nascardriver

          I added a note about the `const` iterator variable. Container invalidation doesn't fit this lesson, I'll try to remember it and add it to a lesson about algorithms or containers if it's not there already.

          • hellmet

            I'm glad I could be of help!
            Thanks!

          • Alex

            I do think iterator invalidation is a valid topic to be mentioned here. I added a section to the bottom of the lesson, including an example of invalidation that uses only topics that have been covered so far. Let me know if you think it doesn't fit in for some reason.

  • Eddie Ball

    Loving the new content! But I'm a little confused about what makes an iterator. In the section "Iterators" you say:

    "An iterator is an object designed to iterate through a container without having to care about the internal structure of the data."

    and then give an example of an iterator that uses pointers. How is is this iterator indifferent to the internal structure of the data? Is it because it uses pointers and the auto data type?

    Thanks!

    • Alex

      That sentence was poorly written. To clarify: iterators themselves are tightly coupled to the data they're iterating over (the iterator has to care about the internal data structure to know how to move through it). However, once an iterator is created, the user of the iterator only has to use the iterator's interface to navigate and access the elements, and is thus abstracted from how the data is actually stored.

      I've amended the lesson wording a bit to try and clarify.

  • siva

    Hi,

    Can you please explain the difference between std::size and array::size ?

    Regards,
    Siva.

  • Jim

    std::size(data) works but why not data.size()?

    • nascardriver

      `std::size` works for C-style arrays too. There are none in this lesson, I think this is a leftover from when the lesson used C-style arrays. `std::size` is fine too, so I'll leave it there.

  • Louis Cloete

    Great explanation! Just one suggestion: won't it make more sense / be clearer to initialize end like this:

    ?

  • swap145

    Thank you for this new section.

Leave a Comment

Put all code inside code tags: [code]your code here[/code]