Search

6.17 — Introduction to iterators

Iterating through an array (or other structure) of data is quite a common thing to do in programming. And so far, we’ve covered many different ways to do so: with loops and an index (for-loops and while loops), with pointers and pointer arithmetic, and with range-based for-loops:

Looping using indexes is more typing than needed if we only use the index to access elements. It also only works if the container (e.g. the array) provides direct access to elements (which arrays do, but some other types of containers, such as lists, do not).

Looping with pointers and pointer arithmetic is verbose, and can be confusing to readers who don’t know the rules of pointer arithmetic. Pointer arithmetic also only works if elements are consecutive in memory (which is true for arrays, but not true for other types of containers, such as lists, trees, and maps).

For advanced readers

Pointers (without pointer arithmetic) can also be used to iterate through some non-sequential structures. In a linked list, each element is connected to the prior element by a pointer. We can iterate through the list by following the chain of pointers.

Range-based for-loops are a little more interesting, as the mechanism for iterating through our container is hidden -- and yet, they still work for all kinds of different structures (arrays, lists, trees, maps, etc…). How do these work? They use iterators.

Iterators

An iterator is an object designed to traverse through a container (e.g. the values in an array, or the characters in a string), providing access to each element along the way.

A container may provide different kinds of iterators. For example, an array container might offer a forwards iterator that walks through the array in forward order, and a reverse iterator that walks through the array in reverse order.

Once the appropriate type of iterator is created, the programmer can then use the interface provided by the iterator to traverse and access elements without having to worry about what kind of traversal is being done or how the data is being stored in the container. And because C++ iterators typically use the same interface for traversal (operator++ to move to the next element) and access (operator* to access the current element), we can iterate through a wide variety of different container types using a consistent method.

Pointers as an iterator

The simplest kind of iterator is a pointer, which (using pointer arithmetic) works for data store sequentially in memory. Let’s revisit a simple array traversal using a pointer and pointer arithmetic:

Output:

0 1 2 3 4 5 6

In the above, we defined two variables: begin (which points to the beginning of our container), and end (which marks an end point). For arrays, the end marker is typically the place in memory where the last element would be if the container contained one more element.

The pointer then iterates between begin and end, and the current element can be accessed by dereferencing the pointer.

Warning

You might be tempted to calculate the end marker using the address-of operator and array syntax like so:

But this causes undefined behavior, because array[std::size(array)] accesses an element that is off the end of the array.

Instead, use:

Standard library iterators

Iterating is such a common operation that all standard library containers offer direct support for iteration. Instead of calculating our own begin and end points, we can simply ask the container for the begin and end points via functions conveniently named begin() and end():

This prints:

1 2 3

The iterator header also contains two generic functions (std::begin and std::end) that can be used:

This also prints:

1 2 3

Don’t worry about the types of the iterators for now, we’ll re-visit iterators in a later chapter. The important thing is that the iterator takes care of the details of iterating through the container. All we need are four things: the begin point, the end point, operator++ to move the iterator to the next element (or the end), and operator* to get the value of the current element.

Back to range-based for loops

All types that have begin and end member functions or can be used with std::begin and std::end are usable in range-based for-loops.

Behind the scenes, the range-based for-loop calls begin() and end() of the type to iterate over. std::array has begin and end member functions, so we can use it in a range-based loop. C-style fixed arrays can be used with std::begin and std::end functions, so we can loop through them with a range-based loop as well. Dynamic arrays don’t work though, because there is no std::end function for them (because the type information doesn’t contain the array’s length).

You’ll learn how to add functions to your types later, so that they can be used with range-based for-loops too.

Range-based for-loops aren’t the only thing that makes use of iterators. They’re also used in std::sort and other algorithms. Now that you know what they are, you’ll notice they’re used quite a bit in the standard library.

Iterator invalidation (dangling iterators)

Much like pointers and references, iterators can be left “dangling” if the elements being iterated over change address or are destroyed. When this happens, we say the iterator has been invalidated. Accessing an invalidated iterator produces undefined behavior.

Some operations that modify containers (such as adding an element to a std::vector) can have the side of effect of causing the elements in the container to change addresses. When this happens, existing iterators to those elements will be invalidated. Good C++ reference documentation should note which container operations may or will invalidate iterators. As an example, see the “Iterator invalidation” section of std::vector on cppreference.

Here’s an example of this:


6.18 -- Introduction to standard library algorithms
Index
6.16 -- An introduction to std::vector

(h/t to nascardriver for significant contributions to this lesson)

34 comments to 6.17 — Introduction to iterators

  • marius

    "All types that have begin and end member functions or can be used with std::begin and std::end are usable in range-based for-loops.
    Behind the scenes, the range-based for-loop calls begin() and end() of the type to iterate over. std::array has begin and end member functions, so we can use it in a range-based loop."

    Like that?

    • nascardriver

      Yes, but you're invoking undefined behavior by using pointer arithmetic on your class members. This is not allowed:

      If you update `Arr` to use an array internally, everything is fine. That would make the implementation trivial:

      If you want to iterate over distinct members, you'll have to create a custom iterator. We can use a vector and store pointers to the elements we want to iterator over. This requires features we haven't covered up to this point, but I'll leave it here in case you want to come back after chapter 9.

  • salah

    Dynamic arrays don’t work though, because there is no std::end function for them (because the type information doesn’t contain the array’s length.

    Could you illustrate this point more please .

    • nascardriver

      When you use a dynamically allocated array, in almost every situation it will decay to a pointer. A pointer doesn't know the length of the array it's pointing to, it doesn't even know that it's pointing to an array. Just like you can't use `sizeof` on a dynamically allocated array, you can't use `std::size`.

  • Oscar

    Hi, I have some questions:

    1. I copied the following code from the second program (line 8) in the tutorial:

    


    The tutorial says the simplest kind of iterator is a pointer. I wonder why we didn’t use `auto*`, but `auto` instead? Does `auto` handle `auto*` already? FYI, the quiz solution from 6.9a uses `auto*` when allocating an array.


    2. The tutorial says “For arrays, the end marker is typically the place in memory where the last element would be if the container contained one more element.” 

I thought the end marker is the place in memory where the “element after the last element” would be?


    3. The cppreference link about “Iterator invalidation” is really helpful. But I didn’t seem to find the following operation documented in it:

    

cppreference says `operator=` and `assign`, will always cause iterator invalidation:

    



    But I think these two are different from the operation I’m talking about. Could you help shed some light on whether `myVector[2] = 5;` causes iterator invalidation?

    • nascardriver

      1. `auto` handles `auto*`. Even though `auto` handles `auto*`, I recommend using `auto*` for clarity if the object is a pointer. This lesson uses `auto` rather than `auto*`, because it's more important that `begin` is an iterator than that it's a pointer. `&data[0]` could be replaced with another iterator and the code would still work. This can't be done if we used `auto*`.

      2. The two sentences say the same. "The element after the last element" and "The last element if the array had one more element".

      3. The assignment in this example doesn't affect the vector. The only vector operation in this example is `operator[]`. The vector isn't affected by what you do with the individual elements. Think of it like this

      • Oscar

        Thanks for the detailed reply!

        1. That's interesting. Could you elaborate a bit more why using `auto*` will disallow us to replace `&data[0]` with another iterator? Does it mean:

        2. Oh indeed! Very sorry for my misinterpretation!

        3. Thanks for the detailed explanation! It really helped a lot :D.

        • nascardriver

          1. Not replace as in assign or override, but update our code to use a different iterator type. Not all iterators are pointers. `auto*` only works for pointers. For example, using a linked list, which can't be iterated with a plain pointer:

          We don't care if `begin` is a pointer or not. We only need it to be an iterator. With `auto*`, we wouldn't be able to use just "some iterator".

          I like your questions, they go into details of the language, keep them up :)

          • Oscar

            Thanks for your detailed reply and encouragement!! I just happened to edit my original question. I removed the edited part and pasted it here:

            I think I misunderstood iterators. Like what you've said, cplusplus.com says not all iterators have the same functionality of pointers. Also, I realized iterators have their own type `std::iterator`, which is different from a pointer. So it's okay to use `auto*` if we intend to use pointers as iterators. But if we use `data1.begin()` (which returns std::iterator), we should use `auto`. Am I correct?

            I think if possible, it'd be helpful if some of the `auto` keywords in the examples are replaced with explicit types (especially when they first appeared in the code), just to make everything clear! Or perhaps adding some comments about the variable type.

            • nascardriver

              There is not iterator type per se. The only requirements for an iterator are that you can use `++it` and `*it`. A pointer fulfills these requirements, so we can use a pointer as an iterator. `std::iterator` was deprecated in C++17 and will likely be removed in the future. I can't see this being mentioned on cplusplus, so I'll recommend cppreference, where this is obvious https://en.cppreference.com/w/cpp/iterator/iterator .
              The good thing about iterators is that we don't have to care about their type or implementation. As long as they support `++it` and `*it`, which they must, we can iterate through a container. That's why the lesson uses `auto` and not `auto*`. We don't care that it's a pointer, it's some iterator, and `auto` can hold every iterator, not just pointers. Rather than using `auto*` because it's a pointer, we use `auto` to make the code work with all iterators.
              If you read through documentations, you'll see that the iterators aren't given by type, but categorized as named requirements (eg. "LegacyIterator", "LegacyOutputIterator", ...). Again, this is because the type doesn't matter. It's only important which operations are supported by the iterators. There are different kinds of iterators, some with more features than others (eg. iterating backwards or begin comparable).

              • Oscar

                Wow... thanks for correcting me again and again! The detailed explanation is superb and I really appreciate it!!

                I'll keep these in mind:
                Iterators aren't given by type, and type doesn't matter (this is why we use `auto`). What matters are the operations supported by the iterators. The only requirements for an iterator are that we can use `++it` and `*it`. `std::iterator` is deprecated in C++17 (gotta be careful when reading the docs and refer to cppreference next time!).

  • Omran

    hello!! , i know this questions out of this tutorial's context but Nascardriver are you a real nascar driver , sorry if this question was bothering anyone !! , i'm just curious , :)

    • nascardriver

      Nope, I drive a normal car, also not in circles :)

      • Omran

        hahahahahahahahahahahahaha cuz i was just reading something about centripetal acceleration so they covered some examples about nascar race or something :D , love you bro , you're helping a lot of c++ beginners , you and Alex :)

  • kavin

    Hi, why am i not able to do this?

    what does array.begin() actually give? If it's address cout'ing begin should print address of 1st element right ?

    • nascardriver

      It's an iterator, you can't print it. You can access the element the iterator is referring to via `*begin` or `*array.begin()`.
      We haven't covered user-defined types or operator overloading yet, you'll understand how this works in chapter 9.

  • hellmet

    I think it's important to note that since we cannot use 'const auto&' in the range-based-for loop [not the 'for (int i : array)'; since we need to do ++iter], any modification to the underlying datastructure invalidates the iterator (I hope I got that right, from what I read online).

    • nascardriver

      I don't understand what you mean, can you provide an example of a range-based for-loop in which we can't use `const auto&`?

      Whether or not a modification to the data invalidates iterators depends on the container. In an `std::array`, you can modify all you want without invalidating iterators. `std::vector` on the other hand might perform reallocation, which invalidates iterators.

      • hellmet

        I meant to say, one can't do this.
        Also, mistake on my part! This is not a 'range-based for-loop'.

        Oh right, the invalidation depends on the container! I meant perhaps adding the point here is invaluable information!

        • nascardriver

          I added a note about the `const` iterator variable. Container invalidation doesn't fit this lesson, I'll try to remember it and add it to a lesson about algorithms or containers if it's not there already.

          • hellmet

            I'm glad I could be of help!
            Thanks!

          • Alex

            I do think iterator invalidation is a valid topic to be mentioned here. I added a section to the bottom of the lesson, including an example of invalidation that uses only topics that have been covered so far. Let me know if you think it doesn't fit in for some reason.

  • Eddie Ball

    Loving the new content! But I'm a little confused about what makes an iterator. In the section "Iterators" you say:

    "An iterator is an object designed to iterate through a container without having to care about the internal structure of the data."

    and then give an example of an iterator that uses pointers. How is is this iterator indifferent to the internal structure of the data? Is it because it uses pointers and the auto data type?

    Thanks!

    • Alex

      That sentence was poorly written. To clarify: iterators themselves are tightly coupled to the data they're iterating over (the iterator has to care about the internal data structure to know how to move through it). However, once an iterator is created, the user of the iterator only has to use the iterator's interface to navigate and access the elements, and is thus abstracted from how the data is actually stored.

      I've amended the lesson wording a bit to try and clarify.

  • siva

    Hi,

    Can you please explain the difference between std::size and array::size ?

    Regards,
    Siva.

  • Jim

    std::size(data) works but why not data.size()?

    • nascardriver

      `std::size` works for C-style arrays too. There are none in this lesson, I think this is a leftover from when the lesson used C-style arrays. `std::size` is fine too, so I'll leave it there.

  • Louis Cloete

    Great explanation! Just one suggestion: won't it make more sense / be clearer to initialize end like this:

    ?

  • swap145

    Thank you for this new section.

Leave a Comment

Put all code inside code tags: [code]your code here[/code]