Search

9.4 — Sorting an array using selection sort

A case for sorting

Sorting an array is the process of arranging all of the elements in the array in a particular order. There are many different cases in which sorting an array can be useful. For example, your email program generally displays emails in order of time received, because more recent emails are typically considered more relevant. When you go to your contact list, the names are typically in alphabetical order, because it’s easier to find the name you are looking for that way. Both of these presentations involve sorting data before presentation.

Sorting an array can make searching an array more efficient, not only for humans, but also for computers. For example, consider the case where we want to know whether a name appears in a list of names. In order to see whether a name was on the list, we’d have to check every element in the array to see if the name appears. For an array with many elements, searching through them all can be expensive.

However, now assume our array of names is sorted alphabetically. In this case, we only need to search up to the point where we encounter a name that is alphabetically greater than the one we are looking for. At that point, if we haven’t found the name, we know it doesn’t exist in the rest of the array, because all of the names we haven’t looked at in the array are guaranteed to be alphabetically greater!

It turns out that there are even better algorithms to search sorted arrays. Using a simple algorithm, we can search a sorted array containing 1,000,000 elements using only 20 comparisons! The downside is, of course, that sorting an array is comparatively expensive, and it often isn’t worth sorting an array in order to make searching fast unless you’re going to be searching it many times.

In some cases, sorting an array can make searching unnecessary. Consider another example where we want to find the best test score. If the array is unsorted, we have to look through every element in the array to find the greatest test score. If the list is sorted, the best test score will be in the first or last position (depending on whether we sorted in ascending or descending order), so we don’t need to search at all!

How sorting works

Sorting is generally performed by repeatedly comparing pairs of array elements, and swapping them if they meet some predefined criteria. The order in which these elements are compared differs depending on which sorting algorithm is used. The criteria depends on how the list will be sorted (e.g. in ascending or descending order).

To swap two elements, we can use the std::swap() function from the C++ standard library, which is defined in the utility header.

This program prints:

Before swap: x = 2, y = 4
After swap:  x = 4, y = 2

Note that after the swap, the values of x and y have been interchanged!

Selection sort

There are many ways to sort an array. Selection sort is probably the easiest sort to understand, which makes it a good candidate for teaching even though it is one of the slower sorts.

Selection sort performs the following steps to sort an array from smallest to largest:
1) Starting at array index 0, search the entire array to find the smallest value
2) Swap the smallest value found in the array with the value at index 0
3) Repeat steps 1 & 2 starting from the next index

In other words, we’re going to find the smallest element in the array, and swap it into the first position. Then we’re going to find the next smallest element, and swap it into the second position. This process will be repeated until we run out of elements.

Here is an example of this algorithm working on 5 elements. Let’s start with a sample array:

{ 30, 50, 20, 10, 40 }

First, we find the smallest element, starting from index 0:

{ 30, 50, 20, 10, 40 }

We then swap this with the element at index 0:

{ 10, 50, 20, 30, 40 }

Now that the first element is sorted, we can ignore it. Now, we find the smallest element, starting from index 1:

{ 10, 50, 20, 30, 40 }

And swap it with the element in index 1:

{ 10, 20, 50, 30, 40 }

Now we can ignore the first two elements. Find the smallest element starting at index 2:

{ 10, 20, 50, 30, 40 }

And swap it with the element in index 2:

{ 10, 20, 30, 50, 40 }

Find the smallest element starting at index 3:

{ 10, 20, 30, 50, 40 }

And swap it with the element in index 3:

{ 10, 20, 30, 40, 50 }

Finally, find the smallest element starting at index 4:

{ 10, 20, 30, 40, 50 }

And swap it with the element in index 4 (which doesn’t do anything):

{ 10, 20, 30, 40, 50 }

Done!

{ 10, 20, 30, 40, 50 }

Note that the last comparison will always be with itself (which is redundant), so we can actually stop 1 element before the end of the array.

Selection sort in C++

Here’s how this algorithm is implemented in C++:

The most confusing part of this algorithm is the loop inside of another loop (called a nested loop). The outside loop (startIndex) iterates through each element one by one. For each iteration of the outer loop, the inner loop (currentIndex) is used to find the smallest element in the remaining array (starting from startIndex+1). smallestIndex keeps track of the index of the smallest element found by the inner loop. Then smallestIndex is swapped with startIndex. Finally, the outer loop (startIndex) advances one element, and the process is repeated.

Hint: If you’re having trouble figuring out how the above program works, it can be helpful to work through a sample case on a piece of paper. Write the starting (unsorted) array elements horizontally at the top of the paper. Draw arrows indicating which elements startIndex, currentIndex, and smallestIndex are indexing. Manually trace through the program and redraw the arrows as the indices change. For each iteration of the outer loop, start a new line showing the current state of the array.

Sorting names works using the same algorithm. Just change the array type from int to std::string, and initialize with the appropriate values.

std::sort

Because sorting arrays is so common, the C++ standard library includes a sorting function named std::sort. std::sort lives in the <algorithm> header, and can be invoked on an array like so:

By default, std::sort sorts in ascending order using operator< to compare pairs of elements and swapping them if necessary (much like our selection sort example does above).

We’ll talk more about std::sort in a future chapter.

Quiz time

Question #1


Manually show how selection sort works on the following array: { 30, 60, 20, 50, 40, 10 }. Show the array after each swap that takes place.

Show Solution

Question #2


Rewrite the selection sort code above to sort in descending order (largest numbers first). Although this may seem complex, it is actually surprisingly simple.

Show Solution

Question #3


This one is going to be difficult, so put your game face on.

Another simple sort is called “bubble sort”. Bubble sort works by comparing adjacent pairs of elements, and swapping them if the criteria is met, so that elements “bubble” to the end of the array. Although there are quite a few ways to optimize bubble sort, in this quiz we’ll stick with the unoptimized version here because it’s simplest.

Unoptimized bubble sort performs the following steps to sort an array from smallest to largest:
A) Compare array element 0 with array element 1. If element 0 is larger, swap it with element 1.
B) Now do the same for elements 1 and 2, and every subsequent pair of elements until you hit the end of the array. At this point, the last element in the array will be sorted.
C) Repeat the first two steps again until the array is sorted.

Write code that bubble sorts the following array according to the rules above:

Print the sorted array elements at the end of your program.

Hint: If we’re able to sort one element per iteration, that means we’ll need to iterate roughly as many times as there are numbers in our array to guarantee that the whole array is sorted.
Hint: When comparing pairs of elements, be careful of your array’s range.

Show Solution

Question #4


Add two optimizations to the bubble sort algorithm you wrote in the previous quiz question:

  • Notice how with each iteration of bubble sort, the biggest number remaining gets bubbled to the end of the array. After the first iteration, the last array element is sorted. After the second iteration, the second to last array element is sorted too. And so on… With each iteration, we don’t need to recheck elements that we know are already sorted. Change your loop to not re-check elements that are already sorted.
  • If we go through an entire iteration without doing a swap, then we know the array must already be sorted. Implement a check to determine whether any swaps were made this iteration, and if not, terminate the loop early. If the loop was terminated early, print on which iteration the sort ended early.

Your output should match this:

Early termination on iteration 6
1 2 3 4 5 6 7 8 9

Show Solution


9.5 -- Multidimensional Arrays
Index
9.3 -- Arrays and loops

602 comments to 9.4 — Sorting an array using selection sort

  • spaceman

    I can use:

    std::size_t, std::swap() and std::size()

    regardless of this headers being added or not

    #include <cstddef>
    #include <iterator>
    #include <utility>

    I just use
    #include <iostream>

    why?

    • Alex

      Because your iosteam includes these other headers, which means when you include iostream, those headers are included transitively. Although this may work on your compiler, you should not rely on it. Explicitly include all the headers you need.

      • spaceman

        does this mean that iostream is different in every compiler? I use VS2019, how should I configure it so my iostream is the same as all others iostreams?

        Thanks in advanced and love your work.

        • Alex

          Different implementations of iostream may include different headers. Header files do not provide any guarantees about what other headers they include.

          Rather than try and "fix" iostream, just #include the headers you need directly in your code, as is the best practice.

  • RR

    Can we bubble sort in this way instead? Is this correct (I know it works) and are there any downsides to this?

  • John

  • Dark_Tracker

    Hi nascardriver,
    There is a bug in solution of Question 2. I think the swap function should be inside the if(..) statement

    Like this
            if (array[currentIndex] > array[largestIndex])
                {
                    // This is the new largest number for this iteration
                    largestIndex = currentIndex;
                    // Swap our start element with our largest element
                    std::swap(array[startIndex], array[largestIndex]);
                }

  • Lucy Kumapawa

    Question #4.
    There are definition and declaration in the first loop: // bool swapped{ false };
    Is there any difference if we would declare bool variable out of the loop and would leave in the loop only definition?
    Thanks in advance for answering.

    • nascardriver

      If you declared `swapped` outside of the loop, you'd have to reset it in every iteration.
      For fundamental types, there's no difference between declaring it inside or outside of the loop.
      For user-defined types, assignment can be faster than initialization. Even then, the variable will often be defined inside of the loop for readability's sake.

  • Alexander Agruso

    Hey, for the first optimization of bubble sort, I came up with something slightly different, that still sorts the array properly. Here is my algorithm:
    [code]
        for (int startIndex {0}; startIndex < size - 1; startIndex++)
        {
            bool noSwap {true};
            
            for (int currentIndex {0}; currentIndex < size - startIndex; currentIndex++)
            {
                if (array[currentIndex] > array[currentIndex + 1])
                {
                    noSwap = false;
                    std::swap(array[currentIndex], array[currentIndex + 1]);
                }
            }
            
            if (noSwap)
            {
                std::cout << "Early termination at " << startIndex + 1 << " iterations" << std::endl;
                break;
            }
        }
    [code]

  • kio

    Hi Nascar and Alex, I'm back :D

    Is having adjcentIndex and doing ++adjcentIndex every iteration better soultion than doing [currentIndex+1] from performance standpoint?

    Thank you.

    • nascardriver

      Hi kio!

      I don't think there's a difference. In either case you have to do `++currentIndex`. In one case you do `currentIndex + 1`, in the other you do `++adjecentIndex`. You can run benchmarks if you want to know for sure, though I wouldn't be surprised if the compilers turns both versions into the same binary code.

  • Husain Patel

    Please if anyone can explain question#4 in a more easy way, I did not get it.

  • SuperNoob

    Everything is fine except the site shows some weird behavior at times. Like comment keeps loading, or keeps saving. Sometime the site itself doesn't load at all. Shows "origin error". Although I have fast internet connection and every other sites loads well.

  • SuperNoob

    Solution #3
    I did some benchmarks. Mine is a bit faster. Besides, I think it's a bit easier to understand than nested for loops.

  • betty

    could you please help me to Write a C++ program that will implement all of the sorting algorithms. The program should accept different unsorted data items from user and sort using all algorithms and should tell which algorithm is efficient for that particular unsorted data items.

  • Hi. When you determine the size of the array, you specify the variable type arrayLength as int, and then in the initialization you static_cast the std::size(array) to an int.

    Since you have already set arrayLength as int, an autoconversion is already taking place, right? So why bother to static_cast the value inside? What is the gain on int? Is there a problem if I don't static_cast?

    Thank you!

    • nascardriver

      I've updated the lesson to use `std::size_t` rather than `int`. The cast should not have been necessary.

      • habib rijik doyan ngentot

        hey nascardriver?

        about using 'std::size_t' here to determine the length of the array, is there a particular reason for that? I heard that if i'm using int, compiler would give me warning, is that true? what's the cause of that?

        Thanks before

        • nascardriver

          C++ uses `std::size_t` for all kinds of sizes, including array sizes. If you use an `int`, you might get a warning because you're converting from an unsigned integer type to `int`.

  • Samenergustav

    Is my implementation of not re-checking elements effective/recommended?

  • Andreas Krug

    Small suggestion:
    And swap it with the element in index 4 (which doesn’t do anything):
    { 10, 20, 30, 40, 50 }
    instead of
    And swap it with the element in index 4 (which doesn’t do anything):
    { 10, 20, 30, 40 50 }

  • Sahil

    Why do you use "int" instead of "size_t" for array index and size?
    Is there any specific reason for doing so? After all

    is much much easier to read than,

    Thanks.

    • nascardriver

      No good reason, use `std::size_t` unless you need a signed integer.

      • Sahil

        Ok thanks

        One more question,
        Is there a difference between std::size_t and size_t?

        • nascardriver

          `size_t` comes from <stddef.h>, which is a C compatibility header, as you can tell by the .h file extension. C++'s standard library doesn't use .h extensions, it has <cstddef>, which defines its contents inside of the `std` namespace.

          The type of `size_t` is the same in both files, but the non-std version is old and will hopefully get removed in a future version of C++.

          • Sahil

            I see, so basically using std::size_t would ensure that the program will remain compatible in the future as well, in case the non-standard version gets removed.

            Thanks again for the help

  • Berrie

    Thanks for all the great tutorials. I'm going through a few chapters a week still. Can't wait to finally get to objects. Been struggeling to properly understand that for about a decade. I think I'll finally manage because I've already learned so much on this course and how ugly my code from the past is.

    This is what I got for question 4. I had the first optimization for Q4 already in my code for Q3.

    I've used an int instead of a bool to keep track of swaps for early termination of the loop. Is using a bool more efficient for large arrays?

    • nascardriver

      Hi!

      You never access `check` apart from checking if it's non-zero, so there is no reason to use an `int`.
      Using an `int` might confuse the reader, because they'll think you need to know the number of swaps for something, when really, you don't. Although unlikely, you could also overflow the `int` if too many swaps happen, which would invoke undefined behavior.

  • Waldo Lemmer

    1. Section "How sorting works":
    > To swap two elements, we can use the std::swap() function from the C++ standard library, which is defined in the algorithm header. For efficiency reasons, std::swap() was moved to the utility header in C++11.

    All snippets in this lesson #include <algorithm> instead of <utility>, even though pretty much everyone is using at least C++14 for new projects by now.

    2. Is getting a value from an array equally as fast as getting a value from a variable? Would it be faster to keep track of an array element with a variable storing its value, as opposed to a variable storing its index?

    3. Section "Selection sort in C++":
    > Sorting names works using the same algorithm. Just change the array type from int to std::string, and initialize with the appropriate values.

    The < operator (line 19) doesn't work with strings

    4. Question #2's solution:
    > smallestIndex should probably be renamed largestIndex as well.

    Pressing Ctrl+R, Ctrl+R while the cursor is over an identifier allows you to rename all instances of said identifier (Visual Studio). I find it very useful :)

    • nascardriver

      1. Thanks

      2. Getting a value from an array is slower than accessing a variable directly, because the address of the element has to be calculated first. `a[3]` is `*(a + 3)`. Whether or not you can store an element in a variable depends on the situation and the element type. If you need the same element multiple times, you can bind a reference to it.

      3. The < operator works with strings, it compares an alphabetic comparison https://en.cppreference.com/w/cpp/string/basic_string/operator_cmp

  • Put all code inside code tages explained where you write a comment.

  • J34NP3T3R

    how do you quote codes like that. mine looks like a comment and harder to read.

    • nascardriver

      "Put all code inside code tages explained where you write a comment."
      - Wake in the comment above yours

      There's a yellow box with an example every time you write a comment.

  • J34NP3T3R

    i changed the example a bit so that we only (for)loop half of the array.
    basically its the solution provided in the "Selection sort in C++" section PLUS comparing LOWEST and HIGHEST values in one loop instead of just the LOWEST value.
    i was hoping to optimize it a little bit so i thought less loop = optimize.
    BUT the problem is in exchange for half loop, we still have the same number of comparisons and the same number of swaps.
    is this going to work ?

    #include <iostream>
    #include <iterator>     // for std::size
    #include <algorithm>

    int main()
    {
        int aray[]{ 30, 60, 20, 50, 40, 10 };
        int len{ std::size(aray) };
        int maxIteration{ static_cast<int>(len / 2) }; // half size of the array. fractional part discarded.
                                                       // if array length is odd number the middle element is not iterated but will be in correct position in the end.

        // sorting (range begin from start of the array up to the end of the array and the range gets shorter each iteration)  
        for (int startIndex{ 0 }, endIndex{ len - 1 }, iteration{ 1 } ; iteration <= maxIteration; ++startIndex, --endIndex ,++iteration )
        {
            // scan each element in the current range to get the highest and lowest element value in this range
            for (int i{ startIndex }; i <= endIndex; ++i)
            {
                if (aray[i] < aray[startIndex])
                    std::swap(aray[i], aray[startIndex]);

                // its ok if both if statements are true for this element. it will be fixed by the next loop.

                if (aray[i] > aray[endIndex])
                    std::swap(aray[i], aray[endIndex]);
            }
        }

        // display result
        for (int i{ 0 }; i < len; ++i)
        {
            std::cout << aray[i] << " ";
        }

        return 0;
    }

  • Getting better at solving my own problems!

    Found this std::reverse function

    Now, let's view the solution :)

Leave a Comment

Put all code inside code tags: [code]your code here[/code]