11.18 — Timing your code

When writing your code, sometimes you’ll run across cases where you’re not sure whether one method or another will be more performant. So how do you tell?

One easy way is to time your code to see how long it takes to run. C++11 comes with some functionality in the chrono library to do just that. However, using the chrono library is a bit arcane. The good news is that we can easily encapsulate all the timing functionality we need into a class that we can then use in our own programs.

Here’s the class:
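A minimal sketch of such a class follows (this reconstruction uses std::chrono::steady_clock, a monotonic clock well-suited to interval timing; the exact names are illustrative):

```cpp
#include <chrono> // for std::chrono functions
#include <ratio>  // for std::ratio

class Timer
{
private:
    // Type aliases to make accessing nested types easier
    using Clock = std::chrono::steady_clock;
    using Second = std::chrono::duration<double, std::ratio<1>>;

    std::chrono::time_point<Clock> m_beg { Clock::now() };

public:
    void reset()
    {
        m_beg = Clock::now();
    }

    double elapsed() const
    {
        // Convert the elapsed duration to (fractional) seconds
        return std::chrono::duration_cast<Second>(Clock::now() - m_beg).count();
    }
};
```

steady_clock is chosen over system_clock because it never jumps backwards (e.g. when the user or NTP adjusts the wall clock), which would otherwise produce nonsensical negative timings.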

That’s it! To use it, we instantiate a Timer object at the top of our main function (or wherever we want to start timing), and then call the elapsed() member function whenever we want to know how long the program took to run to that point.

Now, let’s use this in an actual example where we sort an array of 10000 elements. First, let’s use the selection sort algorithm we developed in a previous chapter:

On the author’s machine, three runs produced timings of 0.0507, 0.0506, and 0.0498 seconds -- so roughly 0.05 seconds per run.

Now, let’s do the same test using std::sort from the standard library.

On the author’s machine, this produced results of 0.000693, 0.000692, and 0.000699 -- so roughly 0.0007 seconds.

In other words, in this case, std::sort is roughly 70 times faster than the selection sort we wrote ourselves!

A few caveats about timing

Timing is straightforward, but your results can be significantly impacted by a number of things, and it’s important to be aware of what those things are.

First, make sure you’re using a release build target, not a debug build target. Debug build targets typically turn optimization off, and that optimization can have a significant impact on the results. For example, using a debug build target, running the above std::sort example on the author’s machine took 0.0235 seconds -- 33 times as long!

Second, your timing results will be influenced by other things your system may be doing in the background. For best results, make sure your system isn’t doing anything CPU or memory intensive (e.g. playing a game) or hard drive intensive (e.g. searching for a file or running an antivirus scan).

Third, measure at least 3 times. If the results are all similar, take the average. If one or two results are different, run the program a few more times until you get a better sense of which ones are outliers. Note that seemingly innocent things, like web browsers, can temporarily spike your CPU to 100% utilization when a site you have sitting in the background rotates in a new ad banner and has to parse a bunch of JavaScript. Running multiple times helps identify whether your initial run may have been impacted by such an event.

Fourth, when doing comparisons between two sets of code, be wary of what may change between runs that could impact timing. Your system may have kicked off an antivirus scan in the background, or maybe you’re streaming music now when you weren’t previously. Randomization can also impact timing. If we’d sorted an array filled with random numbers, the results could have been impacted by the randomization. Randomization can still be used, but ensure you use a fixed seed (e.g. don’t use the system clock) so the randomization is identical each run. Also, make sure you’re not timing waiting for user input, as how long the user takes to input something should not be part of your timing considerations.
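For the randomization caveat above, a fixed seed might be set up along these lines (a sketch; the seed value and array size are arbitrary):

```cpp
#include <algorithm> // for std::shuffle
#include <array>
#include <cstddef>
#include <random>    // for std::mt19937

// Returns an array shuffled with a FIXED seed, so every run of the
// program (and both versions of the code being compared) sees the
// exact same "random" data.
std::array<int, 10000> makeShuffledArray()
{
    std::array<int, 10000> array {};
    for (std::size_t i { 0 }; i < array.size(); ++i)
        array[i] = static_cast<int>(i);

    // Constant seed -- deliberately NOT std::random_device or the system clock
    std::mt19937 mt { 42 };
    std::shuffle(array.begin(), array.end(), mt);
    return array;
}
```

Because the seed is a constant, calling this function twice (or on two different runs of the program) produces identical arrays, so any timing difference comes from the code being measured rather than from different input data.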

Finally, note that results are only valid for your machine’s architecture, OS, compiler, and system specs. You may get different results on other systems that have different strengths and weaknesses.


84 comments to 11.18 — Timing your code

  • yeokaiwei

    Oh, the Output in our compiler (MS VS 19) already gives us the answer.

    We are just learning to do it manually.

    1. Feedback
    There's a warning with the last example.

    Severity    Code    Description    Project    File    Line    Suppression State
    Warning    C6262    Function uses '40008' bytes of stack:  exceeds /analyze:stacksize '16384'.  Consider moving some data to heap.    Chapter8.15ChronoTiming    C:\Users\user\source\repos\Chapter8.15ChronoTiming\Chapter8.15ChronoTiming\Chapter8.15ChronoTiming.cpp    35

    2. Debug vs Release
    Huge difference between

    //x64, Debug
    //Time taken : 0.0626876 seconds
    //Time taken : 0.030281 seconds
    //Time taken: 0.0626715 seconds
    //Average = 0.05188003333 seconds

    //x64, Release
    //Time taken : 0.0002759
    //Time taken : 0.0006351 seconds
    //Time taken: 0.0009158 seconds
    //Average = 0.00060893333 seconds

    The release is ~85x faster.

  • Rishi

    This timing thing seems a little harder to keep in memory. Is this optional or important? I understood it though.

    • nascardriver

      You can always look it up when you need it.

      • Rishi

        Do you remember it? I mean, do professional programmers keep it in memory? I can memorize it if necessary. Just wanna know if it's worth doing so

        • nascardriver

          Yes, chrono is a much used library, so putting together a timer should not be a problem for developers who are familiar with chrono. There's no need to remember this timer class. All you need to know is how to get the current time, from that point you can do what you want, eg. timing code.

          • Rishi

            Okay, I almost have it down. Except, could you explain this line briefly please?

            • nascardriver

              `using` is a type-alias.
              `std::chrono::duration` is a type that takes template type arguments (The types in brackets, similar to how you can give `std::vector` types).
              The first argument is the type that's used to store the number of ticks, ie. the type that `.count()` returns.
              The second argument is the scale of the duration. `std::ratio<1>` is the same as `std::ratio<1, 1>`, which means the number of ticks is multiplied by 1/1 (ie. the value doesn't change). This is the default argument of `std::chrono::duration`, so you can just do
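              Presumably the alias in question, alongside the shortened equivalent form being described:

```cpp
#include <chrono>
#include <ratio>

// Full form: double ticks, scaled by 1/1 (i.e. whole seconds)
using Second = std::chrono::duration<double, std::ratio<1>>;

// std::ratio<1> is the default second argument, so this names the same type:
using SecondToo = std::chrono::duration<double>;
```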

              You can also read this information on cppreference

  • kevin

    Hi sir, in Chapter "6-15-an-introduction-to-stdarray" we have not used

    but we have used it here. Is there any specific reason for it ?

  • I can't compile your optimized sort example - it appears you may be using a C++20 feature that my compiler apparently does not support:

    I'm using g++ v9.3.0, but it has no <ranges> header file


    • nascardriver

      Thanks for pointing this out! I've added a note to the code in question and added the pre-C++20 version.

      g++ supports ranges since version 10

  • MU

    I get multiple "Implicit conversion changes signedness: 'int' to 'std::__1::array::size_type' (aka 'unsigned long')" errors when trying to compile the selection sort timing examples. Of course that's because I have the -Wsign-conversion switch on as recommended here. You either need to turn it off to get the examples compiled or change all array index variable types (i.e. startIndex, currentIndex, smallestIndex, i) from int to size_t.

    Is there a good reason to use size_t instead of int for array indices or is the -Wsign-conversion flag making the compiler overcautious for no good reason?

    • nascardriver

      The warning is good, keep it. `int` for indexes is bad, I removed it. Indexes are always unsigned, by using `int` we're limiting the indexable range and need casts all over the place. Conventionally, all indexes are `std::size_t` (with few exceptions).

      Thanks for pointing out the warnings!

  • Van


    I have 2 examples as shown below.  Example 1 works exactly as intended; however, Example 2 is not working.  The only small difference between Example 1 and Example 2 is the different type of array.  Could you please let me know why Example 2 is not working?

    Example 1:

    Output of Example 1

    a is A
    b is B
    c is C

    a is D
    b is E
    c is F

    a is D
    b is E
    c is F

    Copy is D, E, and F

    Press any key to continue . . .

    Example 2:

    Output of Example 2:

    a is A
    b is B
    c is C

    a is D
    b is E
    c is F

    Press any key to continue . . .

  • I think it would be helpful if you could explain how the class Timer works. I understand everything except the code that has to do with the timing

  • Rushhahb


    std -> namespace
    time_point is a type(class)
    what is chrono?

  • Connor

    Hi, I've done a bit more searching around on the chrono library and I came across some more clock types: system_clock and steady_clock. What is the difference between system_clock, steady_clock and high_resolution_clock? Thanks, loved the lesson :)

  • Ged

    Why do we need to use const here?

  • manasra

    how to configure it please?

  • Jordy

    Third to last paragraph: temporary instead of temporarily

  • Ares

    Alex, do you memorise all this stuff or do you copy the code from somewhere when you need it?

    I can not imagine for the life of me remembering the class. Even if I memorise it today I would not be able to redo it in a month.

    • Alex

      Copy it for sure!

      IMO, C++ is 50% knowing things and 50% being able to remember where to look things up.

      • Ares

        Throughout the entire tutorial up until now I wondered about this: "is he some wizard!?".

        Perhaps devoting an article to this early on might be a good idea! It really made me feel stupid at times, wondering whether or not this is something to memorise.

  • Helliarc

    Ah, you guys did a race too!  If you didn't see my comment before, my friends and I had a race I made from calculating prime numbers.  I lost... But on std::sort I get: 0.10000 seconds.  You must be using a beast single core CPU with a TON of cache!  I'm on a Ryzen 2700x, definitely under-performs on a single core...

  • sonngo

    Hey Alex, what are your computer specs? I have a notebook with an i5-3210m and 4GB RAM. Every time I run the selection sort algorithm with 1000 doubles, the timer's output is always 0.

    • 1000 elements is very few for a computer and it should take close to 0 time units. However, I find exactly 0 unlikely. Do you get positive numbers when you increase the array's size?

      What's the output of

  • Gili

    I think I found a typo.

    "we can easily encapsulate all the timing **functional** we need"

    "functional" should be replaced by "functionality"

  • Olga

    Additionally, what is the meaning of this part?

  • Olga

    Can you explain please what these lines mean?

    • Alex

      The using statements are acting as type aliases (see "type aliases" in the site index for more information). The third statement is defining a new member named m_beg of type std::chrono::time_point. You've seen examples of both the prefixing (e.g. the std:: prefix on almost everything) and the angled brackets (see the examples of std::array or std::vector). These are actually template instantiations -- we cover templates in chapter 13.

  • Sod

    Instead of using chrono to actually time programs, wouldn't it be much more sensible to count clock ticks instead? (and measure dynamic memory and hard disk calls separately, as they are slower and vary by machine)

    • Alex

      Not sure what you mean by clock ticks in this context -- ticks in reference to a clock typically just means a count of some time unit (e.g. microseconds). Are you instead trying to suggest we count via cpu cycles or some other cpu related metric?

  • DecSco

    You are using single quotation marks around ' seconds\n'. It should be double quotation marks, I think.

  • Peter Baum

    Maybe it will be helpful to have a practical timing example which demonstrates some of the validity issues that we have been considering here. The following is a test program that times two different implementations of the binary search from a previous lesson, to try to determine which approach is better.

    If I just run the program as a stand-alone .exe file (i.e., not within an IDE) and don't bother to disable the network or virus scanner, I get these results:

    Microsoft Windows [Version 10.0.16299.431]
    (c) 2017 Microsoft Corporation. All rights reserved.
    Alex Binarysearch Time elapsed: 8.2431 seconds.
    Peter Binarysearch3 Time elapsed: 6.84111 seconds.
    Normalized version is 17.008% faster.
    Alex Binarysearch Time elapsed: 8.23866 seconds.
    Peter Binarysearch3 Time elapsed: 6.85477 seconds.
    Normalized version is 16.7976% faster.
    Alex Binarysearch Time elapsed: 8.79291 seconds.
    Peter Binarysearch3 Time elapsed: 6.85149 seconds.
    Normalized version is 22.0794% faster.
    Alex Binarysearch Time elapsed: 8.23806 seconds.
    Peter Binarysearch3 Time elapsed: 7.46674 seconds.
    Normalized version is 9.36288% faster.
    Alex Binarysearch Time elapsed: 8.25212 seconds.
    Peter Binarysearch3 Time elapsed: 6.85163 seconds.
    Normalized version is 16.9713% faster.

    So from a practical point of view, should we conclude that the Normalized version of a binary search is best?  Without additional information, should we definitely use that method?

    • Alex

      Yes, we can reasonably conclude that the normalized version is faster. Look how nice and clustered those times are -- with the exception of the 3rd Alex and 4th Peter, they're all within hundredths of a second per run each time.

      • Peter Baum

        Then should we update (or add to) the solution to 3a) of the comprehensive quiz for chapter 7 to show this better implementation of a binary search?

        • Alex

          We could, but the point of these tutorials is to teach C++ fundamentals, not be an archive of the best data structures and algorithms. As such, I generally favor simplicity over performance.

          • Peter Baum

            Hi Alex,
            A couple of thoughts...
            1. It isn't that much more complicated.
            2. Perhaps, as a minimum, there should be information so students aren't misled into thinking that the implementation is optimal.  Same for the sort algorithm.

    • Anastasia

      I'm just curious whether the normalized version's speed boost compared to Alex's variant is (to some extent) due to using the right-shift (>>) bitwise operator as opposed to the ordinary division one?

      Running the original version of the code above, I got results close to the author's (~18-20% faster norm. version). Changing line 69 (Alex binarySearch) from


      the difference in speed was reduced to no more than 4.7%.

      Next step was to change line 50 (Peter binarySearch) so it uses division instead of the shift:


      After that the normalized version became actually slower than Alex's one (~ -13.5%).

      edit:wrong line numbers

      • gcc, clang and msvc (The 3 major compilers) all turn the division by 2 into a right shift.
        The time difference you're observing is caused by something else or you didn't enable compiler optimizations.
        These are the little things that should be left to the compiler. Favor readability. If you want to divide, divide.

  • Peter Baum

    I previously posted about some of the problems I encountered trying to do valid program timing under Microsoft Windows.  I did a little more work trying to see if I could run a timing program under the Microsoft Windows recovery disk OS to avoid some of the timing pitfalls that one encounters trying to use Windows 10.  You start by creating a recovery disk or recovery USB and bring up that OS using one of the BIOS routines.  It is possible to get to a command prompt (DOS box) from this free OS by a menu item under the troubleshooting menu.  When I tried to run a timing .exe, I got the message “This version of [program name.exe] is not compatible with the version of Windows you’re running.”  It appears that the OS being used for recovery in my case is actually Windows PE.

    The issues then become:

    1. Would Windows PE introduce its own set of timing issues?

    2. How should your IDE be configured so that a program will run under that operating system?

    3. Are there limits to the kind of program that will run under Windows PE that will make this approach unsuitable?

    I found some information that will help with issues 2 and 3.  For example,

    a) The program will not run if it uses MFC or ATL.  MFC (Microsoft Foundation Classes) provide a C++ object-oriented wrapper over Win32 for rapid development of native desktop applications.  ATL (Active Template Library) is a wrapper library that simplifies COM development and is used extensively for creating ActiveX controls.

    b) You also have to link to the static run-time libraries and thus not use Msvcrt.dll.

    c) The project properties under Configuration Properties \ C/C++ RunTime Library must be set to Multi-threaded or Multi-threaded debug, not one of the .dll versions.

    d) You cannot run the Common Language Runtime (CLR).

    So I don’t know at this point how realistic using Windows PE is going to be.  In any case, finding out more about this approach would be something of a project unless someone has already gone down that path.

    Other approaches might use an actual real-time OS.  Such an operating system would allow fine control of priorities and interrupts but the underlying OS might not support everything written in CPP with a particular IDE.

    All that I have seen so far leaves this timing lesson in a very awkward position:  Although we all recognize how important timing is for certain kinds of program development, we have not been given the tools to perform valid timing tests.

    • Alex

      I think all of the constraints you list above are achievable: You shouldn't be using MFC, ATL, or CLR anyway, and linking to the static run-times without multithreading is definitely doable. I think I cover how in appendix A.

  • Peter Baum

    I don't think the constructor should initialize m_beg because we don't know for any particular compiler, if the results will be the same as a call to reset().  In other words, we should force the user to always call reset() to start the timing.

    • nascardriver

      Hi Peter!

      Why would the results be different?
      I partially agree, though. I think there should "start" and "stop" functionality.

      • Peter Baum

        The results might differ because initialization using the constructor as part of the operation of creating an object may be different from the call to the start (reset) function.  In particular, we don't know when the call to get the system clock information occurs in that creation sequence.  Even something like obtaining the time within a different function calling sequence (which involves stack operations) or different page faulting might make a difference.

  • Peter Baum

    I think this lesson is extremely important because having the ability to evaluate the execution-time performance of routines is sometimes critical to the success of a computer program.  Unfortunately, we face some very significant obstacles trying to do accurate timing, only some of which were addressed in this lesson.  Here are some additional topics that might be considered.  I’m sure there are experts out there who know much more about this than I do.

    1.    Running a program 3 or more times is insufficient to detect whether or not the timing is reliable.  Consider a delay that takes effect 50% of the time.  That means that 1 in 8 times, or 12.5% of the time, we can expect a situation where there is no detected delay 3 times in a row.  We also have the same probability that the delay will occur 3 times in a row.

    2.    There are potential problems using a routine itself as an experimental control.  This is effectively what is being done when you run the same routine several times and look for variations in the results.  A better approach is to use relatively simple routines as controls that exercise possible problem areas.  Thus, controls might stress machine language execution, memory, or I/O, and we can also use various combinations of these fundamental routines.  If these simple controls produce consistent results, then we can be pretty sure that our basic timing platform is suitable.  

    3.    We need practical advice about placing our machine in a state that is likely to produce the best results.  One idea I had was to put my Windows 10 machine in safe mode.  That got rid of some things like the network, but I was surprised at how much was left running, including Cortana!  One can use the task manager to stop programs such as anti-virus software, but others are hard to stop (they automatically restart), can't be stopped because of permissions, or should not be stopped.  Even in safe mode, I was seeing very obvious performance variations in the 0.5% range.

    Can an expert give us some advice here?

    4.    We know that the timers used to measure time intervals are part of a hardware/software system that depends upon interrupts.  There can be several timer interrupts, and they can be interfered with by other hardware interrupts and interrupt service routines.  However, the situation is worse than I thought.  From what I've read, I learned:

    “Because timers on Windows do not have direct support from hardware, it means that they are not very accurate. That is even more so on Windows 8 which introduces a new feature called “dynamic clock tick”, where the system clock does not interrupt at fixed intervals but rather, when the operating system deems it necessary for power saving reasons. This has caused any measuring method dependent on a kernel timer to become unreliable. If a device should need accurate periodic attention from software, the hardware should come equipped with its own timer that can fire interrupts so that it can be serviced by software in a timely fashion.  This is true despite the fact that the Windows 8.1 WDK has a new feature called high resolution timers.”

    5.    Even if we were to solve the underlying timing problems and figured out some reasonable bounds for the error we could expect from the results of our experiment, there are other subtle potential sources of error.  As an example, I was tripped up comparing two routines that I arranged so that one routine would be timed and the results output, followed by the timing of the second routine and outputting its results.  What I hadn't considered is that the output from the first routine might be asynchronously output while the second routine is running, making the second routine appear to run slower than it might otherwise.  I also ended up doing a timer reset before the first routine because I could not be sure that the result would be the same as the initialization of the timer during timer object construction.  Vectors doing garbage collection would be another potential issue, along with things like page swaps and the use of multiple cores.  The experience left me wondering what other mistakes I might have made.

    6.    One of my complaints about C++ is that so much goes on that is invisible to the programmer.  That means that optimizing a routine is difficult because it is hard to know what (perhaps obvious) optimizations will be applied to the code.  Yes, we could try to look at the assembly language code, but not only is this inconvenient, it is sometimes difficult to follow if other kinds of optimization have been applied.  As a practical matter, I think perhaps I should just assume that the compiler doesn't do much optimization.

    7.    Comparing two routines can also be made difficult if the algorithms are fast for some configurations and slow for others.  Sometimes we just end up with “it depends” rather than being able to say one is faster than another.

    It is tempting to run some timing tests and if we see a 5 or 10 percent difference and we have taken some precautions, then assume that the results have some validity.  Putting on my scientist hat, I am not so sure.

    • nascardriver

      Hi Peter!

      If you haven't already, you might want to take a look at the Big O notation. It can be used to compare efficiencies of algorithms without even needing to compile the code.

      * (I didn't read the article, but it seems like a good start)

      • Peter Baum

        Hi nascardriver,

        That is certainly a useful tool when comparing algorithms.  That consideration is especially important when the orders are obviously different.  However, at some point (especially if the orders are the same) we get down to specifics that relate to actual machine language execution times.  In fact, it is possible that the order difference at some large number or at infinity is completely irrelevant to a practical implementation.

        Another thing that can be done is to look at machine instruction execution times.  With cache and multiple execution units, that has gotten hard to do now.  There may be software that will assist; anyone know?

        My other thought is that perhaps a system repair disk might contain enough of a simplified operating system so that we could execute programs that only use the CLI.  I haven't experimented with that yet but perhaps somebody has already tried it.

        • nascardriver

          Another thing to keep in mind is that your CPU might have one or another feature that allows algorithm A to perform better than algorithm B even though B performs better than A on another machine. I don't think timing code is a reliable way of comparing algorithms.
          By "perform" I mean "run faster".

          • Peter Baum

            Hi nascardriver,
            I agree with what you suggest about the performance on two different machines.  However, we will be confronted with having to choose an algorithm to implement, and making a choice based on some kind of testing and documentation is better than giving up completely.

  • Peter Baum

    1. In the timer class, why not use uniform initialization?  If the argument is that you want it to compile regardless of the compiler used, that same argument suggests we shouldn't bother learning any of the new C++ capabilities and should just stick to the old formats.  Perhaps the way forward is to teach and use the new formats and make a separate note telling people how to make do if they aren't willing to obtain an up-to-date compiler.

    2. The code in main() is missing the units.  Because of the duration cast used, it should be

  • Peter Baum

    First sentence: "performant" isn't a word.  You want to talk about execution speed here.

  • J Gahr

    Hello! In Visual Studio 2017, I ran your selection sort example, and then replaced std::array<>() with array[] to see how their times compared:

    In release mode, the time taken by array[] was identical to std::array<>()'s time (roughly 0.12 seconds).

    In debug mode, however, using array[] resulted in a consistent time of 0.12 seconds, and when I used std::array<>() the time taken was always hovering around 2.1 seconds.

    I was wondering if this is a typical result that happens across most compilers and machines, or if this was specific to my compiler/machine. What do you guys think?

    • nascardriver

      Hi J!

      Debug mode is always slower, because compiler optimizations are usually disabled and your compiler might have added some extra checks.

  • Pierce

    I'm assuming that if the timer outputs 0, then the timer is not set to have enough precision to actually hold the amount of seconds it took

    • Alex

      C++ doesn't require the time to have a specific resolution, so maybe yours simply doesn't have a good enough resolution. You can see your resolution as follows:

      This shows you how many ticks per second your clock has.

  • Sébastien

    I was not able to run the code above although it compiles without any warning. std::chrono crashes the program even before it reaches the first line of main (I used gdb to discover that). I'm on Windows 10 x64 (Core i5-4200H).

  • Benjamin

    Some thoughts about this section:
    std::sort is deploying a better algorithm than yours. Although the sorting algorithm std::sort uses differs from compiler to compiler, the ISO C++ standard requires it to have an average complexity of O(n*log(n)). Your implementation, however, is a kind of selection sort, whose complexity is O(n^2) in the average (and also best and worst) case. Roughly comparing, taking the constant to be 1, the complexity ratio in your case may be (yours):(std) = 10^8:40000. I think that would be one explanation of why your implementation was slow; the compiler might also have optimized the library function itself.
    Also, I think your array initialization was not good, as you rely on some garbage values which may not be garbage values on some machines, compilers, or even compilation modes. Thus, I think it is wise to fix your array initialization using a random function from <random>.
    Finally, I would appreciate it if you would publish the name of the compiler used for the compilation, as it would help everyone compare the algorithm as implemented.

    • Alex

      You're correct in that my selection sort algorithm is O(n^2), whereas std::sort is O(n log n). This should account for the bulk of the difference in timing.

      My array initialization looks fine to me. I initialize elements 0 through length-1 with integer values length-1 through 0 (essentially, the array starts reverse sorted). Originally I was randomizing the array, but you have to be careful with that so to ensure you get the same randomization across both runs -- that adds a bit of complexity that I thought muddled the example slightly.

      This was done on Visual Studio 2017. I'm not sure what algorithm it's using. It might be

    • Peter Baum

      There are many different sorting algorithms and no one best algorithm.  There is a great deal of information on the internet about sorting algorithms and the circumstances where you might choose one method over the other.  One of the nicest visual displays that shows something about the different methods used by some of the common ones can be found at
