5.9 — Random number generation

The ability to generate random numbers can be useful in certain kinds of programs, particularly in games, statistical modeling programs, and scientific simulations that need to model random events. Take games for example -- without random events, monsters would always attack you the same way, you’d always find the same treasure, the dungeon layout would never change, etc… and that would not make for a very good game.

So how do we generate random numbers? In real life, we often generate random results by doing things like flipping a coin, rolling a die, or shuffling a deck of cards. These events involve so many physical variables (e.g. gravity, friction, air resistance, momentum, etc…) that they become almost impossible to predict or control, and produce results that are for all intents and purposes random.

However, computers aren’t designed to take advantage of physical variables -- your computer can’t toss a coin, throw a die, or shuffle real cards. Computers live in a controlled electrical world where everything is binary (false or true) and there is no in-between. By their very nature, computers are designed to produce results that are as predictable as possible. When you tell the computer to calculate 2 + 2, you always want the answer to be 4. Not 3 or 5 on occasion.

Consequently, computers are generally incapable of generating random numbers. Instead, they must simulate randomness, which is most often done using pseudo-random number generators.

A pseudo-random number generator (PRNG) is a program that takes a starting number (called a seed), and performs mathematical operations on it to transform it into some other number that appears to be unrelated to the seed. It then takes that generated number and performs the same mathematical operation on it to transform it into a new number that appears unrelated to the number it was generated from. By continually applying the algorithm to the last generated number, it can generate a series of new numbers that will appear to be random if the algorithm is complex enough.

It’s actually fairly easy to write a PRNG. Here’s a short program that generates 100 pseudo-random numbers:
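The listing itself is missing from this copy of the lesson. What follows is a minimal sketch of such a generator, not necessarily the original code. The seed 5323 is taken from the discussion of seeding later in this lesson; the multiplier and increment are assumptions, chosen because they reproduce the first values of the output shown below. The helper name printRandomNumbers is hypothetical.

```cpp
#include <iostream>

// Returns the next pseudo-random number, in the range 0 to 32767.
unsigned int PRNG()
{
    // Our starting seed is 5323. Because it is static, it keeps its
    // value between calls.
    static unsigned int seed{ 5323 };

    // Generate a new value from the current seed. The large constants
    // (assumed values) and unsigned overflow make it hard to casually
    // guess the next number from the previous one.
    seed = 8253729 * seed + 2396403;

    // Reduce the seed to a value between 0 and 32767.
    return seed % 32768;
}

// Prints `count` pseudo-random numbers, five per row.
void printRandomNumbers(int count)
{
    for (int i{ 1 }; i <= count; ++i)
    {
        std::cout << PRNG() << '\t';

        // After every fifth number, start a new row.
        if (i % 5 == 0)
            std::cout << '\n';
    }
}
```

Calling printRandomNumbers(100) from main() prints 100 numbers in rows of five.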

The result of this program is:

23070   27857   22756   10839   27946
11613   30448   21987   22070   1001
27388   5999    5442    28789   13576
28411   10830   29441   21780   23687
5466    2957    19232   24595   22118
14873   5932    31135   28018   32421
14648   10539   23166   22833   12612
28343   7562    18877   32592   19011
13974   20553   9052    15311   9634
27861   7528    17243   27310   8033
28020   24807   1466    26605   4992
5235    30406   18041   3980    24063
15826   15109   24984   15755   23262
17809   2468    13079   19946   26141
1968    16035   5878    7337    23484
24623   13826   26933   1480    6075
11022   19393   1492    25927   30234
17485   23520   18643   5926    21209
2028    16991   3634    30565   2552
20971   23358   12785   25092   30583

Each number appears to be pretty random with respect to the previous one. As it turns out, our algorithm actually isn’t very good, for reasons we will discuss later. But it does effectively illustrate the principle behind PRNGs.

Generating random numbers in C++

C (and by extension C++) comes with a built-in pseudo-random number generator. It is implemented as two separate functions that live in the cstdlib header:

srand() sets the initial seed value to a value that is passed in by the caller. srand() should only be called once at the beginning of your program. This is usually done at the top of main().

rand() generates the next random number in the sequence. That number will be a pseudo-random integer between 0 and RAND_MAX, a constant in cstdlib that is guaranteed to be at least 32767 (and is exactly 32767 on some implementations, such as Visual Studio).

Here’s a sample program using these functions:
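The sample listing is missing from this copy. A minimal sketch of what such a program could look like follows; the function name printRandomNumbers is hypothetical, and the seed 5323 mentioned later in the lesson is assumed to be the value passed in. The exact numbers printed depend on your compiler's implementation of rand(), so they may not match the output shown below.

```cpp
#include <cstdlib>  // for std::rand() and std::srand()
#include <iostream>

// Seeds the generator once, then prints `count` numbers, five per row.
// In a real program, the call to std::srand() would sit at the top of main().
void printRandomNumbers(unsigned int seed, int count)
{
    std::srand(seed); // set the initial seed value

    for (int i{ 1 }; i <= count; ++i)
    {
        std::cout << std::rand() << '\t';

        // After every fifth number, start a new row.
        if (i % 5 == 0)
            std::cout << '\n';
    }
}
```

Calling printRandomNumbers(5323, 100) from main() prints 100 numbers.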

Here’s the output of this program:

17421	8558	19487	1344	26934	
7796	28102	15201	17869	6911	
4981	417	12650	28759	20778	
31890	23714	29127	15819	29971	
1069	25403	24427	9087	24392	
15886	11466	15140	19801	14365	
18458	18935	1746	16672	22281	
16517	21847	27194	7163	13869	
5923	27598	13463	15757	4520	
15765	8582	23866	22389	29933	
31607	180	17757	23924	31079	
30105	23254	32726	11295	18712	
29087	2787	4862	6569	6310	
21221	28152	12539	5672	23344	
28895	31278	21786	7674	15329	
10307	16840	1645	15699	8401	
22972	20731	24749	32505	29409	
17906	11989	17051	32232	592	
17312	32714	18411	17112	15510	
8830	32592	25957	1269	6793

PRNG sequences and seeding

If you run the rand() sample program above multiple times, you will note that it prints the same result every time! This means that while each number in the sequence is seemingly random with regard to the previous ones, the entire sequence is not random at all! And that means our program ends up totally predictable (the same inputs lead to the same outputs every time). There are cases where this can be useful or even desired (e.g. you want a scientific simulation to be repeatable, or you’re trying to debug why your random dungeon generator crashes).

But often, this is not what is desired. If you’re writing a game of hi-lo (where the user has 10 tries to guess a number, and the computer tells them whether their guess is too high or too low), you don’t want the program picking the same numbers each time. So let’s take a deeper look at why this is happening, and how we can fix it.

Remember that each number in a PRNG sequence is generated from the previous number, in a deterministic way. Thus, given any starting seed number, PRNGs will always generate the same sequence of numbers from that seed as a result! We are getting the same sequence because our starting seed number is always 5323.

In order to make our entire sequence randomized, we need some way to pick a seed that’s not a fixed number. The first answer that probably comes to mind is that we need a random number! That’s a good thought, but if we need a random number to generate random numbers, then we’re in a catch-22. It turns out, we really don’t need our seed to be a random number -- we just need to pick something that changes each time the program is run. Then we can use our PRNG to generate a unique sequence of pseudo-random numbers from that seed.

The commonly accepted method for doing this is to enlist the system clock. Each time the user runs the program, the time will be different. If we use this time value as our seed, then our program will generate a different sequence of numbers each time it is run!

C comes with a function called time() that returns the number of seconds since midnight on Jan 1, 1970. To use it, we merely need to include the ctime header, and then initialize srand() with a call to time(0).

Here’s the same program as above, using a call to time() as the seed:
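This listing is also missing from this copy. A sketch of the time-seeded variant could look like the following, assuming the same printing loop as before. The function names are hypothetical, and std::time(nullptr) is used in place of time(0), which is equivalent.

```cpp
#include <cstdlib>  // for std::rand() and std::srand()
#include <ctime>    // for std::time()
#include <iostream>

// Seeds rand() from the system clock. Call this once, at program startup.
void seedFromClock()
{
    std::srand(static_cast<unsigned int>(std::time(nullptr)));
}

// Prints `count` numbers from rand(), five per row.
void printRandomNumbers(int count)
{
    for (int i{ 1 }; i <= count; ++i)
    {
        std::cout << std::rand() << '\t';

        if (i % 5 == 0)
            std::cout << '\n';
    }
}
```

In main(), you would call seedFromClock() first and then printRandomNumbers(100). Because the clock value has one-second resolution on typical systems, two runs started within the same second can still print the same sequence.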

Now our program will generate a different sequence of random numbers every time! Run it a couple of times and see for yourself.

Generating random numbers between two arbitrary values

Generally, we do not want random numbers between 0 and RAND_MAX -- we want numbers between two other values, which we’ll call min and max. For example, if we’re trying to simulate the user rolling a die, we want random numbers between 1 and 6 (pedantic grammar note: yes, die is the singular of dice).

Here’s a short function that converts the result of rand() into the range we want:
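The function body is missing from this copy. The sketch below is reconstructed from the fraction line quoted in the reader comments and from the worked calculations later in this lesson, so treat it as a close approximation rather than the exact original.

```cpp
#include <cstdlib> // for std::rand() and RAND_MAX

// Generates a random number between min and max (inclusive).
// Assumes std::srand() has already been called and that min <= max.
int getRandomNumber(int min, int max)
{
    // static, so the fraction is only calculated once
    static const double fraction{ 1.0 / (static_cast<double>(RAND_MAX) + 1.0) };

    // Scale rand() to [0, 1), stretch it across the size of the range,
    // drop the fractional part, then shift the result up by min.
    return min + static_cast<int>((max - min + 1) * (std::rand() * fraction));
}
```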

To simulate the roll of a die, we’d call getRandomNumber(1, 6). To pick a randomized digit, we’d call getRandomNumber(0, 9).

Optional reading: How does the previous function work?

The getRandomNumber() function may seem a little complicated, but it’s not too bad.

Let’s revisit our goal. The function rand() returns a number between 0 and RAND_MAX (inclusive). We want to somehow transform the result of rand() into a number between min and max (inclusive). This means that when we do our transformation, 0 should become min, and RAND_MAX should become max, with a uniform distribution of numbers in between.

We do that in five parts:

  1. We multiply our result from rand() by fraction. This converts the result of rand() to a floating point number between 0 (inclusive), and 1 (exclusive).

    If rand() returns a 0, then 0 * fraction is still 0. If rand() returns RAND_MAX, then RAND_MAX * fraction is RAND_MAX / (RAND_MAX + 1), which is slightly less than 1. Any other number returned by rand() will be evenly distributed between these two points.

  2. Next, we need to know how many numbers we can possibly return. In other words, how many numbers are between min (inclusive) and max (inclusive)?

    This is simply (max - min + 1). For example, if max = 8 and min = 5, (max - min + 1) = (8 - 5 + 1) = 4. There are 4 numbers between 5 and 8 (that is, 5, 6, 7, and 8).

  3. We multiply the prior two results together. If we had a floating point number between 0 (inclusive) and 1 (exclusive), and then we multiply by (max - min + 1), we now have a floating point number between 0 (inclusive) and (max - min + 1) (exclusive).
  4. We cast the previous result to an integer. This removes any fractional component, leaving us with an integer result between 0 (inclusive) and (max - min) (inclusive).
  5. Finally, we add min, which shifts our result to an integer between min (inclusive) and max (inclusive).
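To make the five steps easier to check by hand, here is a sketch that performs them one at a time. The helper scaleToRange is hypothetical: it takes the raw value and its maximum as parameters instead of calling rand(), so the arithmetic can be followed with concrete numbers.

```cpp
// Hypothetical helper: same arithmetic as getRandomNumber(), but with the
// raw value and its maximum passed in so each step can be verified by hand.
int scaleToRange(int raw, int rawMax, int min, int max)
{
    const double fraction{ 1.0 / (static_cast<double>(rawMax) + 1.0) };

    const double unit{ raw * fraction };          // step 1: in [0, 1)
    const int rangeSize{ max - min + 1 };         // step 2: how many results are possible
    const double scaled{ rangeSize * unit };      // step 3: in [0, rangeSize)
    const int offset{ static_cast<int>(scaled) }; // step 4: integer in 0..(max - min)
    return min + offset;                          // step 5: integer in min..max
}
```

For example, with rawMax = 32767, min = 1, and max = 6, a raw value of 16384 gives unit = 0.5, scaled = 3.0, offset = 3, and a final result of 4. A raw value of 0 gives 1, and a raw value of 32767 gives 6.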

Optional reading: Why don’t we use the modulus operator (%) in the previous function?

One of the most common questions readers have submitted is why we use division in the above function instead of modulus (%). The short answer is that modulus tends to be biased in favor of low numbers.

Let’s consider what would happen if the above function looked like this instead:
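The alternative listing is missing from this copy. A sketch of the modulus-based version under discussion, inferred from the calculations that follow, might look like this. The name getRandomNumberMod is hypothetical and is used only to tell the two versions apart.

```cpp
#include <cstdlib> // for std::rand()

// Modulus-based alternative, shown for comparison only.
// Assumes std::srand() has already been called and that min <= max.
int getRandomNumberMod(int min, int max)
{
    // The remainder is always in 0..(max - min), so the result is in min..max.
    return min + (std::rand() % (max - min + 1));
}
```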

Seems similar, right? Let’s explore where this goes wrong. To simplify the example, let’s say that rand() always returns a random number between 0 and 9 (inclusive). For our sample case, we’ll pick min = 0, and max = 6. Thus, max - min + 1 is 7.

Now let’s calculate all possible outcomes:

0 + (0 % 7) = 0
0 + (1 % 7) = 1
0 + (2 % 7) = 2
0 + (3 % 7) = 3
0 + (4 % 7) = 4
0 + (5 % 7) = 5
0 + (6 % 7) = 6

0 + (7 % 7) = 0
0 + (8 % 7) = 1
0 + (9 % 7) = 2

Look at the distribution of results. The results 0 through 2 come up twice, whereas 3 through 6 come up only once. This method has a clear bias towards low results. By extension, most cases involving this algorithm will behave similarly.

Now let’s take a look at the result of the getRandomNumber() function above, using the same parameters as above (rand() returns a number between 0 and 9 (inclusive), min = 0 and max = 6). In this case, fraction = 1 / (9 + 1) = 0.1. max - min + 1 is still 7.

Calculating all possible outcomes:

0 + static_cast<int>(7 * (0 * 0.1)) = 0 + static_cast<int>(0) = 0
0 + static_cast<int>(7 * (1 * 0.1)) = 0 + static_cast<int>(0.7) = 0
0 + static_cast<int>(7 * (2 * 0.1)) = 0 + static_cast<int>(1.4) = 1
0 + static_cast<int>(7 * (3 * 0.1)) = 0 + static_cast<int>(2.1) = 2
0 + static_cast<int>(7 * (4 * 0.1)) = 0 + static_cast<int>(2.8) = 2
0 + static_cast<int>(7 * (5 * 0.1)) = 0 + static_cast<int>(3.5) = 3
0 + static_cast<int>(7 * (6 * 0.1)) = 0 + static_cast<int>(4.2) = 4
0 + static_cast<int>(7 * (7 * 0.1)) = 0 + static_cast<int>(4.9) = 4
0 + static_cast<int>(7 * (8 * 0.1)) = 0 + static_cast<int>(5.6) = 5
0 + static_cast<int>(7 * (9 * 0.1)) = 0 + static_cast<int>(6.3) = 6

The bias here is still slightly towards lower numbers (0, 2, and 4 appear twice, whereas 1, 3, 5, and 6 appear once), but the duplicated results are now spread across the range rather than clustered at the low end, so the distribution is closer to uniform.
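The two tables can be checked mechanically. The sketch below uses the same simplified setup (a raw value from 0 to 9, min = 0, max = 6) and tallies how often each result is produced by each method. All names here are hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr int rawMax{ 9 };  // pretend rand() returns 0 through 9
constexpr int minValue{ 0 };
constexpr int maxValue{ 6 };

// Maps a raw value into the range using the modulus method.
int mapWithModulus(int raw)
{
    return minValue + (raw % (maxValue - minValue + 1));
}

// Maps a raw value into the range using the fraction (division) method.
int mapWithFraction(int raw)
{
    const double fraction{ 1.0 / (rawMax + 1.0) };
    return minValue + static_cast<int>((maxValue - minValue + 1) * (raw * fraction));
}

// Counts how many raw values land on each result from minValue to maxValue.
std::array<int, 7> tally(bool useModulus)
{
    std::array<int, 7> counts{}; // all counts start at zero

    for (int raw{ 0 }; raw <= rawMax; ++raw)
    {
        const int result{ useModulus ? mapWithModulus(raw) : mapWithFraction(raw) };
        ++counts[static_cast<std::size_t>(result - minValue)];
    }

    return counts;
}
```

tally(true) yields the counts 2, 2, 2, 1, 1, 1, 1 for the results 0 through 6, while tally(false) yields 2, 1, 2, 1, 2, 1, 1, matching the two tables above.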

Even though getRandomNumber() is a little more complicated to understand than the modulus alternative, we advocate for the division method because it produces a less biased result.

What is a good PRNG?

As I mentioned above, the PRNG we wrote isn’t a very good one. This section will discuss the reasons why. It is optional reading because it’s not strictly related to C or C++, but if you like programming you will probably find it interesting anyway.

In order to be a good PRNG, the PRNG needs to exhibit a number of properties:

First, the PRNG should generate each number with approximately the same probability. This is called distribution uniformity. If some numbers are generated more often than others, the result of the program that uses the PRNG will be biased!

For example, let’s say you’re trying to write a random item generator for a game. You’ll pick a random number between 1 and 10, and if the result is a 10, the monster will drop a powerful item instead of a common one. You would expect a 1 in 10 chance of this happening. But if the underlying PRNG is not uniform, and generates a lot more 10s than it should, your players will end up getting more rare items than you’d intended, possibly trivializing the difficulty of your game.

Designing PRNGs that produce uniform results is difficult, and it’s one of the main reasons the PRNG we wrote at the top of this lesson isn’t a very good PRNG.

Second, the method by which the next number in the sequence is generated shouldn’t be obvious or predictable. For example, consider the following PRNG algorithm: num = num + 1. This PRNG is perfectly uniform, but it’s not very useful as a sequence of random numbers!

Third, the PRNG should have a good dimensional distribution of numbers. This means it should return low numbers, middle numbers, and high numbers seemingly at random. A PRNG that returned all low numbers, then all high numbers may be uniform and non-predictable, but it’s still going to lead to biased results, particularly if the number of random numbers you actually use is small.

Fourth, all PRNGs are periodic, which means that at some point the sequence of numbers generated will eventually begin to repeat itself. As mentioned before, PRNGs are deterministic, and given an input number, a PRNG will produce the same output number every time. Consider what happens when a PRNG generates a number it has previously generated. From that point forward, it will begin to duplicate the sequence between the first occurrence of that number and the next occurrence of that number over and over. The length of this sequence is known as the period.

For example, here are the first 100 numbers generated from a PRNG with poor periodicity:

112	9	130	97	64	
31	152	119	86	53	
20	141	108	75	42	
9	130	97	64	31	
152	119	86	53	20	
141	108	75	42	9	
130	97	64	31	152	
119	86	53	20	141	
108	75	42	9	130	
97	64	31	152	119	
86	53	20	141	108	
75	42	9	130	97	
64	31	152	119	86	
53	20	141	108	75	
42	9	130	97	64	
31	152	119	86	53	
20	141	108	75	42	
9	130	97	64	31	
152	119	86	53	20	
141	108	75	42	9

You will note that it generated 9 as the second number, and 9 again as the 16th number. The PRNG gets stuck generating the sequence in-between these two 9’s repeatedly: 9-130-97-64-31-152-119-86-53-20-141-108-75-42-(repeat).
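To see how a period can be measured, here is a small sketch. The generator nextValue is a toy chosen for illustration: starting from 9, it happens to step through the same repeating cycle as the listing above, but it is an assumption and not the generator that produced that listing (the leading 112 does not follow from it). The function findPeriod records where each value was first seen and reports the distance between the two occurrences of the first repeated value.

```cpp
#include <cstddef>
#include <unordered_map>

// Toy generator (an assumption for illustration): adds 121, modulo 154.
unsigned int nextValue(unsigned int current)
{
    return (current + 121) % 154;
}

// Returns the length of the cycle the generator falls into when started at `seed`.
std::size_t findPeriod(unsigned int seed)
{
    // Maps each generated value to the position at which it first appeared.
    std::unordered_map<unsigned int, std::size_t> firstSeenAt;
    unsigned int value{ seed };

    for (std::size_t index{ 0 }; ; ++index)
    {
        const auto found = firstSeenAt.find(value);
        if (found != firstSeenAt.end())
            return index - found->second; // distance between the two occurrences

        firstSeenAt[value] = index;
        value = nextValue(value);
    }
}
```

With this toy generator, findPeriod(9) returns 14, the length of the repeating run 9-130-97-64-31-152-119-86-53-20-141-108-75-42.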

A good PRNG should have a long period for all seed numbers. Designing an algorithm that meets this property can be extremely difficult -- most PRNGs will have long periods for some seeds and short periods for others. If the user happens to pick a seed that has a short period, then the PRNG won’t be doing a good job.

Despite the difficulty in designing algorithms that meet all of these criteria, a lot of research has been done in this area because of its importance to scientific computing.

rand() is a mediocre PRNG

The algorithm used to implement rand() can vary from compiler to compiler, leading to results that may not be consistent across compilers. Most implementations of rand() use a method called a Linear Congruential Generator (LCG). If you have a look at the first example in this lesson, you’ll note that it’s actually an LCG, though one with intentionally picked poor constants. LCGs tend to have shortcomings that make them poor choices for most kinds of problems.

One of the main shortcomings of rand() is that RAND_MAX can be as low as 32767 (essentially 15 bits), as it is in Visual Studio. This means if you want to generate numbers over a larger range (e.g. 32-bit integers), rand() is not suitable. Also, rand() isn’t good if you want to generate random floating point numbers (e.g. between 0.0 and 1.0), which is often useful when doing statistical modelling. Finally, rand() tends to have a relatively short period compared to other algorithms.

That said, rand() is perfectly suitable for learning how to program, and for programs in which a high-quality PRNG is not a necessity.

For applications where a high-quality PRNG is useful, I would recommend Mersenne Twister (or one of its variants), which produces great results and is relatively easy to use.

A note for Visual Studio users (and possibly others)

The implementation of rand() in Visual Studio has a flaw -- the first random number generated doesn’t change much for similar seed values. This means that when using time() to seed your random number generator, the first result from rand() won’t change much in successive runs. This problem is compounded by calling getRandomNumber(), which compresses similar inputs into the same output number.

However, there’s an easy fix: call rand() once and discard the result. Then you can use rand() as normal in your program.
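A minimal sketch of that fix follows. The helper name seedAndDiscardFirst is hypothetical.

```cpp
#include <cstdlib> // for std::rand() and std::srand()

// Seeds rand(), then throws away the first generated value so that
// similar seeds do not lead to similar first results.
void seedAndDiscardFirst(unsigned int seed)
{
    std::srand(seed);
    std::rand(); // result intentionally discarded
}
```

At the top of main(), you might call seedAndDiscardFirst(static_cast<unsigned int>(std::time(nullptr))), which also requires including the ctime header.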

Debugging programs that use random numbers

Programs that use random numbers can be difficult to debug because the program may exhibit different behaviors each time it is run. Sometimes it may work, and sometimes it may not. When debugging, it’s helpful to ensure your program executes the same (incorrect) way each time. That way, you can run the program as many times as needed to isolate where the error is.

For this reason, when debugging, it’s a useful technique to set the random seed (via srand) to a specific value (e.g. 0) that causes the erroneous behavior to occur. This will ensure your program generates the same results each time, making debugging easier. Once you’ve found the error, you can go back to seeding from the system clock to generate randomized results again.
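One way to make that switch easy is to route all seeding through a single function. This is only a sketch; the function name, the debugging flag, and the fixed seed value of 0 are assumptions for illustration.

```cpp
#include <cstdlib> // for std::srand()
#include <ctime>   // for std::time()

// Seeds rand() with a fixed value while debugging, and from the system
// clock otherwise. Call this once, at program startup.
void seedGenerator(bool debugging)
{
    if (debugging)
        std::srand(0); // fixed seed: every run produces the same sequence
    else
        std::srand(static_cast<unsigned int>(std::time(nullptr)));
}
```

While tracking down a bug, call seedGenerator(true), substituting whichever seed reproduces the problem. Once it is fixed, switch back to seedGenerator(false).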

Random numbers in C++11

C++11 added a ton of random number generation functionality to the C++ standard library, including the Mersenne Twister algorithm, as well as generators for different kinds of random distributions (uniform, normal, Poisson, etc…). This is accessed via the <random> header.

Here’s a short example showing how to generate random numbers in C++11 using Mersenne Twister (h/t to user Fernando):
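The example listing is missing from this copy. Based on the description in the comments (a Mersenne Twister seeded from a std::random_device object), a minimal sketch might look like the following. The function names are hypothetical, and using std::uniform_int_distribution to produce die rolls is an assumption about what the original example did.

```cpp
#include <random> // for std::mt19937, std::random_device, std::uniform_int_distribution

// Creates a Mersenne Twister generator seeded from std::random_device.
std::mt19937 makeGenerator()
{
    std::random_device rd;
    return std::mt19937{ rd() };
}

// Rolls a six-sided die using the given generator.
int rollDie(std::mt19937& generator)
{
    // Produces uniformly distributed integers between 1 and 6 (inclusive).
    std::uniform_int_distribution<int> die{ 1, 6 };
    return die(generator);
}
```

In main(), you would create the generator once with makeGenerator() and pass it to rollDie() for each roll. Calling the generator directly, as in generator(), returns the raw unsigned value without mapping it to a range.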

You’ll note that Mersenne Twister generates random 32-bit unsigned integers (not 15-bit integers like rand()), giving a lot more range. There’s also a version (std::mt19937_64) for generating 64-bit unsigned integers!

There’s so much functionality in <random> that it really warrants its own section. We’ll look to cover that in a future lesson in more detail.


257 comments to 5.9 — Random number generation

  • Dot

    Hey Alex. I noticed when doing the chapter 5 quiz in CodeBlocks that the same problem as in Visual Studio occurs, where the first number generated doesn't change much over consecutive runs. It wasn't until I discarded the first call of rand() in my program that the first random number would be different to the previous run's. Might be worth mentioning that in the section mentioning it (after validating it of course, I'm not perfect and just a clueless beginner), seeing as it's one of the IDEs you recommend in the beginning of the tutorial.


  • Milad

    Hey Alex,
    I bumped into this while searching for a program I was doing:

    it actually produced a very high quality random number set but I have no clue about what this is?
    can you elaborate and maybe introduce some resources on how to learn more?
    thank you

  • Peter Baum

    A few comments/questions/ideas

    • First program line 14: unnecessary parentheses.

    • Perhaps a word about the best choices for seeds and other constants.  Here we make the constants primes:

    • You may wish to consider the use of the phrase “uniform distribution” rather than “even distribution.”

    • Section Optional reading: Why don’t we use the modulus operator (%) in the previous function?
    If you want a uniform distribution using the modulus function, then one way to get it would be to set RAND_MAX to one less than a multiple of 7, such as 13 (assuming that doing so is possible).  In your example, you would then get the perfectly uniform distribution you desire:
    0 + (0 % 7) = 0
    0 + (1 % 7) = 1
    0 + (2 % 7) = 2
    0 + (3 % 7) = 3
    0 + (4 % 7) = 4
    0 + (5 % 7) = 5
    0 + (6 % 7) = 6
    0 + (7 % 7) = 0
    0 + (8 % 7) = 1
    0 + (9 % 7) = 2
    0 + (10%7) = 3
    0 + (11%7) = 4
    0 + (12%7) = 5
    0 + (13%7) = 6

    Of course, if you are creating your own PRNG, it will be possible to set the equivalent of RAND_MAX.    

    Suppose, instead, you wanted to use the built in rand() function instead of creating your own.  Since RAND_MAX is typically 32,767 = 7*31*151 we would get exactly 4,681 values for each of the possible returned values modulo 7 except for one more zero because both 0 and 32,767 modulo 7 is zero.  By eliminating any return of zero before the modulus operation, we can make the distribution perfectly uniform and we don’t need any floating point operations at all.

    • Visual Studio rand() implementation bug.  Could you explain the bug details?  Has it still not been fixed?

    • Debugging PRNG idea: In some cases you might want to consider using something even simpler for debugging, such as an incrementing integer.

    • Alex

      Thanks for all the great feedback. The lesson has been tweaked per your comments.

      The rand() bug is still present in visual studio as of Visual Studio 2017. srand() is used to seed the random generator -- however, when using the system clock to seed (which is by far the most common way to seed), the first call to rand() produces a similar (often identical value). It's not until the second call to rand() that these diverge. In my opinion, that's a bug.

  • Peter Baum

    Regarding the statement in the first program that reads, "Due to our use of large constants and overflow, it would be very hard for someone to predict what the next number is going to be from the previous one."  To clarify, I suggest adding the following phrase to this sentence: "without knowing what those constants are."

  • Fernando

    Hello guys!
    First of all, thank you so much for such a good tutorial. The quizzes and code questions are challenging and nice to work on them. Thank you also for promptly answering our questions and helping us with all these stuffs.
    Now, may I ask you to compare the solution above(mersenne) with this one:

    Don't get me wrong, I'm asking you this because it was taught to me without major explanations. Besides, I compared the results generated by both solutions and, at least to me, they were pretty similar, and I think this code is simpler, but I'm not sure it's more efficient and what's the difference between them.

    • Alex

      Your solution is strictly better, as it uses more functionality from the standard library instead of a user-created function. I've updated the example in the lesson accordingly, and have given you a hat tip. Thanks!

  • Qing Lu

    In the getRandomNumber() function, I'm confused with why you are doing a manual type casting to the constant RAND_MAX at the following statement.

    static const double fraction = 1.0 / (static_cast<double>(RAND_MAX) + 1.0);

    I do get warnings from the compiler without it, but I have no idea why.

    • nascardriver

      Hi Qing!

      I don't know why Alex added a cast here, RAND_MAX + 1.0 will evaluate to a double anyway.
      My compiler is not producing any warnings with this code

      Which compiler are you using and what's the output?

  • karston

    What do these 2 lines do?

    • Alex

      Creates a mersenne twister random number generator, seeded using a std::random_device object (you can use time(0) instead if you want to seed based on the system clock).

  • Denis Delinger

    Hey Alex,

    I wanted to think of my own way to limit generated numbers (I use the Mersenne Twister) to a range between two numbers (e.g. 0 to 100).

    I thought of two things:

    1) return min + (mersenne() % (max - min)); //Here I limit the max number with the % operator.
    2) while(number > max) {number /= 10;} //Here, in case the number is bigger than max, divide it by 10 until it's not.

    Also a quick note: I noticed that you already responded to some people about the remainder operator, so no need to explain what the problem with method 1 is. In conclusion:

    1)Problem with method 1 is less evenly distributed numbers across the range.
    2)Method 2 problem is extra iterations (So with a big number the loop would possibly have to run like 5,6 times to get the number, which is less efficient.)

    My questions are:

    1)Is method 1 OK, even though it will not distribute numbers as well?
    2)Is method 2 OK, or is it too inefficient?
    3)Or are both not OK and I should use the exact method you use in this section with the fraction?

    Final note: I know there are some other flaws, but I am currently just concerned with those 3 questions.

    Thanks for your time.

    • Alex

      My opinions:
      #1 is pretty bad as a general solution. In extreme cases, low numbers may appear twice as often as high numbers. That's a pretty severe skewing. So I'd say no, this isn't okay.
      #2 is also pretty bad as a general solution. If you wanted to generate the results of a coin flip (e.g. a random number that is either 0 or 1) this could take thousands of iterations before you randomly landed on a 0 or a 1. So I'd say no, this isn't okay either.

      I think it's great that you're thinking about these on your own and understanding the limitations. But why would you want to use a suboptimal solution when a better solution is being handed to you?

  • Anderson

    Hi, I'm confused, why are you using this "static const double fraction" formula in all advanced methods instead of the modulus operator like in the first function? Is this used to create a perfect distribution uniformity?

    • Anderson

      Could someone explain me this function?

      • Alex

        I updated the lesson to talk about how the non-mersenne version of this function works. The mersenne version works the same way, just using the mersenne generator instead of rand().

  • Landulf

    If you are working with a uniform distribution can you use % instead to get a random number in the desired range like randomnumber = min + ( rand() % max-min+1) ?

  • Victoria Nimitz

    Dear Alex,
    Thanks for the note on the Visual Studio fault with using rand!  I kept getting the same three "random" numbers no matter when I ran the program. Now my program works like a charm!

    Since I was doing my program for college class, I need to cite all sources of any help I obtained on the web. Your article only says "Alex" for the author.  Do you have a link that has the information I would need for an APA citation on this website?  I want to give credit where credit is due, especially for such a nice article that was easy for a newbie to understand.

    Thanks again for your help!

    • Alex

      Sorry for the slow response on this. Try with "Alex" as the last name. The rest of the information (title, publish date, etc...) you can glean from the website.

      Lack of a full name shouldn't matter as long as it's clear what site it's from. After all, Voltaire didn't use a last name either.

  • Jamie

    I am wondering how I can adjust the getRandomNumber to not use rand() and still get a value between a given range, and not outside of it.

    I am working on an assignment, and need to do a priority scheduler without rand() or the system time. Is there a way I can generate a number between two given numbers, e.g. 0-9?

    • Alex

      You could write your own PRNG like in the first example. It probably wouldn't be very good, but it would be better than nothing. Alternatively, if your result doesn't need to be random, you could use a static variable to just increment through the numbers 0 through 9.

      Outside of that, I'm not really sure.
