4.5 — Unsigned integers, and why to avoid them

Unsigned integers

In the previous lesson (4.4 -- Signed integers), we covered signed integers, which are a set of types that can hold positive and negative whole numbers, including 0.

C++ also supports unsigned integers. Unsigned integers are integers that can only hold non-negative whole numbers.

Defining unsigned integers

To define an unsigned integer, we use the unsigned keyword. By convention, this is placed before the type:

Unsigned integer range

A 1-byte unsigned integer has a range of 0 to 255. Compare this to the 1-byte signed integer range of -128 to 127. Both can store 256 different values, but signed integers use half of their range for negative numbers, whereas unsigned integers can store positive numbers that are twice as large.

Here’s a table showing the range for unsigned integers:

Size/Type Range
1 byte unsigned 0 to 255
2 byte unsigned 0 to 65,535
4 byte unsigned 0 to 4,294,967,295
8 byte unsigned 0 to 18,446,744,073,709,551,615

An n-bit unsigned variable has a range of 0 to (2^n)-1.

When no negative numbers are required, unsigned integers are well-suited for networking and systems with little memory, because unsigned integers can store more positive numbers without taking up extra memory.

Remembering the terms signed and unsigned

New programmers sometimes get signed and unsigned mixed up. The following is a simple way to remember the difference: in order to differentiate negative numbers from positive ones, we use a negative sign. If a sign is not provided, we assume a number is positive. Consequently, an integer with a sign (a signed integer) can tell the difference between positive and negative. An integer without a sign (an unsigned integer) assumes all values are positive.

Unsigned integer overflow

What happens if we try to store the number 280 (which requires 9 bits to represent) in a 1-byte (8-bit) unsigned integer? The answer is overflow.

Author's note

Oddly, the C++ standard explicitly says “a computation involving unsigned operands can never overflow”. This is contrary to general programming consensus that integer overflow encompasses both signed and unsigned use cases. Given that most programmers would consider this overflow, we’ll call this overflow despite C++’s statements to the contrary.

If an unsigned value is out of range, it is divided by one greater than the largest number of the type, and only the remainder kept.

The number 280 is too big to fit in our 1-byte range of 0 to 255. 1 greater than the largest number of the type is 256. Therefore, we divide 280 by 256, getting 1 remainder 24. The remainder of 24 is what is stored.

Here’s another way to think about the same thing. Any number bigger than the largest number representable by the type simply “wraps around” (sometimes called “modulo wrapping”). 255 is in range of a 1-byte integer, so 255 is fine. 256, however, is outside the range, so it wraps around to the value 0. 257 wraps around to the value 1. 280 wraps around to the value 24.

Let’s take a look at this using 2-byte integers:

What do you think the result of this program will be?

x was: 65535
x is now: 0
x is now: 1

It’s possible to wrap around the other direction as well. 0 is representable in a 2-byte integer, so that’s fine. -1 is not representable, so it wraps around to the top of the range, producing the value 65535. -2 wraps around to 65534. And so forth.

x was: 0
x is now: 65535
x is now: 65534

Assigning an out-of-range literal such as -1 directly to an unsigned variable triggers a warning in some compilers, because the compiler detects that the integer literal is out-of-range for the given type. If you want to compile such code anyway, temporarily disable “Treat warnings as errors”.

As an aside...

Many notable bugs in video game history happened due to wrap around behavior with unsigned integers. In the arcade game Donkey Kong, it’s not possible to go past level 22 due to an overflow bug that leaves the user with not enough bonus time to complete the level.

In the PC game Civilization, Gandhi was known for often being the first one to use nuclear weapons, which seems contrary to his expected passive nature. Players believed this happened because Gandhi’s aggression setting was initially set at 1, but if he chose a democratic government, he’d get a -2 modifier. This would cause his aggression to overflow to 255, making him maximally aggressive! However, more recently Sid Meier (the game’s author) clarified that this wasn’t actually the case.

The controversy over unsigned numbers

Many developers (and some large development houses, such as Google) believe that developers should generally avoid unsigned integers.

This is largely because of two behaviors that can cause problems.

First, consider the subtraction of two unsigned numbers, such as 3 and 5. 3 minus 5 is -2, but -2 can’t be represented as an unsigned number.

On the author’s machine, this seemingly innocent-looking program produces the result:

4294967294

This occurs due to -2 wrapping around to a number close to the top of the range of a 4-byte integer. A common unwanted wrap-around happens when an unsigned integer is repeatedly decremented with the -- operator. You’ll see an example of this when loops are introduced.

Second, unexpected behavior can result when you mix signed and unsigned integers. In the above example, even if one of the operands (x or y) is signed, the other operand (the unsigned one) will cause the signed one to be converted to an unsigned integer, and the same behavior will result!

Consider the following snippet:

The author of doSomething() was expecting someone to call this function with only positive numbers. But the caller is passing in -1. What happens in this case?

The signed argument of -1 gets implicitly converted to an unsigned parameter. -1 isn’t in the range of an unsigned number, so it wraps around to some large number (probably 4294967295). Then your program goes ballistic. Worse, there’s no good way to guard against this condition. C++ will freely convert between signed and unsigned numbers, but it won’t do any range checking to make sure you don’t overflow your type.

If you need to protect a function against negative inputs, use an assertion or exception instead. Both are covered later.

Some modern programming languages (such as Java) and frameworks (such as .NET) either don’t include unsigned types, or limit their use.

New programmers often use unsigned integers to represent non-negative data, or to take advantage of the additional range. Bjarne Stroustrup, the designer of C++, said, “Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea”.

Warning

Avoid using unsigned numbers, except in specific cases or when unavoidable.

Don’t avoid negative numbers by using unsigned types. If you need a larger range than a signed number offers, use one of the guaranteed-width integers shown in the next lesson (4.6 -- Fixed-width integers and size_t).

If you do use unsigned numbers, avoid mixing signed and unsigned numbers where possible.

So where is it reasonable to use unsigned numbers?

There are still a few cases in C++ where it’s okay (or necessary) to use unsigned numbers.

First, unsigned numbers are preferred when dealing with bit manipulation (covered in chapter O (That’s a capital ‘o’, not a ‘0’)).

Second, use of unsigned numbers is still unavoidable in some cases, mainly those having to do with array indexing. We’ll talk more about this in the lessons on arrays and array indexing.

Also note that if you’re developing for an embedded system (e.g. an Arduino) or some other processor/memory limited context, use of unsigned numbers is more common and accepted (and in some cases, unavoidable) for performance reasons.



129 comments to 4.5 — Unsigned integers, and why to avoid them

  • everybody gangsta till gandhi decides to nuke the world

  • This helps, because the standard has gone up and it is difficult to remember everything.

  • John Goodwin

    Hmm somethings wrong either in my understanding or somewhere else. I'm familiar with Overflow at the machine language level. It was clear to me there had to be another mechanism between the language and the registers so I tested it.

    Close as I can figure, the sizeof() function isn't reporting actual results but rather is using an internal lookup table to show what the size of a type "should be", because clearly I've blown the limit wide open and it's not reporting the new size. It also appears to be either upgrading the data type, or the whole concept of data type is a façade if it can continue to keep taking larger and larger numbers.

    In this example I started with x+5 then went to x+50, x+500 and so on looking for a sudden drop in the size of the number when the wrap occurred.. It never happened and instead the number got ridiculously large

    Outputs this in Visual Studio 2019:

    Starting size of x: 4
    5000001073741824
    Ending size of x: 4

    • John Goodwin

      Additional testing:

      When I move the calculation out of the cout line the number climbs until it reaches a point
      and then rebounds back to the initialization value. x+5000 seems to be close to the limit but it never wraps

      output:
      Starting size of x: 4
      1078741824           //this number will climb with x+5000. At x+50000 it bounced back to what you see
      Ending size of x: 4

    • John Goodwin

      I suspected the cout in my last test was creating an in-memory computation that was avoiding the variable size limit so I tried adding the value explicitly to the variable itself, again starting at x=x+50 and working my way up waiting for a wrap.. it never happened??? But the sizeof() remained 4???

      output:

      Starting size of x: 4
      1123741824
      Ending size of x: 4

    • John Goodwin

      Sorry all..
      Even as I write this, I cannot see any of the replies to my own post. I see only the single post.
      Refreshed and even closed the page and reopened in another tab.. no difference.

    • Egor

      1) Overflow doesn't change the data type, so if you had int (32 bits), then its size can't be changed while the program is running. That's why sizeof(x) always returned 4.
      2) In std::cout << x + 50000000 << std::endl; "x + 50000000" is expression, expressions are computed in maximum available data type (for you it was 64 bits (8 bytes), I suppose). There are ways to force result of expression to specific data type (discussed in future lessons).

      As a side note:
      You couldn't compile "signed int x{ 2147483648 };" 'cause 2147483648 is higher than maximum of signed int (which is 2147483647, yes 1 less). Maximum of signed integers is 2^(bit length - 1) - 1. (And to be sure: 1123741824 (result from the second test) is less than 2147483647, so there you get overflow)

  • Miko

    Hey, I believe there's a small mistake in the following sentence:

    "It’s possible to wrap around the other direction as well. 0 is representable in a ???1-byte??? integer, so that’s fine. -1 is not representable, so it wraps around to the top of the range, producing the value 65535. -2 wraps around to 65534. And so forth."

    If we're talking about a 1-byte integer then shouldn't -1 produce the value 255? A 2-byte integer given value -1 should produce the value 65535 but perhaps I missed something. Thanks for the tutorials!

  • Burfed

    There is 1 more example of int overflow - MU Online (up to 2nd or 3rd season)
    There was a maximum stat (firstly 32768, then 65535) which , when overflowed, would result it your health points or agility being 0 or 1

  • yeokaiwei

    Dear Alex,
    What if APIs have unsigned ints? Like openGL

    E.g.

    What do we do then?

    GLuint is their parameter so it specifically asks for unsigned ints.

  • arun mathew iype

    "By definition, unsigned integers cannot overflow", this is confusing.

    The fact that unsigned integers do not overflow but wrap is due to the standard taking care of it. So what does this statement mean?

    • Alex

      The C++ standard says unsigned integer overflow isn't overflow. I rewrote that part of the lesson to try to make it more clear that this is a weirdness of the C++ standard.

  • I feel this to be a poor advice with wrong reasoning behind.

    The most important difference between signed and unsigned types in C / C++ assuming the task doesn't require negative numbers is that the former has possible undefined behavior related to mathematics operations while the latter not.

    In the majority of use cases going beyond the range of the variable is a bug. Ideally the code is free of bugs. However when a problem arises, it can become quite significant which is happening!

    In case of unsigned types, with the observation (case of erroneous behavior) at hand, what exactly happens can be fully deducted from investigating the C/C++ code as the wrapping behavior of unsigned types is fully defined.

    In case of signed types, this is not possible due to the undefined behavior. The compiler for example could generate code behaving as if the overflow didn't happen (the ubiquitous example of "if (x < (x + 1))"), or just about anything. If you have this in embedded code, devices operating on the field which are difficult to update, you might need to provide an impact analysis of the defect, what consequences it could have, to determine whether it is necessary to provide a fix, which may be costly in restricted environments (or in an extreme situation, what if the software is running on a satellite far up in space?).

    So if you used signed numbers there, such a situation could get you all the way down digging into the binary, analyzing assembly code to figure out what even is happening due to that on C/C++ level you triggered UB.

    So in my opinion, use unsigned unless the task needs the negative numbers, and either case, be diligent with appropriate use of assertions and bounds checks. You shouldn't let the bug happen the first place, but if it does, your chances of delivering an accurate analysis in short time are much better if the area affected used unsigned arithmetic.

  • J34NP3T3R

    i needed to go back to this lesson.

    im in chapter 4 lesson 4.x and im really lost.

    there are a lot of what we "shouldn't" do but "sometimes we must do" it confused me a bit and i lost track of whether we should use unsigned if need be.

    so in the quiz about the AGE OF A PERSON my answer was unsigned short but the correct answer was to use a type INT.

    you can never have a negative AGE .. so im confused why use INT ? isn't this one of those "sometimes we must do" as in use unsigned ?

  • J34NP3T3R

    "An integer without a sign (an unsigned integer) assumes all values are positive."

    is there an integer type that assumes all values are negative ?

  • Andreas Krug

    Small suggestion: (covered in chapter O (That’s a capital ‘o’, not a ‘0’)).  instead of  (covered in chapter O (That’s a capital ‘o’, not a ‘0’). -> missing )

  • iCwhatyoudidthere

    So we should avoid unsigned integers because they have a known behavior when overflow/underflow occurs, as opposed to signed integers because they have undefined behavior when overflow/underflow occurs?  Wut?
    If the supplier of a function uses unsigned integers and specifies a precondition that only a certain range should be passed in as input, is it not on the client that calls said function to ensure that they do not violate the precondition?  Obviously, it would be bad practice to pass end user input to unsigned integers, but outside of that instance I fail to see how the problem of overflow/underflow is exclusive to unsigned integers.

  • yeokaiwei

    Your 2-byte integer example will trigger a warning on Visual Studio 2019.

    "Severity    Code    Description    Project    File    Line    Suppression State
    Error    C2220    the following warning is treated as an error    Bytes    C:\Users\user\source\repos\Bytes\Bytes\Bytes.cpp    8    
    Warning    C4305    '=': truncation from 'int' to 'unsigned short'    Bytes    C:\Users\user\source\repos\Bytes\Bytes\Bytes.cpp    8    
    Warning    C4309    '=': truncation of constant value    Bytes    C:\Users\user\source\repos\Bytes\Bytes\Bytes.cpp    8    
    Warning    C4305    '=': truncation from 'int' to 'unsigned short'    Bytes    C:\Users\user\source\repos\Bytes\Bytes\Bytes.cpp    11    
    Warning    C4309    '=': truncation of constant value    Bytes    C:\Users\user\source\repos\Bytes\Bytes\Bytes.cpp    11"

  • Evan Kilpatrick

    It is chapter letter O, not chapter 0. It confused the crap out of me: I went back looking like I missed something, only to find I hadn't gotten there yet. Just a heads-up for anyone else.

  • Steve

    I got a warning about signed/unsigned mismatch. The line in question was this

    where cars is a vector of pointers to Car objects. If I change i to be unsigned then the warning goes away, but I don't really understand how or why it wants to be unsigned or if I should do something different.

    • nascardriver

      `std::vector::size()` returns some unsigned integer type. `i` is a signed integer, so you get a warning about comparing a signed and an unsigned integer. If possible, use a range-based for-loop. If you do need the index, make its type `std::size_t`, which is what most containers use.

  • Manish

    Hi,thank you for great lessons. I need to ask a query on the "wrap around" thing.

    In a code if I use std::cin() to input the overflown value (like 65538) to unsigned short, it does not wrap around but takes the maximum value. On the other hand, in C, the scanf() does wrap around.

    Response: No wrap around for overflown input integer.

    In C :

    Response: Overflown input integer gets wrap around.

    I couldn't find any resource that explains it.

    • nascardriver

      I won't look into `std::scanf`, it's not as well documented as the newer functions.
      For `std::cin>>`

      https://en.cppreference.com/w/cpp/io/basic_istream/operator_gtgt
      "extracts an integer value by calling std::num_get::get()."

      https://en.cppreference.com/w/cpp/locale/num_get/get
      "The input is parsed as if by [...] std::strtoull for unsigned integer"

      https://en.cppreference.com/w/cpp/string/byte/strtoul
      "If the converted value falls out of range of corresponding return type, range error occurs and ULONG_MAX or ULLONG_MAX is returned"

      Back to https://en.cppreference.com/w/cpp/locale/num_get/get
      "If the conversion function results in a value not to fit in the type of v which is an unsigned integer type, the most positive representable value is stored in v."

      • Manish

        Thank you nascardriver for reply and detailed explanation. This pretty much explains it. I need to put more efforts to understand references.

  • faskldj

    Sorry, but your arguments are not very good.

    Most importantly, you completely ignored the main problem with signed variables: Overflow will cause UB, UB is something you may not notice till the software is used in production and everything breaks.
    You also ignored the difference between integer that are used to store numbers and other uses such as indexes, number of elements, addresses or to identify an "object" (such as file descriptors).

    "Many notable bugs in video game history happened due to wrap around behavior with unsigned integers. In the arcade game Donkey Kong, it’s not possible to go past level 22 due to a bug that leaves the user with not enough bonus time to complete the level."
    This is not an argument against using unsigned, the oposite is true. When you have the same problem with signed integers, it would cause an overflow and UB. In C++ you always have to make sure the type range covers all possible cases, this is the case for signed and unsigned variables. If you do not do that you get UB in case of a signed type or unexpected behaviour in case of a unsigned type.

    The argument with substracting 5 from 3 is not an argument against using unsigned. When you use signed you can run into a similar problem when the result underflows. This happens much less frequent and causes UB, at INT_MIN, which is the worst kind of bug. The bug is often not triggered durring testing and you will only notice it when it is too late.
    Bugs that occur around 0, for unsigned types, are much more likely to be seen during testing and they can be fixed, if they don't they do not cause UB right away.

    Trying to store -1 to a unsigned variable, which is out of the range of a unsigned variable, is also not an argument against unsigned. A worse problem happens when you want to store a value outside of the range of a signed variable into that variable, you get UB.

    "If you need to protect a function against negative inputs, use an assertion or exception instead. Both are covered later."
    Many platforms do not have a meaningfull way to deal with a failed assertion. What do you want to do on a microcontroller, with no user, no files and not even stdout?

    "Some modern programming languages (such as Java) and frameworks (such as .NET) either don’t include unsigned types, or limit their use."
    Should that be an argument for avoiding unsigned? If so, it is an appeal to popularity fallacy.

    I do not say don't use signed, they have their use cases, but in most cases unsigned is better suited for the task. You always have to think about the type you use and choose the right type. Do never just use the next best type without thinking about it.

  • Nomic

    The story of Civilization of Gandhi is interesting.
    I googled this story and found a news article saying that the designer denies it but likes the rumor. :)

  • TN

    I am using my computer and sizeof(int) = 4; sizeof(long double) = 12. The values are different compare to what you have.
    Any recommendation to configurate again the system or use it as it is now?

  • Yousuf

    I am using GCC 10.2 on 64bit computer. I turned off all the compilation flags, I simply tried to compile my *.cpp file from terminal, and it shows error `error: narrowing conversion of ‘-1’ from ‘int’ to ‘short unsigned int’ [-Wnarrowing]` . Seems GCC 10.2 doesn't let me wraps around negative values into unsigned short int. Any thoughts?
    Here is my code:

    • nascardriver

      This conversion was never allowed in list initialization. That's what list initialization is there for. It only allows conversions that don't change the value. Use one of the methods shown in the lesson.

  • Alessandro

    In the aside note (which, since I'm a new programmer, I like a lot because they give me the possibility to grasp real-world issues) you write that Gandhi with the Democratic Gov gets a -2 modifier which wrapped to a 255 level of aggression. But shouldn't this value be 254 based on what is written in the main text?

    • nascardriver

      I don't know this video game, but I'm guessing that Gandhi isn't assigned a -2, but instead 2 is subtracted from his initial modifier of 1. Then the example makes sense, because `1 - 2 == 255`

      • NuclearGandhi

        The first Civilization game released in 1991 and it included a feature that turned a lot of heads. Mahatma Gandhi, famous in real life for leading a series of peaceful protests that eventually ended British occupation of India, would suddenly become the most aggressive leader in the game once he acquired nuclear weapons. The apparent glitch became famous throughout the internet for turning the conception of such a peaceful figure on its head, and was immortalized in future games which consistently saw Gandhi as a nuclear warmonger. Most players believed that Gandhi's murderous intent was the result of an integer overflow glitch; his base aggression score was so low that when it was lowered by the advent of democracy it wrapped around to a number far higher than any leader should be able to achieve, causing him to suddenly aspire to nuke the world.

        https://screenrant.com/civilization-nuclear-gandhi-glitch-not-bug-sid-meier/

    • david

      It has been mentioned during a CS50 class.
