2.4a — Fixed-width integers and the unsigned controversy

In the previous lesson, 2.4 -- Integers, you learned that C++ only guarantees that integer variables will have a minimum size -- but they could be larger, depending on the target system.

Why isn’t the size of the integer variables fixed?

The short, non-technical answer is that this goes back to C, when performance was of utmost concern. C opted to intentionally leave the size of an integer open so that the compiler implementers could pick a size for int that performs best on the target computer architecture.

Doesn’t this suck?

Yes! As a programmer, it’s a little ridiculous to have to deal with variables whose size could vary depending on the target architecture.

Fixed-width integers

To help with cross-platform portability, C99 defined a set of fixed-width integers (in the stdint.h header) that are guaranteed to have the same size on any architecture.

These are defined as follows:

Name       Type             Range                                                      Notes
int8_t     1 byte signed    -128 to 127                                                Treated like a signed char on many systems. See note below.
uint8_t    1 byte unsigned  0 to 255                                                   Treated like an unsigned char on many systems. See note below.
int16_t    2 byte signed    -32,768 to 32,767
uint16_t   2 byte unsigned  0 to 65,535
int32_t    4 byte signed    -2,147,483,648 to 2,147,483,647
uint32_t   4 byte unsigned  0 to 4,294,967,295
int64_t    8 byte signed    -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
uint64_t   8 byte unsigned  0 to 18,446,744,073,709,551,615

C++ officially adopted these fixed-width integers as part of C++11. They can be accessed by including the cstdint header, where they are defined inside the std namespace. Here’s an example:
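[code]
#include <cstdint> // for the fixed-width integer types
#include <iostream>

int main()
{
    // A minimal sketch (not the original sample): std::int16_t is
    // guaranteed to be exactly 16 bits on any architecture
    std::int16_t i = 5;
    std::cout << i << std::endl;
    return 0;
}
[/code]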

Even though these weren’t adopted in C++ until C++11, because they were part of the C99 standard, some older C++ compilers offer access to these types, typically by including stdint.h. Visual Studio 2005 and 2008 do not include stdint.h, but 2010 does.

If you are using the boost library, boost provides these as part of <boost/cstdint.hpp>.

If your compiler does not include cstdint or stdint.h, the good news is that you can download Paul Hsieh’s pstdint.h, a cross-platform compatible version of the stdint.h header. Simply include the pstdint.h file in your project and it will define the fixed-width integer types with the proper sizes for your platform.

Warning: int8_t and uint8_t may or may not behave like chars

Due to an oversight in the C++ specification, most compilers define and treat int8_t and uint8_t identically to types signed char and unsigned char respectively, but this is not required. Consequently, std::cin and std::cout may work differently than you’re expecting. Here’s a sample program showing this:
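[code]
#include <cstdint>
#include <iostream>

int main()
{
    // A minimal sketch of the behavior described (variable name from the text below)
    std::int8_t myint = 65; // initialize myint with value 65
    std::cout << myint;     // may print 'A' or 65, depending on the implementation
    return 0;
}
[/code]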

On most systems, this program will print ‘A’ (treating myint as a char). However, on some systems, this may print 65 as expected.

For simplicity, it’s best to avoid int8_t and uint8_t altogether (use int16_t or uint16_t instead). However, if you do use int8_t or uint8_t, you should be careful of any function that would interpret int8_t or uint8_t as a char instead of an integer (this includes std::cout and std::cin).

Hopefully this will be clarified by a future draft of C++.

Rule: Avoid int8_t and uint8_t. If you do use them, note that they are often treated like chars.

The downsides of fixed-width integers

Fixed-width integers may not be supported on architectures where those types can’t be represented. They may also be less performant than the built-in types on some architectures.

Fast and least

To help address the above downsides, C++11 also defines two alternative sets of integer types, which guarantee a minimum width rather than an exact one.

The fast types (int_fast#_t) give you the fastest integer type with a width of at least # bits (where # = 8, 16, 32, or 64). For example, int_fast32_t will give you the fastest integer type that’s at least 32 bits.

The least types (int_least#_t) give you the smallest integer type with a width of at least # bits (where # = 8, 16, 32, or 64). For example, int_least32_t will give you the smallest integer type that’s at least 32 bits.
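The widths these aliases actually resolve to are implementation-defined. A quick way to see what your own platform picks is to print their sizes (a minimal sketch; it assumes 8-bit bytes, and the output will vary by compiler and architecture):

[code]
#include <cstdint>
#include <iostream>

int main()
{
    // sizeof() reports bytes; multiply by 8 to show bits (assuming 8-bit bytes)
    std::cout << "int_least8_t:  " << sizeof(std::int_least8_t) * 8 << " bits\n";
    std::cout << "int_fast8_t:   " << sizeof(std::int_fast8_t) * 8 << " bits\n";
    std::cout << "int_least32_t: " << sizeof(std::int_least32_t) * 8 << " bits\n";
    std::cout << "int_fast32_t:  " << sizeof(std::int_fast32_t) * 8 << " bits\n";
    return 0;
}
[/code]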

Integer best practices

Now that fixed-width integers have been added to C++, the best practice for integers in C++ is as follows:

  • int should be preferred when the size of the integer doesn’t matter. For example, if you’re asking the user to enter their age, or counting from 1 to 10, it doesn’t matter whether int is 16 or 32 bits (the numbers will fit either way). This will cover the vast majority of the cases you’re likely to run across.
  • If you need a variable with a guaranteed minimum size and want to favor performance, use int_fast#_t.
  • If you need a variable with a guaranteed minimum size and want to favor memory conservation over performance, use int_least#_t. This is used most often when allocating lots of variables.
  • Only use unsigned types if you have a compelling reason.

Some compilers define their own versions of fixed-width integers -- for example, Visual Studio defines __int8, __int16, etc… You should avoid these like the plague.

The controversy over unsigned numbers

Many developers (and some large development houses, such as Google) believe that developers should generally avoid unsigned integers. This is largely because unexpected behavior can result when you mix signed and unsigned integers.

Consider the following snippet:
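[code]
#include <iostream>

// A hypothetical function illustrating the problem: it expects an
// unsigned count, but the caller passes a negative number
void doSomething(unsigned int count)
{
    std::cout << "count = " << count << '\n';
}

int main()
{
    doSomething(-1); // -1 can't be represented as an unsigned int
    return 0;
}
[/code]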

What happens in this case? -1 gets converted to some large number (probably 4294967295), and your program goes ballistic. But even worse, there’s no good way to guard against this condition. C++ will freely convert between signed and unsigned numbers, but it won’t do any range checking to make sure you don’t overflow your type.

Many modern programming languages (such as Java and C#) either don’t include unsigned types, or limit their use. Bjarne Stroustrup, the designer of C++, said, “Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea”.

This doesn’t mean you have to avoid unsigned types altogether -- but if you do use them, use them only where they really make sense, and take care not to mix signed and unsigned numbers.

Alex’s note: Most of this tutorial was written prior to these fixed-width integers being adopted into C++11, so we may use int in some examples where a fixed-width integer would be more appropriate.

90 comments to 2.4a — Fixed-width integers and the unsigned controversy

  • Curiosity

    Alex?
    How do I store very large values such as 2^100 or 100! in a variable?
    Can we do it through classes or structs? If yes, how?
    Just tell me the steps or, you know, the algorithm to do it, and I will practically (in C++) do it myself!
    Thanks in advance 🙂

    • Alex

      There are two ways:
      1) Use a large floating point number (just be wary about precision issues).
      2) Use a custom class that has big number support (use google to find one, there are plenty out there).

      • Curiosity

        Man…? I wanted to create a class for that purpose on my own! Can you help me with that?

        • Alex

          Here’s a starter approach. C++ gives you 32-bit variables that can hold unsigned values between 0 and 2^x − 1 (where x is the number of bits). If you want larger values, you’ll have to find a way to decompose those larger values into multiple smaller values (for storage and manipulation). One way to do it would be to use two 32-bit variables and combine them to act like a 64-bit variable. You’ll need to provide overloaded operators for input, output, and arithmetic.
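          A minimal sketch of that idea (a hypothetical Wide64 type; only addition is shown -- input, output, and the other operators are left to write):

          [code]
          #include <cstdint>

          // Hypothetical 64-bit value built from two 32-bit halves
          struct Wide64
          {
              std::uint32_t hi; // upper 32 bits
              std::uint32_t lo; // lower 32 bits
          };

          // Add two Wide64 values, propagating the carry out of the low half
          Wide64 operator+(Wide64 a, Wide64 b)
          {
              Wide64 result;
              result.lo = a.lo + b.lo; // unsigned addition wraps around (mod 2^32)
              std::uint32_t carry = (result.lo < a.lo) ? 1u : 0u; // wraparound indicates a carry
              result.hi = a.hi + b.hi + carry;
              return result;
          }
          [/code]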

  • My dear teacher,
    Please let me ask this question:
    In the name of a fixed-width integer, what does the last letter (t) mean?
    With regards and friendship

  • Toussaint

    I feel the advice that we should use fixed width integers whenever possible is exactly the opposite of what should be done.

    First, what type of fixed width integers should be used? The exact types (**intN_t**), the fast types (**int_fastN_t**) or the least types (**int_leastN_t**)?

    If you want an exact representation (meaning the integer holds exactly N bits), then use **intN_t**. This is important in, say, network applications or character encoding. Are there many people writing bit-sensitive applications? I suspect not.

    Now, remember that the compiler will add padding to an integer so it fits into memory in a way that facilitates fast access by the processor (if needed). There are still architectures out there whose word sizes are not multiples of 8, or whose natural word length is greater than 8. So sticking to exact fixed-width integers without cause will result in a performance penalty, because the processor needs to access the exact number of bits you want instead of the number of bits that is natural for it. Heck, the compiler might even refuse to compile your code for that architecture.
    There were, for instance, machines based on 9-bit words.

    Also, we have modern machines whose natural word length is 16 bits instead of 8. So **int8_t** will not work on them. If you use **int**, though, the compiler will add padding so you get 16 bits.

    We have FPGAs that can be configured to have a natural word length of 36 bits, so **intN_t** where N < 36 will simply never work there.

    C and C++ prize portability and speed over space usage, so please use **int** unless you have a really good reason not to. You will know when you do -- for instance, when you need to worry about the endianness of the data you are working with.

    Also, neither C nor C++ will do bounds checking, even when you use fixed-width integers. So you need to be sure in advance that you have the exact size for your data.

    If you badly need to use fixed-width integers, first consider using **int_fastN_t**, because you are guaranteed that the compiler will try to fit your data into N bits, and if it can’t, it will adapt to whatever the machine can handle AND you still keep some performance.
    Then, if **int_fastN_t** doesn’t cut it, consider using **int_leastN_t**. It is the same as **int_fastN_t**, but with no performance guarantee.
    Consider **intN_t** last.

    And last, there is no guarantee that **intN_t** and friends will even be present with your compiler, depending on the target architecture. They are optional as far as the compiler is concerned.

    In short, prefer **int** until it becomes a burden.

    • Alex

      This is a great comment, and I’ve updated the article recommendations accordingly. Thank you for sharing this perspective.

      • Toussaint

        I thought I’d drop in and hopefully add another useful remark for my fellow C++ learners.
        I wasn’t sure if this specific chapter was the right place or the one on loops so I will just do it here since we are talking about integers anyway.

        I would like to talk about integers and looping, specifically the **size_t** type.
        This type is talked about in the chapter on strings, though as a note.
        When we start learning C/C++, we generally use **int** for the loop counter.
        But eventually we realize that we rarely need negative numbers while looping, yet sometimes need even greater numbers, so we start using **unsigned int** more and more.
        And as we progress in our journey, we start using the standard library (aka STL) more frequently.
        And for that, we need to loop over strings, containers or other objects which can grow to enormous sizes.

        And there comes **size_t**.

        As mentioned in the chapter on strings, **size_t** is implementation defined.
        And unlike fixed-width integers (which are also implementation defined), this is a good thing (good in the sense we don’t have to overthink the issue).
        You see, **int**, for example, is likely to be limited to a width of 32 bits even on 64-bit architectures.
        An integer with a width of 64 bits will go from **−(2^63)** to **2^63 − 1**, while a 32-bit integer will go from **−(2^31)** to **2^31 − 1**. So a 64-bit integer is bigger than a 32-bit integer.

        So what happens when you need the even greater numbers offered by 64-bit or 128-bit architectures?
        **int** loses its luster!
        Enter **size_t**.

        **size_t** will always hold the greatest unsigned integer that your architecture supports.
        It will hold the size of any object.
        In fact, as of C++14, a program that creates an object whose size cannot fit into **size_t** is ill-formed.

        So, my fellow learners, when you do not know in advance the size of your objects ([dynamic] arrays, strings, containers, etc.) and you need to, say, loop over them or store their size, use **size_t** -- never **unsigned int**, and worse still, **int**.

        But if you are sure that it will fit into an **unsigned int**, go right ahead. The type must convey the nature of the data anyway.

        And if you really need to loop below zero, then by all means, **int** is your friend.
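        A quick sketch of the kind of loop in question (a minimal example; std::string’s size() returns an unsigned type that **size_t** can hold):

        [code]
        #include <cstddef>  // for std::size_t
        #include <iostream>
        #include <string>

        int main()
        {
            std::string name = "Toussaint";

            // std::size_t matches the unsigned type returned by size(),
            // so there is no signed/unsigned mismatch in the comparison
            for (std::size_t i = 0; i < name.size(); ++i)
                std::cout << name[i] << '\n';

            return 0;
        }
        [/code]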

        Cheers!

  • prince

    i have not included the stdint.h header file in my program,
    still the int8_t data type worked. how? i only included <iostream>. i am using Code::Blocks.

    • Alex

      Your iostream may be including stdint.h itself. You should not rely on this, even if it works. Always include all of the headers you need.

  • Jeremy

    This section reminded me of something.

    1) Visual Studio 2017, and Code::Blocks 16.01 don’t seem to require the <cstdint> header to use fixed-width integers. Either these have become so popular that they’ve become part of the C++ Standard (and I looked through the latest draft, and can’t find this mentioned anywhere), or most compilers have decided to implement these as "built-in" types. If that’s the case, I wonder why the header still exists in those implementations?

    2) A sort of related thing, seeing as how uint8_t and int8_t are usually treated as chars. std::cout prints out char16_t and char32_t as numbers only, and not unicode characters. What’s the deal with that, anyway?

    • Alex

      1) I’m not seeing the same thing. I tried the following very minimal program on Visual Studio 2017:
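      [code]
      // A hypothetical reconstruction (the original snippet isn't shown):
      // use a fixed-width type without including any headers
      int main()
      {
          std::int8_t x = 65;
          return 0;
      }
      [/code]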

      and it didn’t compile. I had to #include cstdint. However, it also compiles if I just #include iostream, so it looks like VS2017’s iostream library is including cstdint. So you’re getting those definitions as a rider when you include iostream.

      2) Beats me. I’ve always considered it a serious flaw that uint8_t and int8_t print as chars. The C++ committee should have mandated that those be treated as separate types with their own numeric handling, not aliases for chars.

      • Jeremy

        That is very strange. I had not thought of removing the include for the iostream header, but it is as you say. Code::Blocks also does it.

        From what I’ve read about uint8_t and int8_t, a lot of compilers took the lazy route and chose to typedef uint8_t as unsigned char, and int8_t as signed char. We already have char, signed char, and unsigned char at our disposal, so I guess as long as it’s not specified in the standard, they will take the path of least resistance, lol.

        And having char16_t and char32_t not actually print chars to the screen is also boggling my mind. Maybe it depends on the operating system. I am stuck with Windows, but I wonder if Linux or Mac act differently.

  • Lakshmanan Kanthi

    How does C++ know what int, char and these fundamental data types are? Is it defined in the C++ standard library?

  • Sam

    Hi Alex, if integers were made to change size to fit the needs of the developer, then how does overflow occur? For example, if "int" can range from 2 bytes to ~, and I enter 40,000 (assume it’s signed), what stops the integer from simply changing its size to account for the increase? Unless by size you mean storage space?

    Also, thanks a lot for creating and maintaining this website!

    • Alex

      Integers weren’t made to change size to fit the needs of the developer -- they were made to change size to fit the needs of various architectures. So on a given architecture, an integer will always be a fixed size (whatever size is most appropriate for that architecture).

      And by bytes, I do mean storage space (in memory) since with integers the range is directly correlated to the number of bits allocated.

  • sam

    i have an off-topic question…
    when i get bored i try to make some random programs for fun
    so i made this one:
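    [code]
    // A guess at the program described (hypothetical reconstruction):
    // count from 1 up to a number entered by the user
    #include <iostream>

    int main()
    {
        int x = 0;
        std::cout << "Enter a number: ";
        std::cin >> x;

        for (int i = 1; i <= x; ++i)
            std::cout << i << '\n';

        return 0;
    }
    [/code]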

    the thing about this program is that it will output numbers from 1 to the selected number.
    when i input any big number (e.g. 1500), the output doesn’t start from 1 as expected.
    when i input 1500: the counting starts from 11
    when i input 2000: the counting starts from 500-something
    can you briefly explain to me why this is happening?
    my assumption is that std::cout has a limited output value and it’s a little bit over 1500, but i’m not sure…

    • John

      I noticed that the "output window" in my IDE only shows the last ~200 lines, e.g. if the output is 220 lines long, then only lines 20 to 220 will be shown. Your example works as expected when the data is written to a text file.

    • Alex

      x is an int, and should be able to count up to at least 32,767 (but more likely up to around 2 billion). I suspect what’s happening here is that you are outputting enough lines that the earlier lines are simply scrolling off the console. Your console may allow you to scroll up to see them.
