4.3 — Object sizes and the sizeof operator

Object sizes

As you learned in lesson 4.1 -- Introduction to fundamental data types, memory on modern machines is typically organized into byte-sized units, with each byte of memory having a unique address. Up to this point, it has been useful to think of memory as a bunch of cubbyholes or mailboxes where we can put and retrieve information, and variables as names for accessing those cubbyholes or mailboxes.

However, this analogy is not quite correct in one regard -- most objects actually take up more than 1 byte of memory. A single object may use 2, 4, 8, or even more consecutive memory addresses. The amount of memory that an object uses is based on its data type.

Because we typically access memory through variable names (and not directly via memory addresses), the compiler is able to hide the details of how many bytes a given object uses from us. When we access some variable x, the compiler knows how many bytes of data to retrieve (based on the type of variable x), and can handle that task for us.

Even so, there are several reasons it is useful to know how much memory an object uses.

First, the more memory an object uses, the more information it can hold.

A single bit can hold 2 possible values, a 0 or a 1:

bit 0
  0
  1

2 bits can hold 4 possible values:

bit 0  bit 1
  0      0
  0      1
  1      0
  1      1

3 bits can hold 8 possible values:

bit 0  bit 1  bit 2
  0      0      0
  0      0      1
  0      1      0
  0      1      1
  1      0      0
  1      0      1
  1      1      0
  1      1      1

To generalize, an object with n bits can hold 2^n (2 to the power of n) unique values. Therefore, with an 8-bit byte, a byte-sized object can hold 2^8 (256) different values. An object that uses 2 bytes (16 bits) can hold 2^16 (65536) different values!

Thus, the size of the object puts a limit on the amount of unique values it can store -- objects that utilize more bytes can store a larger number of unique values. We will explore this further when we talk more about integers.
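
As a quick illustration, 2^n can be computed in C++ by shifting the value 1 left by n bits. A minimal sketch (the specific widths shown are just examples):

#include <iostream>

int main()
{
    // Shifting 1 left by n bits yields 2^n.
    // 1ULL is an unsigned long long literal, so larger shift amounts also fit.
    std::cout << "1 bit:   " << (1ULL << 1) << " values\n";  // 2
    std::cout << "8 bits:  " << (1ULL << 8) << " values\n";  // 256
    std::cout << "16 bits: " << (1ULL << 16) << " values\n"; // 65536

    return 0;
}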

Second, computers have a finite amount of free memory. Every time we define an object, a small portion of that free memory is used for as long as the object exists. Because modern computers have a lot of memory, this impact is usually negligible. However, for programs that need a large number of objects or a lot of data (e.g. a game that is rendering millions of polygons), the difference between using 1-byte and 8-byte objects can be significant.

Key insight

New programmers often focus too much on optimizing their code to use as little memory as possible. In most cases, this makes a negligible difference. Focus on writing maintainable code, and optimize only when and where the benefit will be substantive.

Fundamental data type sizes

The obvious next question is “how much memory do variables of different data types take?”. You may be surprised to find that the size of a given data type is dependent on the compiler and/or the computer architecture!

C++ only guarantees that each fundamental data type will have a minimum size:

Category         Type          Minimum Size   Note
boolean          bool          1 byte
character        char          1 byte         Always exactly 1 byte
character        wchar_t       1 byte
character        char16_t      2 bytes        C++11 type
character        char32_t      4 bytes        C++11 type
integer          short         2 bytes
integer          int           2 bytes
integer          long          4 bytes
integer          long long     8 bytes        C99/C++11 type
floating point   float         4 bytes
floating point   double        8 bytes
floating point   long double   8 bytes

However, the actual size of the variables may be different on your machine (particularly int, which is more often 4 bytes).

Best practice

For maximum compatibility, you shouldn’t assume that variables are larger than the specified minimum size.
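
If a program does depend on a type being larger than its guaranteed minimum, one way to state and enforce that assumption is a compile-time check. A minimal sketch (assuming the program needs a 4-byte int):

// Halt compilation with an error message if int is smaller than we assume.
// static_assert is evaluated at compile time, so this adds no runtime cost.
static_assert(sizeof(int) >= 4, "this program requires int to be at least 4 bytes");

int main()
{
    return 0;
}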

The sizeof operator

In order to determine the size of data types on a particular machine, C++ provides an operator named sizeof. The sizeof operator is a unary operator that takes either a type or a variable, and returns its size in bytes. You can compile and run the following program to find out how large some of your data types are:
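
A minimal sketch of such a program follows (the output spacing here is approximated with tab characters to line up the columns):

#include <iostream>

int main()
{
    std::cout << "bool:\t\t" << sizeof(bool) << " bytes\n";
    std::cout << "char:\t\t" << sizeof(char) << " bytes\n";
    std::cout << "wchar_t:\t" << sizeof(wchar_t) << " bytes\n";
    std::cout << "char16_t:\t" << sizeof(char16_t) << " bytes\n";
    std::cout << "char32_t:\t" << sizeof(char32_t) << " bytes\n";
    std::cout << "short:\t\t" << sizeof(short) << " bytes\n";
    std::cout << "int:\t\t" << sizeof(int) << " bytes\n";
    std::cout << "long:\t\t" << sizeof(long) << " bytes\n";
    std::cout << "long long:\t" << sizeof(long long) << " bytes\n";
    std::cout << "float:\t\t" << sizeof(float) << " bytes\n";
    std::cout << "double:\t\t" << sizeof(double) << " bytes\n";
    std::cout << "long double:\t" << sizeof(long double) << " bytes\n";

    return 0;
}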

Here is the output from the author’s x64 machine, using Visual Studio:

bool:           1 bytes
char:           1 bytes
wchar_t:        2 bytes
char16_t:       2 bytes
char32_t:       4 bytes
short:          2 bytes
int:            4 bytes
long:           4 bytes
long long:      8 bytes
float:          4 bytes
double:         8 bytes
long double:    8 bytes

Your results may vary if you are using a different type of machine, or a different compiler. Note that you cannot use the sizeof operator on the void type, since it has no size (doing so will cause a compile error).
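
For example, a sketch containing the line below is ill-formed and should be rejected by a conforming compiler (some compilers may only warn, treating it as a non-standard extension):

#include <iostream>

int main()
{
    std::cout << sizeof(void) << '\n'; // error: void is an incomplete type, so it has no size

    return 0;
}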

For advanced readers

If you’re wondering what ‘\t’ is in the above program, it’s a special symbol that inserts a tab (in the example, we’re using it to align the output columns). We will cover ‘\t’ and other special symbols in lesson 4.11 -- Chars.

You can also use the sizeof operator on a variable name:
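
A minimal sketch (here x is assumed to be an int):

#include <iostream>

int main()
{
    int x{};
    std::cout << "x is " << sizeof(x) << " bytes\n";

    return 0;
}

With a 4-byte int (as on the author's machine), this prints: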

x is 4 bytes

Fundamental data type performance

On modern machines, objects of the fundamental data types are fast, so performance while using these types should generally not be a concern.

As an aside...

You might assume that types that use less memory would be faster than types that use more memory. This is not always true. CPUs are often optimized to process data of a certain size (e.g. 32 bits), and types that match that size may be processed more quickly. On such a machine, a 32-bit int could be faster than a 16-bit short or an 8-bit char.



179 comments to 4.3 — Object sizes and the sizeof operator

  • Vincent C

    Hi,

    I have questions on the maximum value of int type in C++. You mentioned that the minimum byte size of int is 2.

    However, the documentation in Microsoft:
    https://docs.microsoft.com/en-us/cpp/c-language/cpp-integer-limits?view=vs-2019

    stated the INT_MAX constant refers to 2147483647, which means the integer is 4 bytes.

    If the minimum byte size of integer is 2, why would the constant be defined as a 4-byte value? Does it mean that the program using that constant in some computers will crash if the int type in the computer architecture is only 2 bytes?

    • INT_MAX is a define of msvc++, it's not standard, don't use it. As long as that compiler uses 4 bytes for ints, there is no problem. If microsoft decides to use a different sized integer, they need to update the define too.

  • Louis Cloete

    @Alex, in your "Key insight" block, you explain why it isn't necessary on a modern machine to try to use the smallest variable possible. It is, as I understand, not only not necessary, but also less performant in some cases. As far as I know, 4-byte integers are the fastest on modern 32-bit and 64-bit platforms. I was also told not to worry about sizes and just use Integer when I learnt Pascal, but the penny dropped only when I read that it is slower to use other types (and my teacher programmed in C in the '70s, so he would know what he is talking about when he says you can use four-byte integers safely today ;-)).

    This long-winded thing is to suggest you drop a mention about the speed benefit of int over short and char into the block.

    • Alex

      While you're not wrong, any optimizations that have a negligible impact on memory usage also likely have a negligible impact on performance.

      I added a point about performance to a new section at the end of the lesson.

  • Em

    Hi,

    You wrote that:
    An object that uses 2 bytes can hold 2^16 (65536) different values!

    I believe that this was a typo, and you meant to say 16 bytes, not 2 bytes.

  • Jules

    I hope someone like nascardriver or Alex sees this,
    so I quite don't understand the relationship between number of bytes that a datatype has v/s actual values it can hold.
    for eg:
    An standard int is of 4 bytes, so the number of values it can hold are 2^4 = 16.
    but the value range an integer is -
    -32,768 to 32,767 or -2,147,483,648 to 2,147,483,647

    Also what is up with there being different datatypes of char?,if a char holds only one character why are there different types of chars like - char,wchar_t,char16_t and char32_t, If i wanted to store a string wont i just declare an array?

    i'm kinda lost here.

    • Hi Jules!

      > An standard int is of 4 bytes, so the number of values it can hold are 2^4 = 16.
      Both nope. An int doesn't have to be 4 bytes and your calculation is off.
      Assuming a 4 byte int:
      4 bytes are 32 bits. That makes 2^32 possible values. The range is -(2^31) to (2^31)-1

      > char
      1 byte

      > wchar_t
      2 bytes (No guarantees, I'm doing this from memory)

      > char16_t
      2 bytes

      > char32_t
      4 bytes

      The larger char types are required to store non-ascii characters, because those can take up more than 1 byte.
      You can store unicode strings in char arrays, but the individual characters will be split into multiple chars.

      • Jules

        Hi nascar!

        thanks for replying on such a short notice, it seems i mistook bits for bytes here.
        as you said assuming that an integer has 4 bytes, the possible values would be 2^32 = 4294967296
        now wont the range be: -2147483648 to +2147483647, your definition would have the range as -4294967296 to +4294967296, wont it?

        also what do you mean by -
        >but the individual characters will be split into multiple chars.
        thanks for your time.

        ~Jules.

        • > your definition would have the range as -4294967296 to +4294967296, wont it?
          It won't. I used 2^31 in my range, not 2^32

          > also what do you mean by

          The lightning symbol has a value of 0x21AF ( https://unicode-table.com/en/21AF/ ). Too much for a single char.

          Running the code and giving a lightning as input results in

          input: ↯
          length: 3
          0: � (-30)
          1: � (-122)
          2: � (-81)

          We input a single character, but we need 3 chars to store it. Each char on it's own doesn't make a whole lot of sense. But my terminal (and presumably yours too) knows how to print the 3 successive chars as a single character.

  • Quyết

    Hi, i have a question...
    look gt1 function and gt2 function...

    which good ???
    Thanks!

  • Hi Alex!

    I have doubts about your "C++ guarantees that the basic data types will have a minimum size" table.
    The standard only states
    "There are five standard signed integer types : “signed char”, “short int”, “int”, “long int”, and “long long int”. In this list, each type provides at least as much storage as those preceding it in the list."
    N4762 § 6.7.1 (2)

    To me this sounds like a long long int could be a 1 byte sized type.

    • Alex

      The C++ standard does only explicitly state as you say. However, the C++ standard apparently references the C standard in this regard, and the C standard implies a minimum range of numbers that each type must be able to hold. Implicitly, that implies a minimum size.

      Here's the minimum sizes from the C standard: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf (see page 22)

      Also see https://stackoverflow.com/questions/50930891/what-is-the-minimum-size-of-an-int-in-c for some discussion about this.

      Finally, note that https://en.cppreference.com/w/cpp/language/types corroborates this understanding.
