2.3 — Variable sizes and the sizeof operator

As you learned in the lesson 2.1 -- basic addressing and variable definition, memory on modern machines is typically organized into byte-sized units, with each unit having a unique address. Up to this point, it has been useful to think of memory as a bunch of cubbyholes or mailboxes where we can put and retrieve information, and variables as names for accessing those cubbyholes or mailboxes.

However, this analogy is not quite correct in one regard -- most variables actually take up more than 1 byte of memory. Consequently, a single variable may use 2, 4, or even 8 consecutive memory addresses. The amount of memory that a variable uses is based on its data type. Fortunately, because we typically access memory through variable names and not memory addresses, the compiler is largely able to hide the details of working with different sized variables from us.

There are several reasons it is useful to know how much memory a variable takes up.

First, the more memory a variable takes up, the more information it can hold. Because each bit can only hold a 0 or a 1, we say that bit can hold 2 possible values.

2 bits can hold 4 possible values:

bit 0 bit 1
0 0
0 1
1 0
1 1

3 bits can hold 8 possible values:

bit 0 bit 1 bit 2
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1

To generalize, a variable with n bits can hold 2^n (2 to the power of n) possible values. Because a byte is 8 bits, a byte can store 2^8 (256) possible values.

The size of the variable puts a limit on the amount of information it can store -- variables that utilize more bytes can hold a wider range of values. We will address this issue further when we get into the different types of variables.

Second, computers have a finite amount of free memory. Every time we declare a variable, a small portion of that free memory is used for as long as the variable exists. Because modern computers have a lot of memory, this often isn’t a problem, especially if you are only declaring a few variables. However, for programs that need a large number of variables (e.g. 100,000), the difference between using 1-byte and 8-byte variables can be significant.

However, you may find on your system that the variables are larger (particularly for int, which is often 4 bytes).

The size of C++ basic data types

The obvious next question is “how much memory do variables of different data types take?”. You may be surprised to find that the size of a given data type is dependent on the compiler and/or the computer architecture!

C++ guarantees that the basic data types will have a minimum size:

Category        Type          Minimum Size   Note
boolean         bool          1 byte
character       char          1 byte         May be signed or unsigned
                wchar_t       1 byte
                char16_t      2 bytes        C++11 type
                char32_t      4 bytes        C++11 type
integer         short         2 bytes
                int           2 bytes
                long          4 bytes
                long long     8 bytes        C99/C++11 type
floating point  float         4 bytes
                double        8 bytes
                long double   8 bytes

However, the actual size of the variables may be different on your machine. In order to determine the size of data types on a particular machine, C++ provides an operator named sizeof. The sizeof operator is a unary operator that takes either a type or a variable, and returns its size in bytes. You can compile and run the following program to find out how large some of your data types are:

Here is the output from the author’s x64 machine (in 2015), using Visual Studio 2013:

bool:           1 bytes
char:           1 bytes
wchar_t:        2 bytes
char16_t:       2 bytes
char32_t:       4 bytes
short:          2 bytes
int:            4 bytes
long:           4 bytes
float:          4 bytes
long long:      8 bytes
double:         8 bytes
long double:    8 bytes

Your results may vary if you are using a different type of machine, or a different compiler. Note that you cannot use sizeof on the void type, since it has no size (doing so will cause a compile error).

If you’re wondering what ‘\t’ is in the above program, it’s a special symbol that inserts a tab (in the example, we’re using it to align the output columns). We will cover ‘\t’ and other special symbols when we talk about the char data type.

Interestingly, the sizeof operator is one of the few operators in C++ that is a word instead of a symbol. Others include new, delete, and typeid.

You can also use the sizeof operator on a variable name:

x is 4 bytes

We’ll discuss the sizes of the different types in the upcoming lessons, and provide a summary table at the end.


52 comments to 2.3 — Variable sizes and the sizeof operator

  • Abhishek

    Never heard of wchar_t before... is that a new data type? What kind of data does it hold?

    • There’s more information about wchar_t on wikipedia. In short, it was meant to be used to hold “wide characters” (eg. those that take more than 8 bits to represent). However, the size varies depending on platform (and can be as small as 8 bits), so I’m not sure I see the practical use.

      Edit: To be clear, I’m not sure I see the practical use of using wchar_t instead of a variable that’s guaranteed to be at least 16 bits (e.g. char16_t).

      • chris

        Pocket PC/Windows Mobile only uses wide characters. Beyond that the practical use is for foreign languages. Some languages (like Chinese) have a lot more than 128 characters.

  • Nikki

    “On most 32-bit machines …. ”
    I am not sure if this is the right place to ask this question .
    What is a “32-bit machine ” ?
    Thanks,
    Nikki

    • Computers work by moving binary digits (bits) around. However, most computers do not work with individual bits -- rather, they move data around in chunks. This chunk size is called a “word”. Typically, when we speak of the bit-ness of a machine, we speak of the size of a word. Thus, a 32-bit machine has a 32-bit word size, which means it moves information around 32-bits at a time.

      Typically, modern computers use one word to address memory. With a 32-bit word, this means there are about 2^32 (4 billion) unique memory addresses that can be addressed. This is why 32-bit machines generally can’t make use of more than 4GB of memory.

      • Frederik

        The amount of memory a machine can address has nothing to do with the “bitness” of the machine -- in fact there is no real consensus on what it means to be of a certain “bitness”.

        Some 32 bits machines (such as the Pentium Pro and later 32 bit x86s) can address more than 4 GiB, but they are still only considered 32 bit computers.

        I (personally) only consider a machine to be an n bit machine, if it is able to (at least) address up to 2^n individual bytes, is able to hold at least n bits in all registers and can do all operations on n bit (or larger) registers.

      • JD

        So, if I understand this correctly, the computer moves data around in 32-bit chunks (4 bytes), but in my C++ program I can assign variables to a single byte of memory. This seems contradictory to me.

        I guess my question boils down to: Say I have a program that uses a large number of variables, so space matters. One byte will suffice for the data I need to carry so to save space I assign them as chars (1 byte). But, if the smallest “chunk” that the computer passes around is 4 bytes, does this actually save any space? Or would this effectively use the same amount of memory as if I made my variables ints (4 bytes, or one full word)?

        This is a somewhat subtle question, so let me know if I’m not being clear.

        PS -- Thanks for a clear, organized, and well-written tutorial! I’ve really enjoyed it so far.

        • Alex

          Yes, using 1-byte variables saves space. Modern architectures generally have memory that is byte-addressable, meaning you can read/write memory one byte at a time. x86 architecture computers can move different size “chunks” around (1 byte, 2 byte, 4 bytes, etc…).

          Generally you won’t have to worry about how the computer internally accesses memory and moves data around. Your compiler should be smart enough to use the best set of assembly instructions for whatever task you’re throwing at it.

      • zingmars

        It would be easier to explain the whole thing using the architecture name x86(x86_64) rather than calling them ’32-bit’ machines. Confuses people.

  • vader347

    my long double takes 12 bytes

  • Syed

    Hi, I would like to know what sizeof(long double) is. This tutorial says 8 bytes, whereas when I tried it in the Dev C++ compiler it gave 12.

    Thanks,
    syed

    • The size of a long double can vary from machine to machine. On most machines, it is 8, 12, or 16 bytes. The only way to know for sure is to use sizeof(long double) just as you have.

  • Mitul Golakiya

    My int takes 2 bytes.
    I am working with Turbo C.
    My PC is 32-bit.

  • Julian

    Wow, imagine how many variables I could store in a kilobyte :D

    I know that years ago when they used to use punchtape in stuff like CNC machines, having a strip of tape that was over 12 kilobytes of information was very impractical :P

    Now that I think about it, punchtape was just a lot like bytes, where each row on the tape would have 8 circles, some of which were punched out (which I guess would be the equivalent to a bit with value 1) and the computer would read the values off it… At least I think that’s how it worked ;S

    • rameye

      Back in the late 1970s we had an air show at the Air Force base I was stationed at. We had a paper tape punch machine set up so that attendees could type in their name or whatever and have it punched out on the tape as the actual letters, not the binary. Was a hit.

  • Adam

    What do you mean when you use ^ and n in your equations? Are these standard operators and variables?

  • Ranjan

    What’s the reason you can’t overload the sizeof operator?

    Regards,
    Ranjan

    • Why would you want to? :)

      Stroustrup says here:

      Sizeof cannot be overloaded because built-in operations, such as incrementing a pointer into an array implicitly depends on it. Consider:

      X a[10];
      X* p = &a[3];
      X* q = &a[3];
      p++; // p points to a[4]
      // thus the integer value of p must be
      // sizeof(X) larger than the integer value of q

      Thus, sizeof(X) could not be given a new and different meaning by the programmer without violating basic language rules.

  • Anon

    So does it really matter what variable type you use? Dont get me wrong i want to use the right one, but it seems a task to remember all of the maximum values etc.

    is there a trick to it?

    There’s an awful lot to choose from.

    • PReinie

      It depends on the use or application of the code and your engineering design. (You can’t put 5# of @&*!*^ in a 1# bag.) As Alex said above “However, for programs that need a large amount of variables (eg. 100,000), the difference between using 1 byte and 8 byte variables can be significant.” that’s an 8-times increase of memory.

      If you use the maximum word size for each piece of memory, your “product” may cost more because you need more memory to hold or run your program. The additional memory may also require larger circuit boards to hold the memory chips and power for the boards/memory which increases the weight and may require a fan (which takes power) to cool the components. This might be important when go to market or have to carry it to a space station, a battlefield, hiking up a mountain or as a cell phone or iPod.

      If you’re just providing code for existing computers it may not be that big a deal.

      (Sorry for the long-winded explanation. I’ve programmed to the bit level in assembly and in C for limited memory machines.)

  • This tutorial is so easy. It makes learning programming so easy. The problem comes once you get out of C++ and try to understand MSDN help. These guys from Microsoft are from outer space. Just try looking up some of the things you learn here and see if you can get anything that explains it in common English. I could sure use a tutorial on how to use Microsoft help, once you get to the help you need. Sometimes they just go round and round.

  • kemawalker

    I am on a MAC (64bit) so my values are significantly larger for 4 of the types.

    QUESTION -- how do you manage the risk that you might code for too large a type that won’t run on other machines? In other words, if I code using “long double” which is 16 bytes on my machine, but only 8 bytes on yours, will there be an issue when the program runs or no?

    • Alex

      This is a challenging question with no easy answer.

      If you’re writing for portability, the best thing to do is assume that variables are only as large as their guaranteed minimum sizes and avoid the problem in the first place.

      If you do need to assume long double is 16 bytes for your code, you could use assert() to “document” that:

      That way anybody who compiles the code on an architecture or compiler that is using a different size would be aware of the issue.

  • daksh


    bool: 1 bytes
    char: 1 bytes
    wchar_t: 2 bytes
    short: 2 bytes
    int: 4 bytes
    long: 4 bytes
    float: 4 bytes
    double: 8 bytes
    long double: 8 bytes

    Your results may vary if you are using a different type of machine, or a different compiler.

    I can understand that it depends on the machine.. But how does it depend on the compiler??

    • rameye

      Most modern compilers have options you can specify on the command line to target different architectures.

      There is a lot of information in the g++ documentation about this.

  • sanjeev_e

    Hi Alex/Members,

    Is there any other way to find the size of a variable without using sizeof(operator).

    I have seen in some of the websites that the below way we can find the same.

    int i = 1;

    size_t size1 = (char*)(&i+1)-(char*)(&i);

    size_t size2 = (int*)(&i+1)-(int*)(&i);

    cout<<size1<<"\t"<<size2<<endl;

    Output: 4 1

    Why it is varying here by typecasting with different datatypes?

    I want to know where it is restricted to use sizeof() operator anywhere? and also whether the above code internally uses sizeof() operator like

    size1 = address-diff/sizeof(int); size2 = address-diff/sizeof(char);

    Please clarify.

    Thanks,
    Sanjeev.

    • rameye

      Any time you use mathematical operators with pointer types, you are working with pointer arithmetic, and the expressions will evaluate differently depending on the size of the type pointed to.


      int i;
      char* p1 = (char*)&i + 1; // p1 is assigned the address of the next byte in the storage space of the integer i
      int* p2 = &i + 1; // p2 is assigned the address of the next following integer, as if i were in an array

      In the case of the size1 assignment in your posting, both addresses are cast to char*, so their difference is counted in bytes and gives the size of the type. In the size2 assignment, the difference is counted in ints, so two adjacent addresses are 1 apart.

  • jimbo

    First of all thanks Alex for this amazing tutorial…you’re awesome!!

    I was just wondering if one bit can store a 0 or a 1 so two values and two bits can store 4 different values i.e 0101. How come it says 3 bits can store 8 values i.e 01010101? Surely 3 bits would store 6 different values if 1 bit stores 2 different values. 3*2 = 6. If I am missing the point can someone please try and explain where I am going wrong!?
    Many thanks in advance.

  • jimbo

    It’s ok I just worked it out. one bit can store two values i.e a 0 or a 1 i.e 2^1 2*1=2. 2 bits can store 4 different values i.e 2^2 2*2= 4 and 3 bits can store 8 different values i.e 2^3 2*2*2= 8.

    One more thing..just wondering what to the nth power means? is it like 2^9 or 2*2*2*2*2*2*2*2*2? Many thanks in advance.

    • tsdrifter

      n doesn’t stand for 9, it just stands for any integer. what he is saying is if you have a given number of bits, you can store 2^(number of bits). if you had nine bits, then n would equal 9, but this is not always the case.

    • rameye

      Since ^ is being so freely used here in the comments, I must mention to be careful using ^ as an operator in C++ expressions.

      C++ has NO built-in operator for exponentiation. The ^ operator in C++ is a binary (operating on two values) operator performing a bitwise XOR.

      For example what does 2^3 resolve to in C++?

      int x = 2^3;

      x will not contain the value 8, it will contain the value 1

      Why is this?

      Assuming 8 bit wide integers this is what happened:


      00000010 ^
      00000011
      --------
      00000001 <== bitwise XOR of the two integers results in value of 1

  • bacia

    Using long doubles I made a program which can compute factorials up to 170!, more than my TI-831!

  • machello

    Hi Alex!

    I must say you have highly systematic tutorials which are very pedagogically written!

    I just have one remark which is worth mentioning.
    If you try to determine the size of a structure, or in general any object, you
    may not get the number of bytes you expected the object to use.
    Example:

    struct Something
    {
    int nX, nY, nZ;
    double dX;
    };

    Something sSomeStuff;

    You expect (64bit, gcc 4.7.2): sizeof( sSomeStuff ) == 20 (= 3*4 + 8)
    (i.e. sizeof(int) == 4, sizeof(double) == 8 ).
    The object uses 20 bytes of memory, or at least
    that much is allocated for the variables in the structure. However, you won’t get that value!
    Instead you get: sizeof( sSomeStuff ) == 24.

    I won’t explain in detail (the internet can be used for that), but it is related to
    how the CPU can most efficiently fetch variables (blocks) or, more technically, to
    data structure alignment and internal padding.

    In the example, the largest member size is sizeof( double ) == 8. The total
    effective size of the structure (without padding) is 20. The structure size must
    be (or should be, on most systems) a multiple of the largest member size in that structure,
    in the example, a multiple of 8: 8, 16, 24, 32,…
    The resulting structure size is not 16, which is too small,
    but 24, which is the next multiple above the effective structure size.

    With proper order of type declarations, additional padding bytes could be minimized.

    With regards,
    M

  • FrostByte

    Shouldn't
        cout << "char16_t:\t" << sizeof(wchar_t) << " bytes" << endl; // C++11, may not be supported by your compiler
        cout << "char32_t:\t" << sizeof(wchar_t) << " bytes" << endl; // C++11, may not be supported by your compiler
    be
        cout << "char16_t:\t" << sizeof(char16_t) << " bytes" << endl; // C++11, may not be supported by your compiler
        cout << "char32_t:\t" << sizeof(char32_t) << " bytes" << endl; // C++11, may not be supported by your compiler

    Otherwise it prints out wchar_t size instead of char16_t and char32_t!

  • Catreece

    Alright, that's weird…

    So I ran the program, curious as to what was different. I expected my values to probably have one or two higher than normal. Instead, it popped up minimum values for everything… except one. (Well, two; int was 4 instead of 2; made sense after checking the next chapter though; that’s not what bothers me, though.)

    Somehow, char32_t shows up as only 2 bytes for my machine, but in the tutorial it states that C++ architecture ensures char32_t will have an absolute minimum of 4 bytes.

    So… yeah, I dunno what's up. Is it feeding me incorrect information, or has there somehow been a change where it was found to actually be possible to compress char32_t into 2 bytes instead of 4? Something's strange regardless of the answer.

    ------------
    NEVER MIND, I'd typed that all out and figured out the problem:

    In the above code, there's an error.

    What you have listed is:

        cout << "bool:\t\t" << sizeof(bool) << " bytes" << endl;
        cout << "char:\t\t" << sizeof(char) << " bytes" << endl;
        cout << "wchar_t:\t" << sizeof(wchar_t) << " bytes" << endl;
        cout << "char16_t:\t" << sizeof(wchar_t) << " bytes" << endl; // C++11, may not be supported by your compiler
        cout << "char32_t:\t" << sizeof(wchar_t) << " bytes" << endl; // C++11, may not be supported by your compiler

    What SHOULD be listed is:

        cout << "bool:\t\t" << sizeof(bool) << " bytes" << endl;
        cout << "char:\t\t" << sizeof(char) << " bytes" << endl;
        cout << "wchar_t:\t" << sizeof(wchar_t) << " bytes" << endl;
        cout << "char16_t:\t" << sizeof(char16_t) << " bytes" << endl; // C++11, may not be supported by your compiler
        cout << "char32_t:\t" << sizeof(char32_t) << " bytes" << endl; // C++11, may not be supported by your compiler

    It was feeding me the stats for wchar_t not char16_t or char32_t, hence why it popped up 2.

    Once I fed in the updated code, it gave back the correct 4 bytes for char32_t, and char16_t remained at 2 bytes.

    I double-checked all the others just to be on the safe side and the rest of them are good, it was just the char16_t and char32_t that were off.

    Anyway, problem solved! Yay! Shame it wasn't possible to compress char32_t down to 2 bytes, though. =P

    • Alex

      Yup, copy/paste error in the sample code. It’s fixed now. Thanks for debugging. :)

      • Catreece

        Welcome, thanks for teaching me how to in the first place! =P

        I'm starting to think coding's a bit more of an artform than I already thought, though. Like it was already clear that it requires a lot of logic, but also a fair bit of grace. At this point, I'm starting to think it follows the rule for art:

        Art is never "finished", it simply runs out of time, resources or patience.

        Debugging seems to be a similar endeavour -- you'll never get 100% of it cleared out on really complex programs (Maya, Windows, whatever), which explains why they're always coming out with new patches to fix stuff.

        I kind of knew before that changing anything in a program could inadvertently break something else in the process if you didn't especially carefully comment the hell out of everything, hence why so many games break something during a patch (LoL and WoW are especially notorious for such). I'm starting to see why it happens so often now, though, and even good commenting isn't always enough to completely avoid the damage.

        I suppose that's why alpha testers are so important though, and why wide-scale beta tests are essential. A million people trying to break something are simply vastly more effective at finding ways to do so casually, than a handful of dedicated testers that are actively trying to break something.

        Anyway, thanksies for the tutorial as normal. I'd almost suggest the average person should try to go through the first 2-3 chapters just to get a feel for how many problems there can be when writing a program, so that they're a little more understanding of the issues faced. =P

  • PixelHero

    Will it be a problem if I use the C++11 language standard to compile my programs for the rest of the tutorial? That was the only way I could make the code work, otherwise it would not compile at all.

    • Alex

      It’s fine if your compiler doesn’t support C++11. You’ll occasionally need to modify an example to remove C++11 specific lines (as in the case above), but I’ll generally try to keep the C++11 stuff separate so the examples will work whether your compiler supports C++11 or not.

  • Chris

    Hey Alex,

    Just a heads up, the print out from your computer is missing the "long" output. It goes from "float" straight to "long long".
