4.8 — Floating point numbers

Integers are great for counting whole numbers, but sometimes we need to store very large numbers, or numbers with a fractional component. A floating point type variable is a variable that can hold a real number, such as 4320.0, -3.33, or 0.01226. The floating part of the name floating point refers to the fact that the decimal point can “float”; that is, it can support a variable number of digits before and after the decimal point.

There are three different floating point data types: float, double, and long double. As with integers, C++ does not define the actual size of these types (but it does guarantee minimum sizes). On modern architectures, floating point representation almost always follows the IEEE 754 binary format. In this format, a float is 4 bytes, a double is 8 bytes, and a long double can be equivalent to a double (8 bytes), an 80-bit type (often padded to 12 bytes), or 16 bytes.

Floating point data types are always signed (can hold positive and negative values).

Category         Type         Minimum Size   Typical Size
floating point   float        4 bytes        4 bytes
                 double       8 bytes        8 bytes
                 long double  8 bytes        8, 12, or 16 bytes

Here are some definitions of floating point numbers:

When using floating point literals, always include at least one decimal place (even if the decimal is 0). This helps the compiler understand that the number is a floating point number and not an integer.

Note that floating point literals default to type double. An f suffix is used to denote a literal of type float.

Best practice

Always make sure the type of your literals match the type of the variables they’re being assigned to or used to initialize. Otherwise an unnecessary conversion will result, possibly with a loss of precision.


Make sure you don’t use integer literals where floating point literals should be used. This includes when initializing or assigning values to floating point objects, doing floating point arithmetic, and calling functions that expect floating point values.

Printing floating point numbers

Now consider this simple program:

The results of this seemingly simple program may surprise you:


In the first case, std::cout printed 5, even though we typed in 5.0. By default, std::cout will not print the fractional part of a number if the fractional part is 0.

In the second case, the number prints as we expect.

In the third case, it printed the number in scientific notation (if you need a refresher on scientific notation, see lesson 4.7 -- Introduction to scientific notation).

Floating point range

Assuming IEEE 754 representation:

Size                                  Range                                    Precision
4 bytes                               ±1.18 x 10^-38 to ±3.4 x 10^38           6-9 significant digits, typically 7
8 bytes                               ±2.23 x 10^-308 to ±1.80 x 10^308        15-18 significant digits, typically 16
80-bits (typically 12 or 16 bytes)    ±3.36 x 10^-4932 to ±1.18 x 10^4932      18-21 significant digits
16 bytes                              ±3.36 x 10^-4932 to ±1.18 x 10^4932      33-36 significant digits

The 80-bit floating point type is a bit of a historical anomaly. On modern processors, it is typically implemented using 12 or 16 bytes (which is a more natural size for processors to handle).

It may seem a little odd that the 80-bit floating point type has the same range as the 16-byte floating point type. This is because they have the same number of bits dedicated to the exponent -- however, the 16-byte number can store more significant digits.

Floating point precision

Consider the fraction 1/3. The decimal representation of this number is 0.33333333333333… with 3’s going out to infinity. If you were writing this number on a piece of paper, your arm would get tired at some point, and you’d eventually stop writing. And the number you were left with would be close to 0.3333333333…. (with 3’s going out to infinity) but not exactly.

On a computer, an infinite length number would require infinite memory to store, and typically we only have 4 or 8 bytes. This limited memory means floating point numbers can only store a certain number of significant digits -- and that any additional significant digits are lost. The number that is actually stored will be close to the desired number, but not exact.

The precision of a floating point number defines how many significant digits it can represent without information loss.

When outputting floating point numbers, std::cout has a default precision of 6 -- that is, it assumes all floating point variables are only significant to 6 digits (the minimum precision of a float), and hence it will truncate anything after that.

The following program shows std::cout truncating to 6 digits:

This program outputs:


Note that each of these only have 6 significant digits.

Also note that std::cout will switch to outputting numbers in scientific notation in some cases. Depending on the compiler, the exponent will typically be padded to a minimum number of digits. Fear not, 9.87654e+006 is the same as 9.87654e6, just with some padding 0’s. The minimum number of exponent digits displayed is compiler-specific (Visual Studio uses 3, some others use 2 as per the C99 standard).

The number of digits of precision a floating point variable has depends on both the size (floats have less precision than doubles) and the particular value being stored (some values have more precision than others). Float values have between 6 and 9 digits of precision, with most float values having at least 7 significant digits. Double values have between 15 and 18 digits of precision, with most double values having at least 16 significant digits. Long double has a minimum precision of 15, 18, or 33 significant digits depending on how many bytes it occupies.

We can override the default precision that std::cout shows by using the std::setprecision() function that is defined in the iomanip header.



Because we set the precision to 16 digits, each of the above numbers is printed with 16 digits. But, as you can see, the numbers certainly aren’t precise to 16 digits! And because floats are less precise than doubles, the float has more error.

Precision issues don’t just impact fractional numbers, they impact any number with too many significant digits. Let’s consider a big number:



123456792 is greater than 123456789. The value 123456789.0 has 10 significant digits, but float values typically have 7 digits of precision (and the result of 123456792 is precise only to 7 significant digits). We lost some precision! When precision is lost because a number can’t be stored precisely, this is called a rounding error.

Consequently, one has to be careful when using floating point numbers that require more precision than the variables can hold.

Best practice

Favor double over float unless space is at a premium, as the lack of precision in a float will often lead to inaccuracies.

Rounding errors make floating point comparisons tricky

Floating point numbers are tricky to work with due to non-obvious differences between binary (how data is stored) and decimal (how we think) numbers. Consider the fraction 1/10. In decimal, this is easily represented as 0.1, and we are used to thinking of 0.1 as an easily representable number with 1 significant digit. However, in binary, 0.1 is represented by the infinite sequence: 0.00011001100110011… Because of this, when we assign 0.1 to a floating point number, we’ll run into precision problems.

You can see the effects of this in the following program:

This outputs:


On the top line, std::cout prints 0.1, as we expect.

On the bottom line, where we have std::cout show us 17 digits of precision, we see that d is actually not quite 0.1! This is because the double had to truncate the approximation due to its limited memory. The result is a number that is precise to 16 significant digits (which type double guarantees), but the number is not exactly 0.1. Rounding errors may make a number either slightly smaller or slightly larger, depending on where the truncation happens.

Rounding errors can have unexpected consequences:


Although we might expect that d1 and d2 should be equal, we see that they are not. If we were to compare d1 and d2 in a program, the program would probably not perform as expected. Because floating point numbers tend to be inexact, comparing floating point numbers is generally problematic -- we discuss the subject more (and solutions) in lesson 5.6 -- Relational operators and floating point comparisons.

One last note on rounding errors: mathematical operations (such as addition and multiplication) tend to make rounding errors grow. So even though 0.1 has a rounding error in the 17th significant digit, when we add 0.1 ten times, the rounding error has crept into the 16th significant digit. Continued operations would cause this error to become increasingly significant.

Key insight

Rounding errors occur when a number can’t be stored precisely. This can happen even with simple numbers, like 0.1. Therefore, rounding errors can, and do, happen all the time. Rounding errors aren’t the exception -- they’re the rule. Never assume your floating point numbers are exact.

A corollary of this rule is: be wary of using floating point numbers for financial or currency data.

NaN and Inf

There are two special categories of floating point numbers. The first is Inf, which represents infinity. Inf can be positive or negative. The second is NaN, which stands for “Not a Number”. There are several different kinds of NaN (which we won’t discuss here). NaN and Inf are only available if the compiler uses a specific format (IEEE 754) for floating point numbers. If another format is used, the following code produces undefined behavior.

Here’s a program showing all three:

And the results using Visual Studio 2008 on Windows:


INF stands for infinity, and IND stands for indeterminate. Note that the results of printing Inf and NaN are platform specific, so your results may vary.

Best practice

Avoid division by 0 altogether, even if your compiler supports it.


To summarize, the two things you should remember about floating point numbers:

1) Floating point numbers are useful for storing very large or very small numbers, including those with fractional components.

2) Floating point numbers often have small rounding errors, even when the number has fewer significant digits than the precision. Many times these go unnoticed because they are so small, and because the numbers are truncated for output. However, comparisons of floating point numbers may not give the expected results. Performing mathematical operations on these values will cause the rounding errors to grow larger.


393 comments to 4.8 — Floating point numbers

  • Math

    Is the f suffix necessary or can I not put it when outputting float variables?

    • nascardriver

      The f suffix makes the literal a `float`. Without the suffix, it's a `double`. Unless you need a `float`, there's no reason to use one.

  • Math

    I don't understand why in this code

    the first 3 numbers are displayed as expected, but the last two numbers are displayed in scientific notation.


  • Matt

    Really don't want to be a nitpick but on  this page the examples for
    "Rounding errors make floating point comparisons tricky"
    "NaN and Inf"
    use the old style 'endl' for newline instead of '\n'.
    Also doubles d1 and d2 in the former are being initialized using the deprecated method of () rather than {}.

    Just thought you'd wanna know :)

  • Confused!

    "Floating point numbers often have small rounding errors, even when the number has fewer significant digits than the precision."

    I don't get this definition "even when the number has fewer significant digits than the precision."

  • sami (extra word)

    NaN and Inf (and) only available if the compiler uses a specific format (IEEE 754) for floating point numbers. If another format is used, the following code produces undefined behavior.

    I think the second and - I put in parentheses- is extra.

  • typo

    "the float exhibits (THAT IT ) has  more error."

  • Thank you

    After 6 years of being computer science student, I just found what is the reason of calling real numbers "floating" point number! Thank you so much for such fabulous, freaking astonishing tutorials!

  • Centriax

    what is the difference between using std::cout << std::setprecision(17); and std::cout.precision(17);

  • Innervate

    What data type should be used for currency/financial data? would the double data type suffice for rounding errors for tasks involving financial data? or would we need to use a different data type?

    • Alex

      There are many different answers to this question, all of which have tradeoffs.

      For a play project, I'd just use double. For a professional project, do some research on the pros and cons of different options.

  • Alek

    Hello, considering this lesson, how did programmers create mathematical programs with almost 100% precision? Like PhotoMath, which I used personally; they can calculate complex math expressions, let alone adding ".1" ten times to represent the value "1".

  • antiriad7

    Hi, do I need to remember how to set precision? Because I think I will forget the name of the header easily.

  • SZS

    "Although we might expect that d1 and d2 should be equal, we see that they are not."

    UMM AKTCHUALLY they are the same number

    jk, thanks for the tutorials, they have been a godsend in these homestuck times

  • Taras

    Dear Nascardriver and Alex, thx for your great support in learning c++!
    I got some hard point can`t comprehend. Here is code above:

    I got no idea how it happened to output 123456792. I suppose the 7 significant digits of precision give 1234567, but where did it take the 92 from? Is it from the decimal representation cout produced when outputting the non-significant digits stored as binary?

    • nascardriver

      A `float` can't store 123456789.0f on your system. You can see that the value of the float, not just the printed value, is different by adding

      This prints the same value as your line. Floating point numbers are stored as a base with an exponent (x*(y^z)). Your number is stored as
      1.8396495580673218 * 2^26 = 123456792
      Read a tutorial about IEEE 754 for more information about storing floating point numbers in binary.

  • Tim

    needed to read this! Thank you for helping me understand why the output for a float is only 6 digits when the minimum value is clearly 1.4E-45, which has way more decimal digits. Basically, the minimum value is that number, but can only store up to around 6 digits because a float takes up 4 bytes of memory. Thank you!! Understanding this concept in Java btw

  • Chayim

    You did not explain what Inf and NaN are and what they are for.

  • Chayim

    Can you explain in detail how from:
    std::cout << 9876543.21 << '\n'
    Becomes 9.87654e+06 in scientific notation?
    Where did the 3 go? How did the 3.21 become included in the 06?

  • Chayim

    What’s the difference between floating point and double point?

    • nascardriver

      `double` and `float` are floating point numbers. A `double` is usually wider than a `float` (64 bits vs 32 bits), so you get more precision or a wider range from a `double`.

      • Chayim

        The precision difference I knew; it's mentioned in the lesson. What I meant to ask is:
        What a float is I know, it's a decimal with numbers left and right, but what is double? Is it exactly the float point but named double because it has double the range?

        • nascardriver

          `float`, `double`, and `long double` are all numbers with a decimal and numbers left and right. They're usually implemented in the same way, just with different widths. In code, you can tell them apart by the suffixes of their literals.

          `double` is the most common of the three, and the easiest to use, because it doesn't have a suffix. As to the origins of the names,
          float: _float_ing point number (Single precision).
          double: floating point number ("Double" precision).
          As for `long double`, I don't think there's a logical explanation. There are `long` integers, so there are `long` floating point numbers.

  • akil

    There are three different floating point data types: float, double, and long double. As with integers, C++ does not define the actual size of these types (but it does guarantee minimum sizes). What do you mean by (but it does guarantee minimum sizes) ??

  • scarlet johnson

    This program on compiling producing no. Explain Please!

  • Steve Roy

    Hi, can you explain the significance of the suffix f after floating point numbers? And why we use it when initializing float type variables and in cout statements.

  • Andrei

    Why the hell are we setting precision like this??

    Until now we were using std::cout with operator<< to OUTPUT something. And the code above looks like we're trying to output the result of the function! Why does no output happen? (Some people in the comments even add endl after: std::cout << std::setprecision(8) << std::endl;)
    Thank you in advance!

    • nascardriver

      `std::cout` can do what it wants with the value you give it. For the fundamental types, and types for which this functionality is added later (eg. std::string), it prints the value.
      If you find this confusing, you can also use

  • giang

    In the d1{1.0} and d2{0.1+....} example above, I still can't understand why the screen shows 1 (even d1 is setprecision to 17 digits), and why d2 gets a result of 0.99999999...?? Can anybody help me explain this situation??

    • Alex

      Because d1 doesn't have precision errors, but d2 does. 0.1 can't be stored precisely as a double (it's like 1/3rd is as a decimal value), so every time we use 0.1 we accrue a little bit of precision loss.

  • Chrystian Żuberek

    Where do these random digits that are added to our number after breaching precision come from?

    • nascardriver

      They're caused by the way the numbers are stored. Read up on how floating point numbers are stored in binary if you're interested in where exactly those numbers come from.

      • Xavier

        But I thought floats are stored by separating the exponent and the significant digits? Like the number 1.97, the exponent is -2, and the significant digits will be 197. So if we have the number 0.1, the exponent will be -1, and the significant digits will be 1. I thought floats are not stored by directly storing the fractional binary version of the decimal. Did I miss something?

  • Cypeace

    Hi Alex and Nascar,

    I've been fiddling around some elementary school math problems and I get somehow unexpected behavior from this function:

    Can you please help me identify where am I going wrong?

    Thank you very much!
    PS: Merry Christmas!!

    • nascardriver

      Line 13 is unlikely to happen. Floating point values have limited accuracy; this is covered in detail in chapter 5. Don't compare floating points with equality. You can use `std::abs` (removes the negative sign, if any) to check if the result is very close to 0.

      There's also a lesson about converting floating points to fractions later.

  • HolzstockG

    Can someone explain to me why in computers everything (I know that's an exaggeration) comes from powers of the number 2? Is it associated with Boolean algebra?

    Also you have mentioned that we should omit using floating point data types for currencies calculators etc. So what should we use instead?

    Thanks in advance for reply.

    • nascardriver

      Power can be on (High) or off (Low). Each bit can be on (1) or off (0). That's easy to build and easy to work with.

      Express the number as an integer. There's a lesson about a fixed precision floating point number later on.

  • Sri charan Battu

    Hi, how did 5.0/0.0 return inf? Is it because the values used are floating point? When I tried to print 5/0, the program terminated and main returned some bad value.

  • hellmet

    So std::setprecision only changes how the result is _displayed_, not the actual internal representation, correct? (By internal representation, I mean the IEEE bits representation. 1+11+52 bits for storing the number)

  • reformatorsystemu


    I am wondering why my code produces the output as below:

    Why does the multiplication operation give a different result than addition?

  • Jose

    Hi, the conclusion says I should remember three things about floating point numbers but only mentions that a rounding error?
