Integers are great for counting whole numbers, but sometimes we need to store *very* large numbers, or numbers with a fractional component. A **floating point** type variable is a variable that can hold a real number, such as 4320.0, -3.33, or 0.01226. The *floating* part of the name *floating point* refers to the fact that the decimal point can “float”; that is, it can support a variable number of digits before and after the decimal point.

There are three different floating point data types: **float**, **double**, and **long double**. As with integers, C++ does not define the actual size of these types (but it does guarantee minimum sizes). On modern architectures, floating point representation almost always follows IEEE 754 binary format. In this format, a float is 4 bytes, a double is 8, and a long double can be equivalent to a double (8 bytes), 80 bits (often padded to 12 or 16 bytes), or 16 bytes.

Floating point data types are always signed (can hold positive and negative values).

Category | Type | Minimum Size | Typical Size |
---|---|---|---|
floating point | float | 4 bytes | 4 bytes |
floating point | double | 8 bytes | 8 bytes |
floating point | long double | 8 bytes | 8, 12, or 16 bytes |

Here are some definitions of floating point numbers:

```cpp
float fValue;
double dValue;
long double ldValue;
```

When using floating point literals, always include at least one digit after the decimal point (even if that digit is 0). This helps the compiler understand that the number is a floating point number and not an integer.

```cpp
int x{5};      // 5 means integer
double y{5.0}; // 5.0 is a floating point literal (no suffix means double type by default)
float z{5.0f}; // 5.0f is a floating point literal, f suffix means float type
```

Note that floating point literals default to type double. An f suffix is used to denote a literal of type float.

Best practice

Always make sure the type of your literals matches the type of the variables they’re being assigned to or used to initialize. Otherwise an unnecessary conversion will result, possibly with a loss of precision.

Warning

Make sure you don’t use integer literals where floating point literals should be used. This includes when initializing or assigning values to floating point objects, doing floating point arithmetic, and calling functions that expect floating point values.

Printing floating point numbers

Now consider this simple program:

```cpp
#include <iostream>

int main()
{
    std::cout << 5.0 << '\n';
    std::cout << 6.7f << '\n';
    std::cout << 9876543.21 << '\n';
}
```

The results of this seemingly simple program may surprise you:

```
5
6.7
9.87654e+06
```

In the first case, std::cout printed 5, even though we typed in 5.0. By default, std::cout will not print the fractional part of a number if the fractional part is 0.

In the second case, the number prints as we expect.

In the third case, it printed the number in scientific notation (if you need a refresher on scientific notation, see lesson 4.7 -- Introduction to scientific notation).

Floating point range

Assuming IEEE 754 representation:

Size | Range | Precision |
---|---|---|
4 bytes | ±1.18 x 10^{-38} to ±3.4 x 10^{38} | 6-9 significant digits, typically 7 |
8 bytes | ±2.23 x 10^{-308} to ±1.80 x 10^{308} | 15-18 significant digits, typically 16 |
80 bits (typically uses 12 or 16 bytes) | ±3.36 x 10^{-4932} to ±1.18 x 10^{4932} | 18-21 significant digits |
16 bytes | ±3.36 x 10^{-4932} to ±1.18 x 10^{4932} | 33-36 significant digits |

The 80-bit floating point type is a bit of a historical anomaly. On modern processors, it is typically implemented using 12 or 16 bytes (which is a more natural size for processors to handle).

It may seem a little odd that the 80-bit floating point type has the same range as the 16-byte floating point type. This is because they have the same number of bits dedicated to the exponent -- however, the 16-byte number can store more significant digits.

Floating point precision

Consider the fraction 1/3. The decimal representation of this number is 0.33333333333333… with 3’s going out to infinity. If you were writing this number on a piece of paper, your arm would get tired at some point, and you’d eventually stop writing. And the number you were left with would be close to 0.3333333333…. (with 3’s going out to infinity) but not exactly.

On a computer, an infinite length number would require infinite memory to store, and typically we only have 4 or 8 bytes. This limited memory means floating point numbers can only store a certain number of significant digits -- and that any additional significant digits are lost. The number that is actually stored will be close to the desired number, but not exact.

The precision of a floating point number defines how many *significant digits* it can represent without information loss.

When outputting floating point numbers, std::cout has a default precision of 6 -- that is, it assumes all floating point variables are only significant to 6 digits (the minimum precision of a float), and hence it rounds the output to 6 significant digits.

The following program shows std::cout rounding to 6 significant digits:

```cpp
#include <iostream>

int main()
{
    std::cout << 9.87654321f << '\n';
    std::cout << 987.654321f << '\n';
    std::cout << 987654.321f << '\n';
    std::cout << 9876543.21f << '\n';
    std::cout << 0.0000987654321f << '\n';

    return 0;
}
```

This program outputs:

```
9.87654
987.654
987654
9.87654e+006
9.87654e-005
```

Note that each of these has only 6 significant digits.

Also note that std::cout will switch to outputting numbers in scientific notation in some cases. Depending on the compiler, the exponent will typically be padded to a minimum number of digits. Fear not, 9.87654e+006 is the same as 9.87654e6, just with some padding 0’s. The minimum number of exponent digits displayed is compiler-specific (older versions of Visual Studio use 3, while most others use 2 as per the C99 standard).

The number of digits of precision a floating point variable has depends on both the size (floats have less precision than doubles) and the particular value being stored (some values have more precision than others). Float values have between 6 and 9 digits of precision, with most float values having at least 7 significant digits. Double values have between 15 and 18 digits of precision, with most double values having at least 16 significant digits. Long double has a minimum precision of 15, 18, or 33 significant digits depending on how many bytes it occupies.

We can override the default precision that std::cout shows by using the std::setprecision() function that is defined in the *iomanip* header.

```cpp
#include <iostream>
#include <iomanip> // for std::setprecision()

int main()
{
    std::cout << std::setprecision(16); // show 16 digits of precision
    std::cout << 3.33333333333333333333333333333333333333f << '\n'; // f suffix means float
    std::cout << 3.33333333333333333333333333333333333333 << '\n';  // no suffix means double

    return 0;
}
```

Outputs:

```
3.333333253860474
3.333333333333334
```

Because we set the precision to 16 digits, each of the above numbers is printed with 16 digits. But, as you can see, the numbers certainly aren’t precise to 16 digits! And because floats are less precise than doubles, the float has more error. (The last digit printed for the double can vary by compiler: some standard libraries print 3.333333333333333 instead.)

Precision issues don’t just impact fractional numbers, they impact any number with too many significant digits. Let’s consider a big number:

```cpp
#include <iostream>
#include <iomanip> // for std::setprecision()

int main()
{
    float f { 123456789.0f }; // f has 10 significant digits
    std::cout << std::setprecision(9); // to show 9 digits in f
    std::cout << f << '\n';

    return 0;
}
```

Output:

123456792

123456792 is greater than 123456789. The value 123456789.0 has 10 significant digits, but float values typically have 7 digits of precision (and the result of 123456792 is precise only to 7 significant digits). We lost some precision! When precision is lost because a number can’t be stored precisely, this is called a rounding error.

Consequently, one has to be careful when using floating point numbers that require more precision than the variables can hold.

Best practice

Favor double over float unless space is at a premium, as the lack of precision in a float will often lead to inaccuracies.

Rounding errors make floating point comparisons tricky

Floating point numbers are tricky to work with due to non-obvious differences between binary (how data is stored) and decimal (how we think) numbers. Consider the fraction 1/10. In decimal, this is easily represented as 0.1, and we are used to thinking of 0.1 as an easily representable number with 1 significant digit. However, in binary, 0.1 is represented by the infinite sequence: 0.00011001100110011… Because of this, when we assign 0.1 to a floating point number, we’ll run into precision problems.

You can see the effects of this in the following program:

```cpp
#include <iostream>
#include <iomanip> // for std::setprecision()

int main()
{
    double d{0.1};
    std::cout << d << '\n'; // use default cout precision of 6
    std::cout << std::setprecision(17);
    std::cout << d << '\n';

    return 0;
}
```

This outputs:

```
0.1
0.10000000000000001
```

On the top line, std::cout prints 0.1, as we expect.

On the bottom line, where we have std::cout show us 17 digits of precision, we see that d is actually *not quite* 0.1! This is because the double had to round the infinite binary expansion to fit in its limited memory. The result is a number that is precise to 16 significant digits (as is typical for a double), but the number is not *exactly* 0.1. Rounding errors may make a number either slightly smaller or slightly larger, depending on which way the value rounds.

Rounding errors can have unexpected consequences:

```cpp
#include <iostream>
#include <iomanip> // for std::setprecision()

int main()
{
    std::cout << std::setprecision(17);

    double d1{ 1.0 };
    std::cout << d1 << '\n';

    double d2{ 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 }; // should equal 1.0
    std::cout << d2 << '\n';
}
```

```
1
0.99999999999999989
```

Although we might expect that d1 and d2 should be equal, we see that they are not. If we were to compare d1 and d2 in a program, the program would probably not perform as expected. Because floating point numbers tend to be inexact, comparing floating point numbers is generally problematic -- we discuss the subject more (and solutions) in lesson 5.6 -- Relational operators and floating point comparisons.

One last note on rounding errors: mathematical operations (such as addition and multiplication) tend to make rounding errors grow. So even though 0.1 has a rounding error in the 17th significant digit, when we add 0.1 ten times, the rounding error has crept into the 16th significant digit. Continued operations would cause this error to become increasingly significant.

Key insight

Rounding errors occur when a number can’t be stored precisely. This can happen even with simple numbers, like 0.1. Therefore, rounding errors can, and do, happen all the time. Rounding errors aren’t the exception -- they’re the rule. Never assume your floating point numbers are exact.

A corollary of this rule is: be wary of using floating point numbers for financial or currency data.

NaN and Inf

There are two special categories of floating point numbers. The first is Inf, which represents infinity. Inf can be positive or negative. The second is NaN, which stands for “Not a Number”. There are several different kinds of NaN (which we won’t discuss here). NaN and Inf are only available if the compiler uses a specific format (IEEE 754) for floating point numbers. If another format is used, the following code produces undefined behavior.

Here’s a program showing all three:

```cpp
#include <iostream>

int main()
{
    double zero {0.0};

    double posinf { 5.0 / zero }; // positive infinity
    std::cout << posinf << '\n';

    double neginf { -5.0 / zero }; // negative infinity
    std::cout << neginf << '\n';

    double nan { zero / zero }; // not a number (mathematically invalid)
    std::cout << nan << '\n';

    return 0;
}
```

And the results using Visual Studio 2008 on Windows:

```
1.#INF
-1.#INF
1.#IND
```

*INF* stands for infinity, and *IND* stands for indeterminate. Note that the results of printing *Inf* and *NaN* are platform specific, so your results may vary.

Best practice

Avoid division by 0 altogether, even if your compiler supports it.

Conclusion

To summarize, the two things you should remember about floating point numbers:

1) Floating point numbers are useful for storing very large or very small numbers, including those with fractional components.

2) Floating point numbers often have small rounding errors, even when the number has fewer significant digits than the precision. Many times these go unnoticed because they are so small, and because the numbers are rounded for output. However, comparisons of floating point numbers may not give the expected results. Performing mathematical operations on these values tends to make the rounding errors grow larger.


Dear Mr Alex!!!!!

I am studying C++ again in an amazing way I have never learnt before! Thank you so much and God bless you indeed, Mr!!!

I have one question today! I am using Visual Studio 2017, and when I practice your example code, it doesn't run and output what you show here. What is wrong with my IDE? Can you give me some tips so I can see what your examples are supposed to show?

stay blessed sir!!!

Dear Mr Alex!!!!!

Please note that my IDE couldn't run some of your example code and produce the output you show here, like the precision example that uses the header file #include <iomanip> // for std::setprecision()

What is wrong with my IDE? Can you give me some tips so I can see what your examples are supposed to show?

stay blessed sir!!

I'm not sure what's wrong with your IDE. Were you able to compile the programs successfully and they just didn't run? If that's the case, maybe try turning off your virus scanner or malware detection temporarily. They could be interfering.

Thank you so much Mr Alex!

Fixed it, with some corrections to my own mistakes too.

I am always amazed by your speed, patience, kindness and passion in answering our questions! Even for a silly one like mine!

Bless you more, sir!!!!!!

Hello! I want to thank you again for your awesome resources Alex!

However, I have a question.

Taking your code from the second rounding errors example (when you add 0.1 ten times), I tried replacing the repeated addition with multiplication, so instead of adding 0.1 ten times, I just did 10 * 0.1 to check if there would be any difference. And it actually did change something: there were no rounding errors!

Can you please explain why this works like this?

Thank you, and keep up the work on maintaining this awesome site!

The issue is this: any floating point arithmetic operation may (or may not) cause precision errors.

When we add 0.1 ten times, we have 10 chances to hit an intermediary result that can't be represented precisely. When that happens, we get a precision error that ends up being carried into future results. If you print out all of the intermediary results, you'll see this happen.

When we do a single multiplication, there's only one chance for a precision error (as we're only doing a single operation) -- in this case, since 1.0 can be represented precisely, we avoid precision issues.

Note that if we'd chosen some other numbers, the results could vary.

Hi Alex!

Are compile-time and run-time rounding errors guaranteed to be the same?

I can't find anything to indicate either way. I would assume they could differ until proven otherwise.

So I understand that if we hit an intermediary result that cannot be represented precisely in memory (in binary), the "imprecision" will be carried along to the next result. Am I right?

If so, why can't the computer represent numbers internally using things like fractions or roots? For instance (I give this example in base ten, as I don't know much about binary), if we tried to represent 0.33333..., which would not be possible to represent exactly due to the infinite number of decimals, why not store it as 1/3 (which can be represented accurately), use 1/3 instead of 0.33333... when we want to do anything with it (like addition, multiplication, etc.), and only display its decimal (approximate) value when requested?

> So I understand that if we hit an intermediary result that cannot be represented precisely in memory (in binary), the "imprecision" will be carried along to the next result. Am I right?

Yes. In some cases, precision issues may cancel each other out. But more often, they compound upon each other, with the actual number drifting farther away from the intended number with each successive operation.

Computers don't represent numbers internally using fractions because it's inefficient to store things that way. Of course, if you want fractions, you can always store your own results as fractions and convert fractions to decimals on demand.

I won't pretend to understand what scientific notation is or that I have anything more than the most basic concept of what was just covered in this lesson - but I got every single answer on the quiz right so clearly something stuck.

I just want to thank you again for authoring this site Alex. I've learned so much in just three weeks.

Hi Alex!,

Can you give me your definition of the function "long"

maybe show me how to use it in code

Hi Camron!

"long" isn't a function but a keyword, just like unsigned, for example.

When used with integers, "long" tells the compiler to make the variable at least 32 bits in size.

Exactly how much memory is used is up to the compiler.

Writing "long double" asks the compiler to use at least as much memory as for a normal "double" (often more), which can make the variable more accurate and/or able to store bigger numbers.

More of this is covered in chapters 2.4 and B.2

I don't understand the question. "long" isn't a function, it's a type specifier that can be used to define a long integer or long double.

If I divide two integers, shouldn't the result be a float? I could not get that result. Why?

Here is the program:

```cpp
#include <iostream>
#include <iomanip>

using namespace std;

int main()
{
    int a=8,b=7;
    float area;
    area= (a/b);
    cout << area << endl;
    return 0;
}
```

Nope, if you divide two integers, you'll get an integer. We talk about this more in the next chapter. If you want a float result, you have to cast one of the integers to a float _before_ doing the division:

Actually, what C++ (and C) does is integer division: it gives you the number of times 7 goes into 8 (1), AND a remainder you can get with 8 % 7, so using 2 statements you get a very precise answer: 1 1/7.

Is it not inconsistent to keep the zero value significant digits for numbers with a decimal point but ignore those zero value digits for numbers without a decimal point?

For example:

b) 0.004000 = 4.000e-3 (4 significant digits) - in c++ style scientific notation, we are stating that this value is accurate to 4 significant digits.

d) 146000 = 1.46e5 (3 significant digits) - here we are ignoring the fact that this value may actually be accurate to 6 significant digits. We are throwing away 3 digits of precision just because those digits happen to be 0's.

For notation consistency, should d) not be 1.46000e5?

Or alternatively, to be consistent, b) should be 4e-3 and d) should be 1.46e5?

Nope. With trailing zeros to the right of the decimal point, we can be sure they're significant (there would be no reason to write them otherwise). With trailing zeros to the left of the decimal point (as in 146000), we can't be sure whether they're significant or just placeholders left over from rounding. Thus, standard convention is that trailing zeros in a number without a decimal point aren't treated as significant unless otherwise stated.

Thanks for the reply, I think I get it now. I'm used to engineering, where we would be consistent, like I said above. I realise that digits after the decimal point will only have a limited precision in C++, so that's why you have to include the significant digits. Hope I've got that right.

Can you explain your sentence here:

"Note that by default, floating point literals default to type double. An f suffix is used to denote a literal of type float."

When you initialise a variable, you have to define its type, e.g. int, float or double etc. So in what circumstance would my computer need to default to a type, when every time you create a variable the syntax dictates that you have to define its type? Hope that makes sense. So if I am initialising a variable:

In what circumstance would the compiler default to a double?

So let's differentiate two things. Your variable has a type. Your literals also have a type. In your sample code, your variable x has type float. But your literal (because it does not have an f suffix) has a default type of double. Therefore, your compiler will have to implicitly convert double 1.1 to a float value, so it can be used to initialize variable x. By using 1.1f, the literal has type float, which matches with the type of your variable, so no conversion (and chance for error) is necessary.

I think I get it now. Also, I didn't consider expressions, where I could use the literal 1.1 in an equation, and I wouldn't have given it a type myself, so it will default to be type double. Thanks

Hi Alex

First, your C++ tutorials are superb - by far the best I found on the web. Clear explanations with lots of great examples!!

At the top of this page you write:

I am still not sure why the f suffix needs to be there (I read your answers to comments about this in the thread)

Reason I ask is that using

my compiler tells me that it takes f as a float whether I include the f suffix or not.

Perhaps some compilers need the "f" and some don't....?

5.0f means 5.0 as a float. 5.0 (no f suffix) means 5.0 as a double. If you initialize a float variable with 5.0 (no suffix), then you're assigning a double literal to a float variable, which means a conversion happens.

Your typeid(f).name() is showing you the type of the variable f, not the f suffix. Perhaps it's a bit confusing to have the variable have the same name as the suffix. I'll update that.

Hi Alex

Many thanks. That makes sense.

I guess the fact that the f suffix avoids an implicit type conversion from double to float had escaped me!

Rex

Hi Alex,

"The size of the variable puts a limit on the amount of information it can store -- variables that utilize more bytes can hold a wider range of values."

Data type : int

Size: 4 bytes

Range: -2,147,483,648 to 2,147,483,647

Data type: float

Size: 4 bytes

Range: ±1.18 x E-38 to ±3.4 x E38

I am wondering why floating point variables can hold a much larger range of values than integer variables, since both have the same size of 4 bytes?

Thank you, Have a great day

Good question. The answer is that integers and floating point numbers use their bits differently, with each method having tradeoffs. Let's explore this with an exercise.

If I gave you 4 digits to make numbers out of, how many different numbers could you make? And what is the largest number you could make? (assume positive numbers only for simplicity). You might initially think: with 4 digits, you can make all the numbers between 0 and 9999, so 10000 numbers, with the highest being 9999. That's what integer representation does.

But consider what would happen instead if you used the first two digits as a base and the last two as an exponent. You could still make around 10000 different numbers, but now your largest number would be 99^99, which is much larger than 9999. However, the tradeoff is that you could no longer represent a number like 115 exactly in this scheme, because you no longer have enough precision to precisely represent this number. That's what floating point representation does.

Does that make sense?

Looks like we can't use scientific notation to assign values to integer variables?. Let's say if I'd like to assign 5973600000000000000000000 to an integer variable, I'd have to type 5973600000000000000000000, not 5.9736 x E24?

For example:

ll x = 5973600000000000000000000;

ll x = 5.9736 x E24; // is it ok?

Thank you.

You can do this, but it's not a good idea because these numbers are considered floating point literals, and thus subject to precision issues. With 32-bit integers, this isn't likely to be an issue, but with 64-bit integers, the precision of the floating point literal may be less than the number desired. Your compiler will probably also complain about the conversion (or reject it altogether if you're doing a uniform initialization, which disallows narrowing conversions)

Dear Teacher,

Please let me tell you that I have a special interest in floating point numbers. Could you please suggest a website about them?

Regards.

What more do you want to learn about floating point numbers? In the comments of this article, there should be a link to an article on how floating point numbers are represented in memory, but it's pretty complicated.

Dear Teacher, please accept my thanks for replying to my comment. I have already chosen Wikipedia's article "Single-precision floating-point format" as a starting point for learning this subject. Regards.

Dear Teacher, could you please explain to me why the second number in the output of your program

is 3.333333333333334? I have used different platforms and the output is 3.333333333333333. Regards.

In this case, the stored double is the same on any IEEE 754 platform (approximately 3.3333333333333335); the difference comes from how each compiler's standard library rounds that value when printing it with 16 digits, and correctly rounded it prints as 3.333333333333333. The results should never be different by much, but as you've just discovered, with floating point you can never assume a precise value.

Dear Teacher, please let me ask: by "In the comments of this article, there should be a link to an article on how floating point numbers are represented in memory, but it’s pretty complicated" do you mean https://en.wikipedia.org/wiki/Floating_point suggested by Mr. James Ray, posted on February 16, 2017 at 1:20 am? Regards.

No, not that one. I thought it was linked from the comments but apparently not, and I can't find it in my notes any more. If you look up "floating point representation ieee 754 denormalized" on Google I'm sure you'll get a ton of articles of interest.

Dear Georges, let me recommend you a good scholarly article about the topic, if you're interested and have enough knowledge to understand it https://docs.oracle.com/cd/E19422-01/819-3693/ncg_goldberg.html

Kind regards.

My dear Teacher,

Please let me answer your question: (is that 19 or 20 zeros?). That is 20 zeros!

With regards and friendship.

Hi Alex.

Whenever you talked about uniform initialization, you said that narrowing conversions were forbidden. Why is it that float value{ 1.0 } runs fine but float value{ 1.0 / 3 } does not? In both cases, aren't you converting a double to a float?

The standard says, "A narrowing conversion is an implicit conversion... from long double to double or float, or from double to float, except where the source is a constant expression and the actual value after conversion is within the range of values that can be represented (even if it cannot be represented exactly)"

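So technically, the 1.0 isn't considered a narrowing conversion in this context, since it's a constant expression whose value is within the range of a float. Note, though, that 1.0 / 3 is also a constant expression within float's range, so by this rule it should be accepted too; if your compiler rejects it, it's being stricter than the rule requires. A source that isn't a constant expression, such as a double variable, is what makes the conversion narrowing.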

My dear c++ Teacher,

Can you please give me your definition of literal number?

With regards and friendship.

A fixed value that has been inserted into the code, such as 5, 6.7, or 'a'.

In the expression x + 5, 5 is an integer literal.

My dear Teacher,

Please let me thank you for replying to my request, and say that the phrase "When we assign literal numbers to floating point numbers" in the introduction of this lesson is ambiguous. Do you mean "When literal numbers are floating point numbers"?

With regards and friendship.

I rewrote the sentence as, "When using floating point literals, it is convention to always include at least one decimal place". That should be a bit clearer. Thanks for pointing out the ambiguity.

Dear Teacher, please accept my thanks and congratulations for rewriting this sentence. Now it is clear. Regards.

Please write like a normal human.

Dear Teacher, please let me ask a question: Do you consider "dear Teacher" an abnormal human's expression? Regards.

No, but it's a bit overly formal for the internets. :)

Dear Teacher, please let me thank you for replying to my comment.

1. Your comment has no "Reply" capability, so I replied using Mr. William's "Reply" capability.

2. Also, please forgive me for using formal expressions. Although Greek, I live in France and like French savoir-vivre as much as I dislike cowboy manners. Regards.

Dear Brother,

You're so funny.

With warm regards.

I had a hard time understanding this lesson. I got the gist of the first part but you lost me towards the end. Honestly, how important is it to understand ALL of this? Is it really going to be used much in everyday coding?

From a lesson standpoint, I don't think there are lessons that build on top of this one, so assuming you understand basic usage, you should be fine to continue the tutorials.

As for whether it's used in everyday coding, that depends entirely on what type of programs you're creating. In some cases, they're not used at all. In other cases, they are used all the time.

Yeah I suppose that makes sense. Alrighty then

Hey Alex,

In this code, why did you use "\n"? I tried the code after omitting it and it worked just fine. Thanks in advance.

'\n' does a newline, same as std::endl. I realized I hadn't explained this yet, so I replaced the \n with std::endl for now.

Hi. Great set of tutorials. Thank you for sharing. I had a couple of questions.

1) I compiled and executed the following piece of code :

Giving the following output :

Shouldn't it have printed 1.23456 in the first line and 1.234567 in the second ?

2) Why do we need a suffix 'f' while initializing a variable of type float?

1) When a number has more precision than can be displayed, the last displayed digit is rounded, not truncated. So 1.234568 rounds to 1.23457, not 1.23456.

2) The f suffix tells the compiler the literal is of type float. If you don't include the f suffix, your literal will be of type double instead.

Hey Alex, thank you very much for this wonderful website, but I’d like to ask you about this specific lesson and how a rounding error occurs.

I saw the code

d2 came up as 0.99999999999999989 as expected, and when I used

it also came as 0.99999999999999989, but when I tried

it came out as 1 perfectly fine. Can you explain why this is? Thank you very much for your tutorials, and I hope this weird question doesn't go unanswered. Also, if possible, is there a desirable or better way of writing this code?

I'm not sure on this one. The topic of floating point representation of numbers (which is what produces precision issues) is an esoteric topic.

0.7 + 0.1 + 0.1 + 0.1; also produces 0.99999999999999989, but 0.8 + 0.1 + 0.1 produces 1.

That's a shame D: But thank you for answering regardless, have a good day :D

Question: doesn't std:: signify that something is in the standard library? Yet we need <iomanip> for std::setprecision()?

Yes, std:: means it's in the standard library. However, the standard library is scattered across many different header files. std::setprecision is declared inside the std namespace in the iomanip header, so you should include that header if you want to use it.

Hi Alex, I have four problems about floating point numbers,

Here're my Questions:

1)In my computer architecture The ranges of long double and double are different(when I

sizeofthe long double and double types, I have12 bytesfor long double and8 bytesfor double) But the precision of double and long double is the same, Why?2)when I use float for summing the numbers 0.1 ten times I get 1 as expected result. Why this?

3) If possible, could you explain how the compiler rounds numbers?

4) When I divide 0 (an integer) by 0, the program crashes, but when I divide 0.0 (a floating point number) by 0, I get the expected result: nan. Why does dividing an integer by 0 make the program crash?

1) That's the way it's defined. Presumably long double has a larger range.

2) Your compiler may be optimizing this and avoiding the precision error.

3) The details of how numbers are converted into floating point format is way outside the scope of these tutorials.

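4) Dividing an integer by an integer produces an integer result, and integers have no way to represent NaN or infinity, so integer division by zero is undefined behavior (which on many systems crashes the program). Dividing a floating point number by an integer produces a floating point result, and NaN is a floating point result.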

Thank you very much for your replies, Alex. I understand now.

Have a nice day!

I've been playing around with the precision of the various floating point types and how they can produce errors, so I made this program:

Using VS Community 2015, it compiled and output:

So three questions:

1) Why did I have to use 1.0f and 3.0f for all of my numbers? If I remove the 'f' from any of them (float, double, or long double), the code does not compile.

2) Why does it appear that my float type has the highest precision of the three results? I would think if 1/3 were a terminating decimal in binary and float had enough bits to hold it, I would simply get the same precision on all three results.

3) Why does it appear that both double and long double types have the same precision? Shouldn't long double allow for more bits to be used and would result in a more accurate decimal for 1/3?

1) With no f suffix, your floating point literals are treated as doubles. I'm not sure why this wouldn't compile -- perhaps your compiler is warning you about converting a double value back into a float, which can cause a loss of precision.

2) Most likely it has something to do with the fact that your literals are floats. Perhaps the compiler is doing some kind of optimization here helping to avoid a precision issue in that one case.

3) Your compiler may not support long double, and thus treat double and long double the same. (Visual Studio, for example, makes long double the same size as double.) Try using sizeof on each and see whether they both come out as 8 bytes.

After carefully reading your definitions of double, float, and long double, I still don't know what they are for or how I can use them. I need a more in-depth explanation of these topics, but perhaps I found something that will get me started.

I wrote this code:

It puts up a prompt where I can type numbers, and it collects 2 or more numbers until I type "-1", at which point it sums the numbers I've typed and displays the result. I noticed that when I used "int a,z = 0;", I was only able to type numbers without decimals. If I typed "9", it was fine, but if I typed "9.2", the program stopped working. With double or float, I am able to type many digits after the decimal point: "9.2, 9.222, 9.1239123123123," anything goes. So my conclusion is that int cannot store enough memory for numbers with decimal points, while types like float and double can. Let me know whether I'm correct; I'd love to know.

By the way, what should I call things like "int, double, float, long double"? I called them "things" in this question because I don't know what they're called. Info on that would be appreciated too. Thanks for the tutorials.

Integers (int, long, etc...) are designed for storing integer values (those with no decimal points). If you don't need to store decimal points, you should use an integer.

Floating point numbers are used for storing numbers that need a decimal. The main difference between float, double, and long double is how many significant digits they can store before running out of room (their precision), along with how large or small a value they can hold (their range).

int, float, double, long double, etc... are called types (short for data types).

1. Why are 87 and 87.000 treated the same? One is considered a float and the other an integer.

2. Why do we specify f? E.g. float f=4.76f;

3. Why does

float f=1.0;

cout<<f;

print "1"?

4. Why are the results unexpected?

E.g. float f=3.3333333333333;

cout<<f;

Output:

3.33333326347 ????

A float should show only 6-7 digits, like 3.333334.

1) 87 is an integer literal, and 87.000 is a double literal.

2) The f suffix tells the compiler that a floating point literal should be interpreted as a float instead of a double.

3) std::cout doesn't print trailing zeros.

4) When I tried this on Visual Studio 2015, I got 3.33333, which is correct -- the default precision should be 6. It sounds like either your compiler is defaulting to some other precision, or you have a statement in some code above that's changing the default precision for your program.

What does "\n" mean when we write std::cout << posinf << "\n"; ?

I cover '\n' in the lesson on chars, which is later in this chapter.

I was wondering why some of my answers were wrong…

a) 34.50 => 3.45e1

b) 0.004000 => 4e-3

c) 123.005 => 1.23005e2

d) 146000 => 1.46e5

e) 146000.001 => 1.46000001e5

f) 0.0000000008 => 8e-10

g) 34500.0 => 3.45e4

With a little Google-Fu, I found out that, in science, when a measurement is made, 42 means something while 42.0000 means the measurement has been done with a much higher precision than in the first case. But this is the scientific reason.

What is the C++ reason why removing trailing 0s is wrong? Can someone give me an example where removing trailing 0s from the scientific notation leads to errors?

I tried to hunt the error with:

I added a new subsection called "Precision and trailing zeros after the decimal" that talks more about this. In short, C++ doesn't care about trailing zeros after the decimal. Whether you include them or not is up to you.

f = 9.87654321f;

Why is the f here?

The f suffix means the number is a float. I discuss this in a few lessons.

Thank you for your answer.

Hi Alex,

When you set the precision of a floating point number higher than its number of significant digits, where do all the digits after the last significant digit come from (like in the example of 3.3333333333333)?

Not all decimal numbers can be represented precisely in floating point format. This can manifest in a few different ways:

1) Much like 1/3 ends up as 0.333333333 (repeating) in decimal, some numbers that have concise representations in decimal format (such as 0.1) have repeating representations in binary floating point format.

2) In some other cases, the exact decimal value can't be represented in floating point format, so the closest value that can be represented is picked. This can lead to cases where you wanted some number but get a number that's slightly smaller or larger than the one you were expecting.

"Each place you slide the decimal to the left increases the exponent by 1.

Each place you slide the decimal to the right decreases the exponent by 1."

Isn't it the other way around?

I r cunfosd

Nope.

Start with: 42030

This is the same as 42030.0

Slide decimal left 4 spaces: 4.2030e4

We slid the decimal 4 spaces to the left, so the exponent increased from 0 to 4.

That made me cringe.

5/1 = 5

5/0.5 = 10

5/-1 = -5

5/-0.5 = -10

As the divisor approaches 0 from opposite sides, the result heads toward positive and negative infinity at the same time, and hence we say anything divided by 0 is undefined.

I am genuinely surprised and concerned that 5/0 does not return NaN or at least some error.

Please watch this video:

https://www.youtube.com/watch?v=BRRolKTlF6Q

On Visual Studio 2015, Windows 10, Intel Core i3-2130 @3.40 GHz, 8 GB RAM, x64 processor, when I run:

I get the output:

inf

-inf

-nan(ind)

Learn more about NaN here: https://en.wikipedia.org/wiki/NaN.