Integers are great for counting whole numbers, but sometimes we need to store *very* large numbers, or numbers with a fractional component. A **floating point** type variable is a variable that can hold a real number, such as 4320.0, -3.33, or 0.01226. The *floating* part of the name *floating point* refers to the fact that the decimal point can “float”; that is, it can support a variable number of digits before and after the decimal point.

There are three different floating point data types: **float**, **double**, and **long double**. As with integers, C++ does not define the actual size of these types (but it does guarantee minimum sizes). On modern architectures, floating point representation almost always follows IEEE 754 binary format. In this format, a float is 4 bytes, a double is 8, and a long double can be equivalent to a double (8 bytes), 80-bits (often padded to 12 bytes), or 16 bytes.

Floating point data types are always signed (can hold positive and negative values).

Category | Type | Minimum Size | Typical Size |
---|---|---|---|
floating point | float | 4 bytes | 4 bytes |
| double | 8 bytes | 8 bytes |
| long double | 8 bytes | 8, 12, or 16 bytes |

Here are some definitions of floating point numbers:

    float fValue;
    double dValue;
    long double ldValue;

When using floating point literals, always include at least one decimal place (even if the decimal is 0). This helps the compiler understand that the number is a floating point number and not an integer.

    int x{5};      // 5 means integer
    double y{5.0}; // 5.0 is a floating point literal (no suffix means double type by default)
    float z{5.0f}; // 5.0 is a floating point literal, f suffix means float type

Note that floating point literals have type double by default. An f suffix is used to denote a literal of type float.

Best practice

Always make sure the type of your literals matches the type of the variables they’re being assigned to or used to initialize. Otherwise an unnecessary conversion will result, possibly with a loss of precision.

Warning

Make sure you don’t use integer literals where floating point literals should be used. This includes when initializing or assigning values to floating point objects, doing floating point arithmetic, and calling functions that expect floating point values.

Printing floating point numbers

Now consider this simple program:

    #include <iostream>

    int main()
    {
        std::cout << 5.0 << '\n';
        std::cout << 6.7f << '\n';
        std::cout << 9876543.21 << '\n';
    }

The results of this seemingly simple program may surprise you:

    5
    6.7
    9.87654e+06

In the first case, std::cout printed 5, even though we typed in 5.0. By default, std::cout will not print the fractional part of a number if the fractional part is 0.

In the second case, the number prints as we expect.

In the third case, it printed the number in scientific notation (if you need a refresher on scientific notation, see lesson 4.7 -- Introduction to scientific notation).

Floating point range

Assuming IEEE 754 representation:

Size | Range | Precision |
---|---|---|
4 bytes | ±1.18 x 10^{-38} to ±3.4 x 10^{38} | 6-9 significant digits, typically 7 |
8 bytes | ±2.23 x 10^{-308} to ±1.80 x 10^{308} | 15-18 significant digits, typically 16 |
80-bits (typically uses 12 or 16 bytes) | ±3.36 x 10^{-4932} to ±1.18 x 10^{4932} | 18-21 significant digits |
16 bytes | ±3.36 x 10^{-4932} to ±1.18 x 10^{4932} | 33-36 significant digits |

The 80-bit floating point type is a bit of a historical anomaly. On modern processors, it is typically implemented using 12 or 16 bytes (which is a more natural size for processors to handle).

It may seem a little odd that the 80-bit floating point type has the same range as the 16-byte floating point type. This is because they have the same number of bits dedicated to the exponent -- however, the 16-byte number can store more significant digits.

Floating point precision

Consider the fraction 1/3. The decimal representation of this number is 0.33333333333333… with 3’s going out to infinity. If you were writing this number on a piece of paper, your arm would get tired at some point, and you’d eventually stop writing. And the number you were left with would be close to 0.3333333333…. (with 3’s going out to infinity) but not exactly.

On a computer, an infinite length number would require infinite memory to store, and typically we only have 4 or 8 bytes. This limited memory means floating point numbers can only store a certain number of significant digits -- and that any additional significant digits are lost. The number that is actually stored will be close to the desired number, but not exact.

The precision of a floating point number defines how many *significant digits* it can represent without information loss.

When outputting floating point numbers, std::cout has a default precision of 6 -- that is, it assumes all floating point variables are only significant to 6 digits (the minimum precision of a float), and hence it will round the output to 6 significant digits.

The following program shows std::cout limiting its output to 6 digits:

    #include <iostream>

    int main()
    {
        std::cout << 9.87654321f << '\n';
        std::cout << 987.654321f << '\n';
        std::cout << 987654.321f << '\n';
        std::cout << 9876543.21f << '\n';
        std::cout << 0.0000987654321f << '\n';

        return 0;
    }

This program outputs:

    9.87654
    987.654
    987654
    9.87654e+006
    9.87654e-005

Note that each of these has only 6 significant digits.

Also note that std::cout will switch to outputting numbers in scientific notation in some cases. Depending on the compiler, the exponent will typically be padded to a minimum number of digits. Fear not, 9.87654e+006 is the same as 9.87654e6, just with some padding 0’s. The minimum number of exponent digits displayed is compiler-specific (Visual Studio uses 3, some others use 2 as per the C99 standard).

The number of digits of precision a floating point variable has depends on both the size (floats have less precision than doubles) and the particular value being stored (some values have more precision than others). Float values have between 6 and 9 digits of precision, with most float values having at least 7 significant digits. Double values have between 15 and 18 digits of precision, with most double values having at least 16 significant digits. Long double has a minimum precision of 15, 18, or 33 significant digits depending on how many bytes it occupies.

We can override the default precision that std::cout shows by using the std::setprecision() function that is defined in the *iomanip* header.

    #include <iostream>
    #include <iomanip> // for std::setprecision()

    int main()
    {
        std::cout << std::setprecision(16); // show 16 digits of precision
        std::cout << 3.33333333333333333333333333333333333333f << '\n'; // f suffix means float
        std::cout << 3.33333333333333333333333333333333333333 << '\n'; // no suffix means double
        return 0;
    }

Outputs:

    3.333333253860474
    3.333333333333334

Because we set the precision to 16 digits, each of the above numbers is printed with 16 digits. But, as you can see, the numbers certainly aren’t precise to 16 digits! And because floats are less precise than doubles, the float has more error.

Precision issues don’t just impact fractional numbers, they impact any number with too many significant digits. Let’s consider a big number:

    #include <iostream>
    #include <iomanip> // for std::setprecision()

    int main()
    {
        float f { 123456789.0f }; // f has 10 significant digits
        std::cout << std::setprecision(9); // to show 9 digits in f
        std::cout << f << '\n';
        return 0;
    }

Output:

123456792

123456792 is greater than 123456789. The value 123456789.0 has 10 significant digits, but float values typically have 7 digits of precision (and the result of 123456792 is precise only to 7 significant digits). We lost some precision! When precision is lost because a number can’t be stored precisely, this is called a rounding error.

Consequently, one has to be careful when using floating point numbers that require more precision than the variables can hold.

Best practice

Favor double over float unless space is at a premium, as the lack of precision in a float will often lead to inaccuracies.

Rounding errors make floating point comparisons tricky

Floating point numbers are tricky to work with due to non-obvious differences between binary (how data is stored) and decimal (how we think) numbers. Consider the fraction 1/10. In decimal, this is easily represented as 0.1, and we are used to thinking of 0.1 as an easily representable number with 1 significant digit. However, in binary, 0.1 is represented by the infinite sequence: 0.00011001100110011… Because of this, when we assign 0.1 to a floating point number, we’ll run into precision problems.

You can see the effects of this in the following program:

    #include <iostream>
    #include <iomanip> // for std::setprecision()

    int main()
    {
        double d{0.1};
        std::cout << d << '\n'; // use default cout precision of 6
        std::cout << std::setprecision(17);
        std::cout << d << '\n';
        return 0;
    }

This outputs:

    0.1
    0.10000000000000001

On the top line, std::cout prints 0.1, as we expect.

On the bottom line, where we have std::cout show us 17 digits of precision, we see that d is actually *not quite* 0.1! This is because the double had to truncate the approximation due to its limited memory. The result is a number that is precise to 16 significant digits (within the 15 to 18 digits a double provides), but the number is not *exactly* 0.1. Rounding errors may make a number either slightly smaller or slightly larger, depending on where the truncation happens.

Rounding errors can have unexpected consequences:

    #include <iostream>
    #include <iomanip> // for std::setprecision()

    int main()
    {
        std::cout << std::setprecision(17);

        double d1{ 1.0 };
        std::cout << d1 << '\n';

        double d2{ 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 }; // should equal 1.0
        std::cout << d2 << '\n';
    }

    1
    0.99999999999999989

Although we might expect that d1 and d2 should be equal, we see that they are not. If we were to compare d1 and d2 in a program, the program would probably not perform as expected. Because floating point numbers tend to be inexact, comparing floating point numbers is generally problematic -- we discuss the subject more (and solutions) in lesson 5.6 -- Relational operators and floating point comparisons.

One last note on rounding errors: mathematical operations (such as addition and multiplication) tend to make rounding errors grow. So even though 0.1 has a rounding error in the 17th significant digit, when we add 0.1 ten times, the rounding error has crept into the 16th significant digit. Continued operations would cause this error to become increasingly significant.

Key insight

Rounding errors occur when a number can’t be stored precisely. This can happen even with simple numbers, like 0.1. Therefore, rounding errors can, and do, happen all the time. Rounding errors aren’t the exception -- they’re the rule. Never assume your floating point numbers are exact.

A corollary of this rule is: be wary of using floating point numbers for financial or currency data.

NaN and Inf

There are two special categories of floating point numbers. The first is Inf, which represents infinity. Inf can be positive or negative. The second is NaN, which stands for “Not a Number”. There are several different kinds of NaN (which we won’t discuss here). NaN and Inf are only available if the compiler uses a specific format (IEEE 754) for floating point numbers. If another format is used, the following code produces undefined behavior.

Here’s a program showing all three:

    #include <iostream>

    int main()
    {
        double zero {0.0};

        double posinf { 5.0 / zero }; // positive infinity
        std::cout << posinf << '\n';

        double neginf { -5.0 / zero }; // negative infinity
        std::cout << neginf << '\n';

        double nan { zero / zero }; // not a number (mathematically invalid)
        std::cout << nan << '\n';

        return 0;
    }

And the results using Visual Studio 2008 on Windows:

    1.#INF
    -1.#INF
    1.#IND

*INF* stands for infinity, and *IND* stands for indeterminate. Note that the results of printing *Inf* and *NaN* are platform specific, so your results may vary.

Best practice

Avoid division by 0 altogether, even if your compiler supports it.

Conclusion

To summarize, the two things you should remember about floating point numbers:

1) Floating point numbers are useful for storing very large or very small numbers, including those with fractional components.

2) Floating point numbers often have small rounding errors, even when the number has fewer significant digits than the precision. Many times these go unnoticed because they are so small, and because the numbers are rounded for output. However, comparisons of floating point numbers may not give the expected results. Performing mathematical operations on these values will cause the rounding errors to grow larger.


I wanted to ask about the division by zero. While I am sure you understand the mathematics behind why it is undefined, here in your code it indicates that it will return infinity. Is it the double having a decimal that causes this to happen rather than just giving an error instead? I ask because I want to understand what the code is doing under the hood so that I never mistakenly fall into this trap. Thanks and great set of lessons!

The short answer is that it works that way because that's how IEEE 754 (the standard to which floating point number implementations adhere) defines it. See http://grouper.ieee.org/groups/754/faq.html#exceptions.

Thanks! I will bear this in mind when I wish to reprogram my coffee pot! ;)

It does make sense though from that perspective, hardware can be stupid and this has to be adaptable to that.

Dear Alex, is there a way to compare an integer value with a float type? For example, suppose I want to write code that takes a number from the user and finds out whether it is prime, even, or odd. For a prime, division by any number other than 1 and the entered integer would result in a fraction. If I had taken the input as an int and the result got converted to a fraction (which should be stored in a float variable), how would I know that anything like that has happened? Of course, one can use as many if/else statements as needed to compare the division or even the inserted number, but I want to write the smallest code for the first 100 or 200 prime numbers.

Hi waqas,

> Now if i had taken input from user in Int and it just got converted to a fraction

I'm having a hard time understanding what you're asking.

Instead of doing a floating point division on an integer to see if the result is a fraction or not, you'd be better off using the modulus operator (%) to see if doing an integer division yields a remainder. You can use a for loop to loop through all of the possible divisors to see if any of them leave a remainder of 0. Modulus is covered in chapter 3, and loops are covered in chapter 5.

Compiler says:

"error: 'setprecision' was not declared in this scope."

but when I delete "cout" from "std::cout << setprecision(x) " (e.g. std::setprecision(x)), it works as expected. Pretty confusing for beginners like me.

I updated the lesson to make it more clear that std::setprecision() lives in the std:: namespace.

If f = 9876549.21f, would the scientific notation be 9.87655e006, because there is a 9 after the 6th significant digit? Mathematically, 0.49 can round up to 0.5. Could it be the same in C++?

The precise scientific notation would be 9.87654921e006.

However, because std::cout rounds to 6 significant digits, 9.87654921e006 would be rounded to 9.87655e006.

This is something that is easy to verify yourself with a simple program:

This is driving me crazy! Do floats have 6 or 7 digits of precision?

"When outputting floating point numbers, cout has a default precision of 6 -- that is, it assumes all variables are only significant to 6 digits, and hence it will truncate anything after that."

"The value 123456789.0 has 9 significant digits, but float only supports about 7 digits of precision"

"4 bytes ±1.18 x 10-38 to ±3.4 x 1038 7 significant digits"

The answer to this question is complicated due to the way floating point numbers are stored. Most floats have (at least) 7 digits of precision, however there is a subset of the floating point numbers that only have 6 digits of precision. I'll update the article to indicate 6 instead of 7.

Outputs:

0.10000000000000001

0.99999999999999989

That is what I get on my machine. When I use double without setprecision(17), it shows the expected results: 0.1 and 1.

So my question is: without using setprecision(17), do they store the same value? Or does the double store 0.10000000000000001 and somehow get rounded to 0.1 for display?

Great tutorials by the way! You are helping a lot!

Cout rounds the numbers for display based on the precision. Internally, they're stored as 0.10000000000000001 and 0.99999999999999989.

I have to point this out, for the sake of correct math:

First off, infinity is not a number, it's a concept.

Second, x/0 is not and never will be infinity or -infinity.

Numberphile did a good video that explains the division by 0 problem: https://www.youtube.com/watch?v=BRRolKTlF6Q

Ah mathemagicians and their concepts. Almost as bad as physicist.

I understood everything perfectly except one thing: why are we leaving trailing zeroes untrimmed when the original number has a decimal point?

Good question. If we write a large number with no decimal point, e.g. 21,000,000,000, it's not clear whether we mean "exactly 21 billion", or "somewhere around 21 billion". More often than not, we mean "Somewhere around 21 billion" (because in most cases, the difference between 21,000,000,000 and 21,000,000,001 is insignificant).

So when we convert 21,000,000,000, we assume that the trailing zeros are not significant.

However, when the number has a decimal point, e.g. 21,000,000,000.01, or even 21,000,000,000.0, it's clear that this number is exact, otherwise we wouldn't have provided the decimal point. So in the case where a decimal is provided, we assume all of the trailing zeros are significant.

Make sense?

Your explanation makes sense, but does the compiler care? What does it do differently? Or is this just to let the code document what we know about the number?

I'm guessing that the compiler would consider 21,000,000,000.0 to be equivalent to 21,000,000,000. So although we would probably say one has a high precision and one doesn't, the compiler likely treats them both as low-precision numbers.

In your 0.1 example you forgot to add "#include <iostream>" at the top

Fixed.

So... Just to clarify, the varying results seen in the above examples are caused by varying error(s) in rounding?

Eg, each time you round to the hundreds/hundredths place, it will be different from rounding to the tens/tenths place. And with that, the higher you the place you round to, the more accurate the answer will be (As with standard math). Is this correct?

Also, I'm getting slightly different results than what's presented above. For instance, what's 3.333333333333334 to you is 3.333333333333333 on my machine, despite rounding to the same place. Just thinking about that makes me uncomfortable!

Edit: it's also worth noting that despite the fact that the comment box says we can use certain HTML tags, it automatically encodes the brackets.

I'm not sure which examples you're referring to. But generally speaking, the fact that floating point numbers have limited precision leads to rounding errors, especially in cases where a number is represented as an infinite sequence in binary (like 0.1).

re: The HTML tags issue, I used to allow HTML on the site, but I've installed a plugin to treat all text literally so that people who post code snippets won't have lines such as "#include" treated as invalid HTML. I'll see if I can remove/disable the misleading text.

Hello, I'm not sure if you'll read this but I'm confused about some of the outputs:

9.87654e+006

9.87654e-005

Why is 006 and 005 after the e?

How does that make sense?

I don't get what 9.87654e+006 means... I thought you could only have numbers like 6 or -3 and stuff after the e not numbers like 006.

I'm confused now... Shouldn't it just be 9.87654321e6 ? Why is it 006 ?

PS: I noticed your site is built on WordPress and I think it would be really helpful if you got a plugin that would email commenters when someone replies, would save me a lot of time, lol.

Anyway, thanks a lot, I'm finding this site really helpful :)

9.87654e+006 is the same as 9.87654e6. Visual Studio always prints exponents with at least 3 digits. Other compilers print exponents with at least 2 (as required by C99). In other words, it's implementation specific.

Oh right, Ok I understand thanks a lot!

Sir, please tell me how to print 14 and 0.5 on the output screen if the given value is 14.5. Basically, I want to print them separately using the float and int data types.

Casting your float to an int via static_cast will drop the remainder, allowing you to print the integer. You can then subtract the integer from the floating point value to get just the remainder.

What's the difference between:

cout << setprecision(17)

&

cout.precision(17)

With Visual Studio, cout << setprecision() calls cout.precision() internally. So they essentially do the same thing. Calling cout.precision() is probably slightly more efficient, but unless you're calling it hundreds of times, it won't really matter.

I'm perplexed! I'm working through Bjarne S's book and have done one of the drills but I don't understand what's going on with varying results.

Here's some output;

9.99

10

smaller is a 9.99 the larger is b 10

result 0.01

they are almost equal

99.99

100

smaller is a 99.99 the larger is b 100

result 0.01

199.99

200

smaller is a 199.99 the larger is b 200

result 0.01

they are almost equal

Why doesn't entering 99.99 and 100 give the message "they are almost equal"?

This is an interesting example of rounding errors.

10 - 9.99 = 0.01, but due to rounding error, C++ is representing this as 0.0099999999999997868

100 - 99.99 = 0.01, but due to rounding error, C++ is representing this as 0.0100000000000005116

One of these is larger than 0.01, and one is smaller.

Argggh! I almost went insane yesterday evening trying to see where I'd gone wrong.

How does one avoid rounding errors? I can think of any number of applications where rounding errors even at the level of precision in my example might be disastrous.

There are a couple of ways to "avoid" rounding errors:

1) Avoid use of floating point numbers altogether (sometimes this is possible, sometimes it isn't).

2) Don't do raw comparisons like you're doing. In section 3.5 -- Relational operators (comparisons), we discuss how to tell if floating point numbers are equal. These can be extended to handle less than/greater than cases. This would help avoid the case you see above.

3) Ensure that when you use floating point numbers, you only treat them as accurate to a certain level of precision.

I don't know why all outputs are same.

Please someone help me.(VS2015,64 bit Windows)

Thank you!!

Your float doesn't have enough precision to store the entire number, so it's being truncated. If you change f to a double (and remove the f suffix on the literals) then you'll see that the numbers print differently.

Sorry, I still feel confused.

So even i use setprecision, i still cannot change precision.

Is my understanding correct?

Thanks for your answering!

setprecision can't show more significant digits than the underlying number has. A float only has 6 to 9 digits of precision, so you're generally only going to get between 6 and 9 digits of precision even if you ask for more.

The number you picked (9876543.21f) has 9 significant digits, but because you tried to put it in a float, it got truncated to 7 significant digits (9876543) internally. So regardless of whether you ask for 7, 8, or 9 significant digits of precision, you're only going to get 7 because that's all the underlying number that got stored can offer.

Got it.

Thank you Alex!

How do I control the number of digits after the decimal point in C++?

In C, if I take a floating point variable f = 123.4567 and I want to show only 2 digits after the decimal point, I use printf("%0.2f"). Then it will show 123.46.

In C++ I have to use setprecision(). But it determines the total number of digits, not just the digits after the decimal point, and that causes a problem. If I don't know what my floating point variable will contain after a calculation, it could be 123.123 or 1234.123. If I set the precision to 5, the first case will show 123.12 and the second case will show 1234.1! But I always want to show 2 digits after the decimal point in every case. How can I do that in C++?

i have same question like you

Use the `std::fixed` stream manipulator together with the stream's `precision()` member function. For example, if you want to display a value with 2 decimal places:

    double pi = 3.14159;
    std::cout.precision(2);
    std::cout << "Today's price for a slice of pi is $" << std::fixed << pi << std::endl;

and it should print:

`Today's price for a slice of pi is $3.14`

Hi people,

I am new to C++ so please don't flame me :)

I wrote a simple prog. to test this course but something isn't really working well and I can't figure out why...

    #include <iostream>
    #include <string>
    #include <iomanip> // for setprecision()

    using namespace std;

    int main()
    {
        cout << setprecision(7); // 7 decimals
        float v = 1;
        float j = 3;
        float cc;
        cc = v / j;

        // TEST with FLOAT NUMBERS
        float ff = 0.3333333; // 7 decimals as set in "setprecision(7);"

        if (cc < ff) {
            cout << "cc is smaller than ff" << endl;
        }
        else if (cc > ff) {
            cout << "cc is bigger than ff" << endl;
        }
        else {
            cout << "cc equals to ff" << endl;
        }

        cout << cc << " = cc" << endl;
        cout << ff << " = ff" << endl;

        return 0;
    }

The output gives me that cc is bigger than ff...

I don't understand why as I set precision to 7 and my var ff has also 7 decimals.

They should both be equal.

Any suggestions where I made an error?

Thanks!!

I have tested your code.

No, you didn't do anything wrong.

I think the setprecision() function only sets the precision used when showing your variable with cout.

I mean that setprecision() can't change your variable. In your code, cc = v/j, so cc holds 0.3333333333... (to as many digits as a float can keep), and setprecision() can't change this. You stored ff as 0.3333333.

0.3333333333... is greater than 0.3333333, isn't it? So your code shows that cc is bigger than ff.

Read the "Comparison of floating point numbers" part of this tutorial. It doesn't say that you can use setprecision() for comparison of floating point numbers.

Thank you

This is a very good article on the floating-point computation issue: "Microsoft Visual C++ Floating-Point Optimization", by Eric Fleegal, MSDN, 2004

http://msdn.microsoft.com/en-us/library/aa289157(v=vs.71).aspx

When I run this code, I get z = 0.333333 and q = 0

    float x = 1;
    float y = 3;
    float z = (x/y);
    float q = (1/3);

Can someone explain why? I realize that if I write

    float q = (1.0/3.0);

that this problem doesn't occur, but I'm just wondering why I can't use (1/3) since q is defined as a float. This page says it's just a convention to have the decimal point.

Think it through as follows:

float x = 1 reads "put INT 1 into FLOAT x." This changes its type from int to float. The same is true for float y = 3.

Thus float z = x/y divides two floats and returns a float.

However for float q = (1/3), this is a two part statement.

The first part (1/3) reads "divide INT 1 by INT 3". Since this is division of two integers, this means it must return an integer (the floor), which in this case is 0.

The second part is then q = 0, which reads "put INT 0 into float q."

An important thing to keep in mind is that division on a float is different than division on an integer. The literal 1 is read as an integer, however, the literal 1.0 is read as a float/double. This is why q = (1.0/3.0) is different than q = (1/3).

Hope this helped.

thank you!

Yup, 1 / 3 performs integer division (which gives an answer of 0, as the fractional component is dropped) , whereas 1.0 / 3.0 performs floating point division (which gives an answer of 0.333333...)

Hi there! Congratulation, very good explanation!!! Just what I was looking for.

Thank you.

When I run the following code, the values seem really wrong when output. What is going wrong here?

your C ++ compiler has a tendency to roundoff 8th precision onwards.

For any value lesser then 8;

It will display 1 lesser than called for.

Didn't understand how 0.1 is represented in binary by 0.00011001100110011…

In decimal, .1 is tenths, .01 is hundredths, .001 is thousandths and so on. Likewise, in binary, .1 is halves, .01 is quarters, .001 is eighths, and so on.

0.000110011... would be equal to 1/16 + 1/32 + 1/256 + 1/512 + ...

Ok, so in binary, we have to approximate the 1 of decimal 0.1 with an infinite sum. What I still don't understand is why we don't use all the "weights", that is, all the powers of 2, but only 1/16, 1/32, 1/256, 1/512 and so on, that is, the 4th position (2^4 = 16), the 5th, the 8th, the 9th, and so on. In other words, why don't we have 0.011111111........ which is equal to 1/2 + 1/4 + 1/8 + 1/16 + ...? It also approaches 1! (I am referring of course to the decimal part of 0.1, that is, the 1.)

That's exactly why you can't use every power of 1/2. The infinite sum would add up to 1, which is ten times the number we require. In order for the sum to add up to 0.1, you would need to add Sum[(1/2)^4n + (1/2)^(4n+1)], taking n from 1 to infinity. You can try it yourself if you want.

Yes, you are right, this sum indeed converges to 0.1, whereas the sum I used converges to 1.0. Thank you for the clear and concise explanation!

There is also a good explanation in Wikipedia (yes, sometimes - not often though - Wikipedia has good articles):

"Fractions in binary

Fractions in binary only terminate if the denominator has 2 as the only prime factor. As a result, 1/10 does not have a finite binary representation, and this causes 10 × 0.1 not to be precisely equal to 1 in floating point arithmetic. As an example, to interpret the binary expression for 1/3 = .010101..., this means: 1/3 = 0 × 2^(-1) + 1 × 2^(-2) + 0 × 2^(-3) + 1 × 2^(-4) + ... = 0.3125 + ... An exact value cannot be found with a sum of a finite number of inverse powers of two, and zeros and ones alternate forever."

Follows a table of the conversion (fractional approximations) for fractions from decimal to binary. For the ones who are interested:

http://en.wikipedia.org/wiki/Binary_numeral_system

How do I convert a Float say

x = 1234.567890123456789

to

y = 1234.5678901234 (small float ..10 decimal places only)

Something similar to setPrecision, to use NOT for display/Printing, but to use as a value for calculations / pass it on to a Database etc ?

I'm not sure what the best way to do this is. For small numbers, you can multiply by 10^x, cast to an integer to drop the remaining decimals, then divide by 10^x. However, if your number is too large you'll overflow the int when you do the casting so I won't say this is foolproof.

I set the precision level to 4, and added cout for the 2 values, fValue1 + fValue2.

I got fValue1 IS actually rounded off to 1.345 and fValue2 IS actually 1.123, expecting now the get the result of 2.468, but still reports 'fTotal is not 2.468'

Why is that?

Chris

Rounding error. The numbers printed on your screen by cout are rounded in this case, so you're not seeing the full representation. However, when you do the comparison, it does so with the actual numbers, not the rounded ones, which can lead to rounding issues.

How can I make the value a user inputs into a float?

When I run the program from main() and put in two values, e.g. x = 10 and y = 3, the answer is 3 instead of 3.333333.

You are already storing the user input values as a float. The problem is that your function is returning an integer, so it's truncating the result of x/y. Change your function to return a float and you will be good.
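The original function isn't shown, so this is only a guess at its shape, with made-up names. The first version has an int return type, which throws away the fractional part of x / y. The second returns a float and keeps it.

```cpp
// Return type is int: the float result of x / y is converted to int,
// which drops everything after the decimal point.
int divideTruncated(float x, float y)
{
    return static_cast<int>(x / y); // cast written out to make the conversion visible
}

// Return type is float: the fractional part survives.
float divide(float x, float y)
{
    return x / y;
}
```

With x = 10 and y = 3, `divideTruncated` returns 3, while `divide` returns roughly 3.33333.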

Hi Alex, really great guide. Enjoying the challenges it's throwing up so far. With regard to functions returning integers, how do you change a function to actually return a float? My little "add" program is working, but it cuts off the decimal. So if I have x = 4.5 and y = 5.5, it gives me the answer 9! I'm probably missing something ridiculously obvious, but if you could help that would be great!

Just change the return type of the function from int to float (or even better, double).

However, if x and y are ints, and you try to assign fractional values to them (e.g. 4.5 or 5.5) the fractions will get lost. So you may need to make those floats/doubles as well.
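As a sketch (function names are made up, and the original "add" program isn't shown): note that 4.5 + 5.5 is exactly 10, so an answer of 9 suggests the inputs themselves were truncated to 4 and 5 before being added, which is what int parameters would do.

```cpp
// int parameters: 4.5 and 5.5 would become 4 and 5 before the addition happens.
int addInts(int x, int y)
{
    return x + y;
}

// double parameters and return type: the fractions are kept throughout.
double add(double x, double y)
{
    return x + y;
}
```

Passing 4.5 and 5.5 to `addInts` yields 9 (most compilers will warn about the conversion), while `add(4.5, 5.5)` yields 10.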

This is a very helpful site.

So with all the rounding errors and precision problems, how do programmers deal with operations that need to display something that would end up with a precision or rounding error? Or am I just over-thinking things?

Most of the time it's simply not necessary to display a number out to the significant digits where precision/rounding errors creep in. Generally, programs limit the display of floating point numbers to 2-5 decimal places.
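One way to do this is with std::fixed and std::setprecision from the iomanip header. The helper name `formatFixed` below is hypothetical, and it writes into a string stream rather than cout just so the result can be inspected.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats `value` with exactly `decimals` digits after the decimal point.
// With std::fixed, std::setprecision counts decimal places rather than significant digits.
std::string formatFixed(double value, int decimals)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(decimals) << value;
    return out.str();
}
```

For example, `formatFixed(0.1 + 0.2, 3)` gives "0.300", so the tiny error in the sum never reaches the screen. The same two manipulators can be sent to std::cout directly.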

For applications that need very high precision or extreme ranges, developers rely on libraries written specifically to handle big and high-precision numbers. Phrases describing them include arbitrary-precision arithmetic, bignum arithmetic, multiple-precision arithmetic, or sometimes infinite-precision arithmetic. These libraries get past rounding and precision errors because they are written to maintain accuracy using a variety of techniques. That is beyond the scope of this tutorial.

Coming from a Java background, I wonder if anyone can recommend a C++ library with functionality similar to Java's BigDecimal.

Preferably one that works on Linux with gcc (so not the decimal type from Visual C++).

Is there something wrong with my code?

The compiler brings up a problem with setprecision()..

Thanks

You need to #include &lt;iomanip&gt; (the standard header has no .h extension) to use setprecision() that way.

See the lesson on ostream for more info about output manipulators and stuff.

What do the f's after some of the float and double values mean?

By default, if you type a floating point value into C++, it's typed as a double. Consequently, if you initialize a float variable with a literal that has no suffix (such as 4.53), you're assigning a double to a float, which loses precision, and the compiler will probably complain.

Putting an "f" after the value means that you intend that value to be a float, not a double. Then, when you initialize a float variable with a literal such as 4.53f, you're assigning a float value to a float variable, which makes more sense.
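The two cases described above can be sketched like this. The function names are made up for illustration, and the static_asserts simply confirm the literal types at compile time.

```cpp
#include <type_traits>

// A literal with no suffix has type double; the f suffix makes it a float.
static_assert(std::is_same<decltype(4.53), double>::value, "4.53 is a double literal");
static_assert(std::is_same<decltype(4.53f), float>::value, "4.53f is a float literal");

// double literal converted to float on initialization (compilers may warn here).
float fromDoubleLiteral()
{
    float f = 4.53;
    return f;
}

// float literal stored in a float: no conversion needed.
float fromFloatLiteral()
{
    float f{ 4.53f };
    return f;
}
```

Both functions end up holding the same float here, but that float is not the same number as the double 4.53, which is why the suffix states your intent more clearly.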

I missed something. How is it that 4.53 is a double and not a float?

The 4.53 is a literal constant of type `double` by default. When you add the f suffix to it, like 4.53f, it then becomes a literal constant of type `float`.

It might be good if you added this explanation to the lesson instead of leaving it as just a comment in the code. I was wondering about the usage of the f suffix myself because I overlooked the comment.

Good idea. Done.

Alex,

It seems kind of dumb to initialize a variable as a float, and then specify that you want a float by adding the f suffix (for example, 9.87654321f).

Except for the fact that all floating point literals default to double in C++. I've run your example with the f suffix on all the numbers and without it, and I get the same output. My PC has an i7 processor. Why use float at all, since both use four bytes?

Your lessons are great, thanks.

I'm not quite sure I understand what you're getting at. Doubles are generally 8 bytes, and floats are normally 4.
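A quick way to check this on your own machine is sketched below (all names are made up, and the sizes are implementation-defined, so 4 and 8 are only the typical values). The identical output may come from cout showing only 6 significant digits by default, which would hide any difference between a float and a double holding 9.87654321.

```cpp
#include <cstddef>

// Sizes in bytes; typically 4 and 8 on IEEE 754 platforms, but not guaranteed.
constexpr std::size_t floatBytes{ sizeof(float) };
constexpr std::size_t doubleBytes{ sizeof(double) };

// Hypothetical helper: round-trips a double through a float,
// discarding whatever precision a float cannot hold.
double storedAsFloat(double value)
{
    return static_cast<double>(static_cast<float>(value));
}
```

On a typical system, `storedAsFloat(0.1) == 0.1` is false because the float keeps fewer digits, while `storedAsFloat(0.5) == 0.5` is true because 0.5 is exactly representable in both types.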

It is far less confusing after reading this. Thank you. In class we covered this in about ten minutes and moved on to the next thing.

I've been to a few sites already trying to get a grasp on these (floats/doubles) and this summation really did the trick.

Thanks!