Integers are great for counting whole numbers, but sometimes we need to store very large numbers, or numbers with a fractional component. A **floating point** type variable is a variable that can hold a real number, such as 4.0, 2.5, 3.33, or 0.1226. There are three different floating point data types: **float**, **double**, and **long double**. A float is usually 4 bytes and a double 8 bytes, but these are not strict requirements, so sizes may vary. Long doubles were added to the language after its release for architectures that support even larger floating point numbers. Their size depends on the compiler: on some (Visual C++, for example) a long double is 8 bytes, equivalent to a double, while on others it is 12 or 16 bytes. Floating point data types are always signed (can hold positive and negative values).

Here are some declarations of floating point numbers:

    float fValue;
    double dValue;
    long double dValue2;

The *floating* part of the name *floating point* refers to the fact that a floating point number can have a variable number of decimal places. For example, 2.5 has 1 decimal place, whereas 0.1226 has 4 decimal places.

When we assign numbers to floating point numbers, it is convention to use at least one decimal place. This helps distinguish floating point values from integer values.

    int nValue = 5; // 5 means integer
    float fValue = 5.0; // 5.0 means floating point

How floating point variables store information is beyond the scope of this tutorial, but it is very similar to how numbers are written in scientific notation. **Scientific notation** is a useful shorthand for writing lengthy numbers in a concise manner. In scientific notation, a number has two parts: the significand, and a power of 10 called an exponent. The letter ‘e’ or ‘E’ is used to separate the two parts. Thus, a number such as 5e2 is equivalent to 5 * 10^2, or 500. The number 5e-2 is equivalent to 5 * 10^-2, or 0.05.

In fact, we can use scientific notation to assign values to floating point variables.

    double dValue1 = 500.0;
    double dValue2 = 5e2; // another way to assign 500
    double dValue3 = 0.05;
    double dValue4 = 5e-2; // another way to assign 0.05

Furthermore, if we output a number that is large enough, or has enough decimal places, it will be printed in scientific notation:

    #include <iostream>

    int main()
    {
        using namespace std;

        double dValue = 1000000.0;
        cout << dValue << endl;
        dValue = 0.00001;
        cout << dValue << endl;

        return 0;
    }

Outputs (the number of exponent digits depends on the compiler; many print 1e+06 and 1e-05 instead):

    1e+006
    1e-005

**Precision**

Consider the fraction 1/3. The decimal representation of this number is 0.33333333333333… with 3’s going out to infinity. An infinite length number would require infinite memory, and we typically only have 4 or 8 bytes. Floating point numbers can only store a certain number of digits, and the rest are lost. The **precision** of a floating point number is how many digits it can represent without information loss.

When outputting floating point numbers, cout has a default precision of 6. That is, it assumes all variables are only significant to 6 digits, and hence it rounds the output to 6 significant digits.

The following program shows cout limiting its output to 6 digits:

    #include <iostream>

    int main()
    {
        using namespace std;
        float fValue;
        fValue = 1.222222222222222f;
        cout << fValue << endl;
        fValue = 111.22222222222222f;
        cout << fValue << endl;
        fValue = 111111.222222222222f;
        cout << fValue << endl;
    }

This program outputs:

    1.22222
    111.222
    111111

Note that each of these is only 6 digits.

However, we can override the default precision that cout shows by using the setprecision() function that is defined in a header file called iomanip.

    #include <iostream>
    #include <iomanip> // for setprecision()

    int main()
    {
        using namespace std;

        cout << setprecision(16); // show 16 digits
        float fValue = 3.33333333333333333333333333333333333333f;
        cout << fValue << endl;
        double dValue = 3.3333333333333333333333333333333333333;
        cout << dValue << endl;

        return 0;
    }

Outputs:

    3.333333253860474
    3.333333333333334

Because we set the precision to 16 digits, each of the above numbers has 16 digits. But, as you can see, the numbers certainly aren’t precise to 16 digits!

Variables of type float typically have a precision of about 7 significant digits (which is why everything after that many digits in our answer above is junk). Variables of type double typically have a precision of about 16 significant digits. Variables of type double are named so because they offer approximately double the precision of a float.

Now let’s consider a really big number:

    #include <iostream>

    int main()
    {
        using namespace std;
        float fValue = 123456789.0f;
        cout << fValue << endl;
        return 0;
    }

Output:

1.23457e+008

1.23457e+008 is 1.23457 * 10^8, which is 123457000. Note that we have lost precision here too!

Consequently, one has to be careful when using floating point numbers that require more precision than the variables can hold.

**Rounding errors**

One of the reasons floating point numbers can be tricky is due to non-obvious differences between binary and decimal (base 10) numbers. In normal decimal numbers, the fraction 1/3 is the infinite decimal sequence: 0.333333333… Similarly, consider the fraction 1/10. In decimal, this is easily represented as 0.1, and we are used to thinking of 0.1 as an easily representable number. However, in binary, 0.1 is represented by the infinite sequence: 0.00011001100110011…

You can see the effects of this in the following program:

    #include <iostream>
    #include <iomanip>

    int main()
    {
        using namespace std;
        cout << setprecision(17);
        double dValue = 0.1;
        cout << dValue << endl;
    }

This outputs:

0.10000000000000001

Not quite 0.1! This is because the double had to truncate the approximation due to its limited memory, which resulted in a number that is not exactly 0.1. This is called a **rounding error**.

Rounding errors can play havoc with math-intense programs, as mathematical operations can compound the error. In the following program, we use 9 addition operations.

    #include <iostream>
    #include <iomanip>

    int main()
    {
        using namespace std;
        cout << setprecision(17);
        double dValue;
        dValue = 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1;
        cout << dValue << endl;
    }

This program should output 1, but it actually outputs:

0.99999999999999989

Note that the error is no longer in the last column like in the previous example! It has propagated to the second to last column. As you continue to do mathematical operations, this error can propagate further, causing the actual number to drift farther and farther from the number the user would expect.

**Comparison of floating point numbers**

One of the things that programmers like to do with numbers and variables is see whether two numbers or variables are equal to each other. C++ provides an operator called the equality operator (==) precisely for this purpose. For example, we could write a code snippet like this:

    int x = 5; // integers have no precision issues
    if (x==5)
        cout << "x is 5" << endl;
    else
        cout << "x is not 5" << endl;

This program would print “x is 5”.

However, when using floating point numbers, you can get some unexpected results if the two numbers being compared are very close. Consider:

    float fValue1 = 1.345f;
    float fValue2 = 1.123f;
    float fTotal = fValue1 + fValue2; // should be 2.468
    if (fTotal == 2.468)
        cout << "fTotal is 2.468";
    else
        cout << "fTotal is not 2.468";

This program prints:

fTotal is not 2.468

This result is due to rounding error. fTotal is actually being stored as 2.4679999, which is not 2.468!

For the same reason, the comparison operators >, >=, <, and <= may produce the wrong result when comparing two floating point numbers that are very close.

**Conclusion**

To summarize, there are two things you should remember about floating point numbers:

1) Floating point numbers offer limited precision. Floats typically offer about 7 significant digits worth of precision, and doubles offer about 16 significant digits. Trying to use more significant digits will result in a loss of precision. (Note: placeholder zeros do not count as significant digits, so a number like 22,000,000,000 or 0.00000033 only counts for 2 digits.)

2) Floating point numbers often have small rounding errors. Many times these go unnoticed because they are so small, and because the numbers are rounded for output before the error reaches the digits that are displayed. Regardless, comparisons on floating point numbers may not give the expected results when two numbers are close.

The section on relational operators has more detail on comparing floating point numbers.

wow………to be honest this was extremely confusing man…….

It is far less confusing after reading this. Thank you. In class we covered this in about ten minutes and moved on to the next thing.

I’ve been to a few sites already trying to get a grasp on these (floats/doubles) and this summation really did the trick.

Thanks!

What do the f’s after some of the float and double values mean?

By default, if you type a floating point value into C++ it’s typed as a double. Consequently, if you do something like this:

You’re assigning a double to a float, which loses precision, and the compiler will probably complain.

Putting an “f” after the value means that you intend that value to be a float, not a double. Then when you do this:

You’re assigning a float value to a float variable, which makes more sense.

I missed something. How is it that 4.53 is a double and not a float?

The 4.53 is a literal constant of type `double` by default. When you add the f suffix to it like 4.53f it then becomes a literal constant of type `float`.

Is there something wrong with my code?

The compiler brings up a problem with setprecision()..

Thanks

You need to include the iomanip header (#include <iomanip>) to use setprecision() that way.

See the lesson on ostream for more info about output manipulators and stuff.

    /* Program to know real value of fTotal */
    #include <iostream>
    #include <cstdio>

    int main()
    {
        using namespace std;

        float fValue1 = 1.345f;
        float fValue2 = 1.123f;
        float fTotal = fValue1 + fValue2; // should be 2.468

        if (fTotal == 2.468)
        {
            cout << "\n fTotal is 2.468";
        }
        else
        {
            cout << "\n fTotal is not 2.468";
        }

        // print the stored value to 7 decimal places
        printf("\n The real value of fTotal is %0.7f", fTotal);
        return 0;
    }
    /* End of Program */

Coming from a Java background, I wonder if anyone can recommend a C++ library with functionality similar to Java’s BigDecimal.

Preferably one that works on Linux with gcc (so not the decimal type from Visual C++).

So with all the rounding errors and precision problems, how do programmers deal with operations that need to display something that would end up with a precision or rounding error? Or am I just over-thinking things?

Most of the time it’s simply not necessary to display a number to the number of significant digits where precision/rounding errors creep in. Generally with floating point numbers, programs will truncate the display to 2-5 decimals.

This could also be some reading, if interested.

http://babbage.cs.qc.edu/courses/cs341/IEEE-754references.html

And, “What Every Computer Scientist Should Know About Floating-Point Arithmetic”:

http://www.validlab.com/goldberg/paper.pdf

this is very help full site

how can i make the value a user inputs into a float?

then when i run the program from main() and i put in 2 values like eg. x = 10 y = 3 then the answer is 3 instead of 3.333333

You are already storing the user input values as a float. The problem is that your function is returning an integer, so it’s truncating the result of x/y. Change your function to return a float and you will be good.

I set the precision level to 4, and added cout for the 2 values, fValue1 + fValue2.

I got fValue1 IS actually rounded off to 1.345 and fValue2 IS actually 1.123, expecting now to get the result of 2.468, but it still reports ‘fTotal is not 2.468’.

Why is that?

Chris

Rounding error. The numbers printed on your screen by cout are rounded in this case, so you’re not seeing the full representation. However, when you do the comparison, it does so with the actual numbers, not the rounded ones, which can lead to rounding issues.

How do I convert a Float say

x = 1234.567890123456789

to

y = 1234.5678901234 (small float ..10 decimal places only)

Something similar to setPrecision, to use NOT for display/Printing, but to use as a value for calculations / pass it on to a Database etc ?

I’m not sure what the best way to do this is. For small numbers, you can multiply by 10^x, cast to an integer to drop the remaining decimals, then divide by 10^x. However, if your number is too large you’ll overflow the int when you do the casting so I won’t say this is foolproof.

Didn’t understand how 0.1 is represented in binary by 0.00011001100110011…

In decimal, .1 is tenths, .01 is hundredths, .001 is thousandths and so on. Likewise, in binary, .1 is halves, .01 is quarters, .001 is eighths, and so on.

0.000110011… would be equal to 1/16 + 1/32 + 1/256 + 1/512 + …

Ok, so in binary, we have to approximate the 1 of decimal 0.1 with an infinite sum. What I still don’t understand is why we don’t use all the “weights”, that is, all the powers of 2, but only 1/16, 1/32, 1/256, 1/512 and so on, that is, the 4th position (2^4 = 16), the 5th, the 8th, the 9th, and so on. In other words, why don’t we have 0.011111111…….. which is equal to 1/2 + 1/4 + 1/8 + 1/16 + …? It also approaches 1! (I am referring of course to the decimal part of 0.1, that is, the 1.)

That’s exactly why you can’t use every power of 1/2. The infinite sum would add up to 1, which is ten times the number we require. In order for the sum to add up to 0.1, you would need to add Sum[(1/2)^4n + (1/2)^(4n+1)], taking n from 1 to infinity. You can try it yourself if you want.

Yes, you are right, this sum indeed converges to 0.1, whereas the sum I used converges to 1.0. Thank you for the clear and concise explanation!

There is also a good explanation in Wikipedia (yes, sometimes – not often though – Wikipedia has good articles):

“Fractions in binary

Fractions in binary only terminate if the denominator has 2 as the only prime factor. As a result, 1/10 does not have a finite binary representation, and this causes 10 × 0.1 not to be precisely equal to 1 in floating point arithmetic. As an example, to interpret the binary expression for 1/3 = .010101…, this means: 1/3 = 0 × 2^(-1) + 1 × 2^(-2) + 0 × 2^(-3) + 1 × 2^(-4) + … = 0.3125 + … An exact value cannot be found with a sum of a finite number of inverse powers of two, and zeros and ones alternate forever.”

Follows a table of the conversion (fractional approximations) for fractions from decimal to binary. For the ones who are interested:

http://en.wikipedia.org/wiki/Binary_numeral_system

Why doesn’t the float work? It’s a simple program to calculate the area of a triangle, but if I put the value of base as 3 and height as 3, I get the result as 4 instead of 4.5. Here is the code. I don’t get the answer in decimal.

[code] #include “stdafx.h”

#include

#include “add.h”

#include

int main()

{

using namespace std;

cout <> b;

cout <>h;

float a=divide(multiply(b,h),2); //area of triangle is half into base into height

cout << “area of triangle :” << a <<endl;

}[code]

When I run the following code, the values seem really wrong when output. What is going wrong here?

your C ++ compiler has a tendency to roundoff 8th precision onwards.

For any value lesser then 8;

It will display 1 lesser than called for.

In the code example right after “rounding errors,” why is there no “#include <iostream>” when there is a cout later and also “using namespace std”? Is it a mistype or is there a reason…?

Guess all authors are prone to typo errors.

hi there, can you give me an example of addition with no outputs?

I’m confused by your use of the term ‘real number’.

In mathematics a real number is any number which isn’t an imaginary number, which means that both C++ integer and floating-point data-types can hold real numbers, and the only difference between them is that floating-point data-types can represent decimal fractions, as opposed to integer data-types which can only hold natural numbers (aka counting numbers, or whole numbers).

Also – a general stylistic point – you’ve often used “it’s” where “its” is the correct word to use, cf. http://www.buckingham.ac.uk/english/guide/its.html.

Perform the following floating-point divisions:

a) (0.2233*0.01)\(0.6611*103)

b) (111.99*100)\(44.888*100)

Assume the computer truncates the significand to four decimal digits and show your results as normalized decimal floating-point numbers. Symbols * and \ denote multiplication and division, respectively. Can you help me solve this problem? I’m lost in it and didn’t understand.

THANK YOU..

thanks sir…Awsum work…a question…

i want to write a full story as input what should i do????

Hi there! Congratulation, very good explanation!!! Just what I was looking for.

Thank you.

When I run this code, I get z = 0.333333 and q = 0

    float x = 1;
    float y = 3;
    float z = (x/y);
    float q = (1/3);

Can someone explain why? I realize that if I write

    float q = (1.0/3.0);

that this problem doesn’t occur, but I’m just wondering why I can’t use (1/3) since q is defined as a float. This page says it’s just a convention to have the decimal point.

Think it through as follows:

float x = 1 reads “put INT 1 into FLOAT x.” This changes its type from int to float. The same is true for float y = 3.

Thus float z = x/y divides two floats and returns a float.

However for float q = (1/3), this is a two part statement.

The first part (1/3) reads “divide INT 1 by INT 3”. Since this is division of two integers, the result must be an integer: the fractional part is discarded, which in this case leaves 0.

The second part is then q = 0, which reads “put INT 0 into float q.”

An important thing to keep in mind is that division on a float is different than division on an integer. The literal 1 is read as an integer, however, the literal 1.0 is read as a float/double. This is why q = (1.0/3.0) is different than q = (1/3).

Hope this helped.

thank you!

Regarding the example where you added two floats and got a “rounding error,” the problem is that you did not include “f” at the end of the number.

    #include <iostream>
    #include <iomanip>
    using namespace std;

    int main()
    {
        cout << setprecision(7);
        float fValue1 = 1.345f;
        float fValue2 = 1.123f;
        float fTotal = fValue1 + fValue2; // should be 2.468
        if (fTotal == 2.468f)
        {
            cout << "fTotal is 2.468" << endl;
            cout << fTotal;
        }
        return 0;
    }

It will evaluate properly. If you're checking for floats, you need to check it using a float value (ie, it needs the f on the end.) It is not a rounding error.

Excellent!

Here is some code I wrote to demonstrate this:

    #include <iostream>
    #include <iomanip>
    using namespace std;

    int main() {
        float fValue1 = 1.345f;
        float fValue2 = 1.123f;
        float fTotal = fValue1 + fValue2;
        for (int y = 0; y < 2; y++) {
            cout << "fTotal vs. " << (y ? "2.468" : "2.468f") << "\n\n";
            for (int x = 7; x < 17; ++x) {
                cout << setprecision(x);
                cout << "precision(" << x << "):\t";
                cout.width(30);
                cout << left << fTotal << right << (y ? 2.468 : 2.468f) << endl;
            }
            cout << endl;
        }
        return 0;
    }

This is a very good article on the floating-point computation issue: “Microsoft Visual C++ Floating-Point Optimization”, by Eric Fleegal, MSDN, 2004

http://msdn.microsoft.com/en-us/library/aa289157(v=vs.71).aspx

Hi people,

I am new to C++ so please don’t flame me

I wrote a simple prog. to test this course but something isn’t really working well and I can’t figure out why…

    #include <iostream>
    #include <string>
    #include <iomanip> // for setprecision()
    using namespace std;

    int main()
    {
        cout << setprecision(7); // 7 significant digits
        float v = 1;
        float j = 3;
        float cc;
        cc = v / j;

        // TEST with FLOAT NUMBERS
        float ff = 0.3333333; // 7 digits, matching setprecision(7)
        if (cc < ff) {
            cout << "cc is smaller than ff" << endl;
        }
        else if (cc > ff) {
            cout << "cc is bigger than ff" << endl;
        }
        else {
            cout << "cc equals ff" << endl;
        }
        cout << cc << " = cc" << endl;
        cout << ff << " = ff" << endl;
        return 0;
    }

The output gives me that cc is bigger than ff…

I don’t understand why as I set precision to 7 and my var ff has also 7 decimals.

They should both be equal.

Any suggestions where I made an error?

Thanks!!

I have tested your code.

No, you didn’t do anything wrong.

I think the setprecision() function only sets the precision used when cout displays your variable.

I mean setprecision() can’t change your variable. In your code cc = v/j, so cc is stored as the closest float to 0.3333333333…, and setprecision() can’t change this. You stored ff as 0.3333333.

0.3333333333… is greater than 0.3333333, isn’t it? So your code shows cc is bigger than ff.

Read the “Comparison of floating point numbers” part of this tutorial. It doesn’t say that you can use setprecision() for comparison of floating point numbers.

Thank You

How to control the numbers after decimal point in C++?

Like in C, if I take a floating variable f = 123.4567 and I want to show only 2 numbers after the decimal point, then I will use printf("%0.2f", f). Then it will show 123.46.

In C++ I have to use setprecision(). But it determines the total number of digits, not just the digits after the decimal point, so it causes a problem. If I don’t know what number my floating variable will contain after calculation, it can contain 123.123 or 1234.123, so if I set precision to 5, for the first case it will show 123.12 and for the second case it will show 1234.1! But I always want to show 2 numbers after the decimal point in every case. How can I do that in C++?

i have same question like you

Use the `std::fixed` stream manipulator and the stream’s `precision()` member function. For example, if you want to display with 2 decimal places:

    double pi = 3.14159;
    std::cout.precision(2);
    std::cout << "Today's price for a slice of pi is $" << std::fixed << pi << std::endl;

and it should print:

`Today's price for a slice of pi is $3.14`

I saw this code

    #include <iostream>
    #include <iomanip> // for setprecision()

    int main()
    {
        using namespace std;

        cout << setprecision(16); // show 16 digits
        float fValue = 3.33333333333333333333333333333333333333f; // <-- this f
        cout << fValue << endl;
        double dValue = 3.3333333333333333333333333333333333333;
        cout << dValue << endl;

Please tell me what the "f" in the first float fValue is.

Noticed a couple of errors in the examples in this article, that generate compile errors when copy/paste is used. I actually think this is a good idea to purposefully do. This way, when we blindly copy/paste an example to compile ourselves, we might get an error. This would allow us to encounter common compiler errors and know what they mean and how to correct them.

Hi, I have been trying to make a very simple program to multiply two real numbers (i.e. a decimal). Here is my code (it doesn’t work), could someone please help?

#include “stdafx.h”

#include

#include

int multiply(float fFirstNumber, float fSecondNumber)

{

return fFirstNumber * fSecondNumber;

}

int main()

{

using namespace std;

cout << "This simple program multiplies 2 numbers together." << endl;

Sleep(3000);

cout << "Enter the first number you want to multiply:" <> fFirstInput;

cout << "Enter the second number you want to multiply:" <> fSecondInput;

cout << "The answer is: " << endl;

cout << multiply(fFirstInput, fSecondInput) << endl;

Sleep(1000);

cout << "5" << endl;

Sleep(1000);

cout << "4" << endl;

Sleep(1000);

cout << "3" << endl;

Sleep(1000);

cout << "2" << endl;

Sleep(1000);

cout << "1" << endl;

Sleep(1000);

cout << "Bye Bye!!!" << endl;

Sleep(1000);

return 0;

}

Copy and Paste didn’t work properly!!!

Alex, the first example demonstrating setprecision() is missing the closing curly bracket for main()