Integers are great for counting whole numbers, but sometimes we need to store *very* large numbers, or numbers with a fractional component. A **floating point** type variable is a variable that can hold a real number, such as 4320.0, -3.33, or 0.01226. The *floating* part of the name *floating point* refers to the fact that the decimal point can “float”; that is, it can support a variable number of digits before and after the decimal point.

There are three different floating point data types: **float**, **double**, and **long double**. As with integers, C++ does not define the size of these types. On modern architectures, floating point representation almost always follows IEEE 754 binary format. In this format, a float is 4 bytes, a double is 8, and a long double can be equivalent to a double (8 bytes), 80-bits (often padded to 12 bytes), or 16 bytes.

Floating point data types are always signed (can hold positive and negative values).

Category | Type | Minimum Size | Typical Size |
---|---|---|---|
floating point | float | 4 bytes | 4 bytes |
floating point | double | 8 bytes | 8 bytes |
floating point | long double | 8 bytes | 8, 12, or 16 bytes |

Here are some definitions of floating point numbers:

    float fValue;
    double dValue;
    long double dValue2;

When using floating point literals, it is convention to always include at least one decimal place. This helps distinguish floating point values from integer values.

    int x(5); // 5 means integer
    double y(5.0); // 5.0 is a floating point literal (no suffix means double type by default)
    float z(5.0f); // 5.0 is a floating point literal, f suffix means float type

Note that floating point literals have type double by default. An f suffix is used to denote a literal of type float. When assigning a literal value to a variable, always make sure the type of the literal matches the type of the variable.

**Printing floating point numbers**

Now consider this simple program:

    #include <iostream>

    int main()
    {
        std::cout << 5.0 << std::endl;
        std::cout << 6.7f << std::endl;
        std::cout << 9876543.21 << std::endl;
    }

The results of this seemingly simple program may surprise you:

    5
    6.7
    9.87654e+06

In the first case, the std::cout printed 5, even though we typed in 5.0. By default, std::cout will not print the fractional part of a number if the fractional part is 0.

In the second case, the number prints as we expect.

In the third case, what the heck? It turns out that by default, some floating point numbers (we’ll cover which ones later in the lesson) are printed in scientific notation.

**Scientific notation**

How floating point variables store information is beyond the scope of this tutorial, but it is very similar to how numbers are written in scientific notation. **Scientific notation** is a useful shorthand for writing lengthy numbers in a concise manner. And although scientific notation may seem foreign at first, understanding scientific notation will help you understand how floating point numbers work, and more importantly, what their limitations are.

Numbers in scientific notation take the following form: *significand* x 10^{exponent}. For example, in the scientific notation `1.2 x 10^{4}`, `1.2` is the significand and `4` is the exponent. This number evaluates to 12,000.

By convention, numbers in scientific notation are written with one digit before the decimal, and the rest of the digits afterward.

Consider the mass of the Earth. In decimal notation, we’d write this as `5973600000000000000000000 kg`. That’s a really large number (too big to fit even in an 8 byte integer). It’s also hard to read (is that 19 or 20 zeros?). In scientific notation, this would be written as `5.9736 x 10^{24} kg`, which is much easier to read. Scientific notation has the added benefit of making it easier to compare the magnitude of two really large or really small numbers simply by comparing the exponent.

Because it can be hard to type or display exponents in C++, we use the letter ‘e’ or ‘E’ to represent the “times 10 to the power of” part of the equation. For example, `1.2 x 10^{4}` would be written as `1.2e4`, and `5.9736 x 10^{24}` would be written as `5.9736e24`.

For numbers smaller than 1, the exponent can be negative. The number `5e-2` is equivalent to `5 * 10^{-2}`, which is `5 / 10^{2}`, or `0.05`. The mass of an electron is `9.1093822e-31 kg`.

In fact, we can use scientific notation to assign values to floating point variables.

    double d1(5000.0);
    double d2(5e3); // another way to assign 5000

    double d3(0.05);
    double d4(5e-2); // another way to assign 0.05

**How to convert numbers to scientific notation**

Use the following procedure:

- Your exponent starts at zero.
- Slide the decimal so there is only one non-zero digit to the left of the decimal.
- Each place you slide the decimal to the left increases the exponent by 1.
- Each place you slide the decimal to the right decreases the exponent by 1.
- Trim off any leading zeros (on the left end)
- Trim off any trailing zeros (on the right end) only if the original number had no decimal point. We’re assuming they’re not significant unless otherwise specified.

Here are some examples:

    Start with: 42030
    Slide decimal left 4 spaces: 4.2030e4
    No leading zeros to trim: 4.2030e4
    Trim trailing zeros: 4.203e4 (4 significant digits)

    Start with: 0.0078900
    Slide decimal right 3 spaces: 0007.8900e-3
    Trim leading zeros: 7.8900e-3
    Don't trim trailing zeros: 7.8900e-3 (5 significant digits)

    Start with: 600.410
    Slide decimal left 2 spaces: 6.00410e2
    No leading zeros to trim: 6.00410e2
    Don't trim trailing zeros: 6.00410e2 (6 significant digits)

Here’s the most important thing to understand: The digits in the significand (the part before the E) are called the **significant digits**. The number of significant digits defines a number’s **precision**. The more digits in the significand, the more precise a number is.

**Precision and trailing zeros after the decimal**

Consider the case where we ask two lab assistants each to weigh the same apple. One returns and says the apple weighs 87 grams. The other returns and says the apple weighs 87.000 grams. Assuming the weighings were correct, in the former case, we know the apple actually weighs somewhere between 86.50 and 87.49 grams. Maybe the scale was only precise to the nearest gram. Or maybe our assistant rounded a bit. In the latter case, we are confident about the actual weight of the apple to a much higher degree (it weighs between 86.9950 and 87.0049 grams, which has much less variability).

So in scientific notation, we prefer to keep trailing zeros after a decimal, because those digits impart useful information about the precision of the number.

However, in C++, 87 and 87.000 are treated exactly the same, and the compiler will store the same value for each. There’s no technical reason why we should prefer one over the other (though there might be scientific reasons, if you’re using the source code as documentation).

**Precision and range**

Consider the fraction 1/3. The decimal representation of this number is 0.33333333333333… with 3’s going out to infinity. An infinite length number would require infinite memory to store, and we typically only have 4 or 8 bytes. Floating point numbers can only store a certain number of significant digits, and the rest are lost. The **precision** of a floating point number defines how many *significant digits* it can represent without information loss.

When outputting floating point numbers, std::cout has a default precision of 6 -- that is, it assumes all floating point variables are only significant to 6 digits, and hence it will round the output to 6 significant digits.

The following program shows std::cout limiting its output to 6 digits:

    #include <iostream>

    int main()
    {
        float f;
        f = 9.87654321f; // f suffix means this number should be treated as a float
        std::cout << f << std::endl;
        f = 987.654321f;
        std::cout << f << std::endl;
        f = 987654.321f;
        std::cout << f << std::endl;
        f = 9876543.21f;
        std::cout << f << std::endl;
        f = 0.0000987654321f;
        std::cout << f << std::endl;
        return 0;
    }

This program outputs:

    9.87654
    987.654
    987654
    9.87654e+006
    9.87654e-005

Note that each of these is only 6 significant digits.

Also note that cout will switch to outputting numbers in scientific notation in some cases. Depending on the compiler, the exponent will typically be padded to a minimum number of digits. Fear not, 9.87654e+006 is the same as 9.87654e6, just with some padding 0’s. The minimum number of exponent digits displayed is compiler-specific (Visual Studio uses 3, some others use 2 as per the C99 standard).

However, we can override the default precision that cout shows by using the std::setprecision() function that is defined in a header file called iomanip.

    #include <iostream>
    #include <iomanip> // for std::setprecision()

    int main()
    {
        std::cout << std::setprecision(16); // show 16 digits
        float f = 3.33333333333333333333333333333333333333f;
        std::cout << f << std::endl;
        double d = 3.3333333333333333333333333333333333333;
        std::cout << d << std::endl;
        return 0;
    }

Outputs:

    3.333333253860474
    3.333333333333334

Because we set the precision to 16 digits, each of the above numbers is printed with 16 digits. But, as you can see, the numbers certainly aren’t precise to 16 digits!

The number of digits of precision a floating point variable has depends on both the size (floats have less precision than doubles) and the particular value being stored (some values have more precision than others). Float values have between 6 and 9 digits of precision, with most float values having at least 7 significant digits (which is why everything after that many digits in our answer above is junk). Double values have between 15 and 18 digits of precision, with most double values having at least 16 significant digits. Long double has a minimum precision of 15, 18, or 33 significant digits depending on how many bytes it occupies.

Precision issues don’t just impact fractional numbers, they impact any number with too many significant digits. Let’s consider a big number:

    #include <iostream>
    #include <iomanip> // for std::setprecision()

    int main()
    {
        float f(123456789.0f); // f has 10 significant digits
        std::cout << std::setprecision(9); // to show 9 digits in f
        std::cout << f << std::endl;
        return 0;
    }

Output:

123456792

123456792 is greater than 123456789. The value 123456789.0 has 10 significant digits, but float values typically have 7 digits of precision. We lost some precision!

Consequently, one has to be careful when using floating point numbers that require more precision than the variables can hold.

Assuming IEEE 754 representation:

Size | Range | Precision |
---|---|---|
4 bytes | ±1.18 x 10^{-38} to ±3.4 x 10^{38} | 6-9 significant digits, typically 7 |
8 bytes | ±2.23 x 10^{-308} to ±1.80 x 10^{308} | 15-18 significant digits, typically 16 |
80-bits (12 bytes) | ±3.36 x 10^{-4932} to ±1.18 x 10^{4932} | 18-21 significant digits |
16 bytes | ±3.36 x 10^{-4932} to ±1.18 x 10^{4932} | 33-36 significant digits |

It may seem a little odd that the 12-byte floating point number has the same range as the 16-byte floating point number. This is because they have the same number of bits dedicated to the exponent -- however, the 16-byte number offers a much higher precision.

*Rule: Favor double over float unless space is at a premium, as the lack of precision in a float will often lead to inaccuracies.*

**Rounding errors**

One of the reasons floating point numbers can be tricky is due to non-obvious differences between binary (how data is stored) and decimal (how we think) numbers. Consider the fraction 1/10. In decimal, this is easily represented as 0.1, and we are used to thinking of 0.1 as an easily representable number. However, in binary, 0.1 is represented by the infinite sequence: 0.00011001100110011… Because of this, when we assign 0.1 to a floating point number, we’ll run into precision problems.

You can see the effects of this in the following program:

    #include <iostream>
    #include <iomanip> // for std::setprecision()

    int main()
    {
        double d(0.1);
        std::cout << d << std::endl; // use default cout precision of 6
        std::cout << std::setprecision(17);
        std::cout << d << std::endl;
        return 0;
    }

This outputs:

    0.1
    0.10000000000000001

On the top line, cout prints 0.1, as we expect.

On the bottom line, where we have cout show us 17 digits of precision, we see that d is actually *not quite* 0.1! This is because the double had to truncate the approximation due to its limited memory, which resulted in a number that is not exactly 0.1. This is called a **rounding error**.

Rounding errors can have unexpected consequences:

    #include <iostream>
    #include <iomanip> // for std::setprecision()

    int main()
    {
        std::cout << std::setprecision(17);

        double d1(1.0);
        std::cout << d1 << std::endl;

        double d2(0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1 + 0.1); // should equal 1.0
        std::cout << d2 << std::endl;
    }

    1
    0.99999999999999989

Although we might expect that d1 and d2 should be equal, we see that they are not. If we were to compare d1 and d2 in a program, the program would probably not perform as expected. We discuss this more in section 3.5 -- Relational operators (comparisons).

One last note on rounding errors: mathematical operations (such as addition and multiplication) tend to make rounding errors grow. So even though 0.1 has a rounding error in the 17th significant digit, when we add 0.1 ten times, the rounding error has crept into the 16th significant digit.

**NaN and Inf**

There are two special categories of floating point numbers. The first is **Inf**, which represents infinity. Inf can be positive or negative. The second is **NaN**, which stands for “Not a Number”. There are several different kinds of NaN (which we won’t discuss here).

Here’s a program showing all three:

    #include <iostream>

    int main()
    {
        double zero = 0.0;
        double posinf = 5.0 / zero; // positive infinity
        std::cout << posinf << std::endl;

        double neginf = -5.0 / zero; // negative infinity
        std::cout << neginf << std::endl;

        double nan = zero / zero; // not a number (mathematically invalid)
        std::cout << nan << std::endl;

        return 0;
    }

And the results using Visual Studio 2008 on Windows:

    1.#INF
    -1.#INF
    1.#IND

INF stands for infinity, and IND stands for indeterminate. Note that the results of printing Inf and NaN are platform specific, so your results may vary.

**Conclusion**

To summarize, the two things you should remember about floating point numbers:

1) Floating point numbers are great for storing very large or very small numbers, including those with fractional components, so long as they have a limited number of significant digits (precision).

2) Floating point numbers often have small rounding errors, even when the number has fewer significant digits than the precision. Many times these go unnoticed because they are so small, and because the numbers are truncated for output. Consequently, comparisons of floating point numbers may not give the expected results. Performing mathematical operations on these values will cause the rounding errors to grow larger.

**Quiz**

1) Convert the following numbers to C++ style scientific notation (using an e to represent the exponent) and determine how many significant digits each has (keep trailing zeros after the decimal):

a) 34.50

b) 0.004000

c) 123.005

d) 146000

e) 146000.001

f) 0.0000000008

g) 34500.0

**Quiz Answers**


Dear Teacher, please let me following question:

I want to create a program asking user enter a double number in format a/b e.g. 3/2 and program output 1.5.

With regards and friendship

Georges Theodosiou

Without error checking:

Mr. nascardriver,

Please accept my many thanks for you replied and for your message.

Then, it's not possible type numerator and denominator together, e.g. 3/2.

I have to learn a lot from you.

With regards and friendship

Georges Theodosiou

Yes, with the code I posted, you can enter

3/2

and the output is

1.5

Dear nascardriever,

Please accept my many-many thanks for you replied and that immediately. And many more for you solved my problem.

Really I have to learn a lot from you.

With regards and friendship

Georges Theodosiou

Dear Associate Teacher,

Please let me say you that when I put white spaces (one or more) between nominator and denominator, e.g. 25 24, output is correct.

With regards and friendship

Georges Theodosiou

@std::cin.operator>> ignores whitespace during extraction and @std::cin.ignore without arguments ignores 1 character, no matter what it is. You could replace

Chars are covered in lesson 2.7.

Dear Associate Teacher,

Please let me express my sincere gratitude for your instructive answers.

With regards and friendship

Georges Theodosiou

P.S. Indeed Chars are covered in lesson D.2.7. G.T.

First of all thanks a lot for this site and very instructive and easy to understand tutorials! It's like I'm schooling myself and learning really fast in such amount of time.

I tried to create a program that takes in as input your current Year/day/time and Year/day/time of birth to output the age with as much knowledge as I mustered so far. I'd like to ask if possible if I could improve my code or if I can do better in some areas.

Note that the format for days is 365 so when it asks for days you have to do some math. For instance if it's February 18 then you have to count all days in this year up to feb 18. Same with hours which is 24 hour format.

Thank you for the tutorials once again!

    #include <iostream>

    using namespace std;

    double age(int year, int day, int hour)
    {
        double h{1};
        double d{h * 24};
        double y{365.25};

        cout << "Enter current year: ";
        double cyear{};
        cin >> cyear;

        cout << "Enter current day: ";
        double cday{};
        cin >> cday;

        cout << "Enter current hour: ";
        double chour{};
        cin >> chour;

        double thours{cyear * y * d * h + cday + chour};
        double pthours{year * y * d * h + day + hour};
        double hage{pthours - thours};

        return hage;
    }

    double yearDayHours(double totalHours)
    {
        return totalHours / 24 / 365;
    }

    int main()
    {
        int y{};
        int d{};
        int h{};

        cout << "Enter year of birth: ";
        cin >> y;

        cout << "Enter day of birth: ";
        cin >> d;

        cout << "Enter hour of birth: ";
        cin >> h;

        cout << yearDayHours(age(y, d, h));

        return 0;
    }

Hi!

Please use code tags when posting code.

* Don't use "using namespace".

* Use double literals when calculating with doubles (1.0 instead of 1, 24.0 instead of 24 etc.)

* Initialize variables to a specific (0) value.

* You're using the same name style for functions and variables, this can lead to confusion.

* Print a line feed when your program has finished.

* Don't use abbreviations unless their meaning is obvious.

Thank you that's very helpful!

I've always hated these datatypes, most of all since they appear to be... how to say that in proper English (my first language is Dutch)... unreliable....

Now I began coding in the 1980s in BASIC (like many of us back then), and there it was often recommended to use integers as they are faster than non-integers (and of all issues the old BASIC had, speed was its biggest issue), and when I moved to other languages I often saw the numbers coming after the decimal point change, seemingly randomly, sometimes even leading to 1.x becoming 2.x, but it did cause me to avoid them, and even when I only need 3 decimals to use 1000-folds of the numbers in the calculation and then 'cheat' on the output making users believe they see a non-integer. I don't think that's the most elegant solution, but at least I was sure the results shown on screen were always correct.

To demonstrate, this code in BlitzMax:

Outputs: 1.00199997 which is NOT the number I asked for.... Now the difference is small, but in complex calculations this did lead to wrong outcomes with pretty terrible results in the functioning of the program as a whole, and BlitzMax is not the only language in which I encountered this issue.

Now are C and C++ in this issue? Are these more precise, or is that compiler dependent? Especially since I plan to work open source (which I mostly already do in the languages I used so far) different settings with compilers can haunt me on this one, so that's why I wanna be sure before I start a "serious" project in C++ :)

I did try this (basically an exact translation of the code in BlitzMax to C++) and that gave the correct outcome, but I want to make sure all compilers are the same on this to prevent any issues ;)

Hi!

* Line 3: Initialize your variables with brace initializers.

* Line 4: Missing 'f' postfix

Quoting the standard § 6.7.1 63N4791

"The type double provides at least

as much precision as float, and the type long double provides at least as much precision as double. The set of values of the type float is a subset of the set of values of the type double; the set of values of the

type double is a subset of the set of values of the type long double. The value representation of floating-point types is implementation-defined."

So, no guarantees about the precision about individual types.

You can use @std::numeric_limits::is_iec559 to check if the compiler uses a standardized floating point representation, and, if it doesn't, fall back to a custom type.

I have to say that these chapters about numbers are really helpful. I am learning so much! I am intentionally spending much time to go over each line written. The writer has done a very meticulous job quite sincerely.

Speaking about INF, how come the program is not crashing even though a number is being divided by a zero in order to attain infinity?

> how come the program is not crashing even though a number is being divided by a zero

The behavior of division by 0 is undefined. Most compilers use a floating point representation that support infinity, so they use that. But you shouldn't assume anything about undefined behavior.

@Alex

This should be noted in your example. Or just use @std::numeric_limits<double>::infinity() along with @std::numeric_limits<double>::has_infinity to get the infinity values instead of division by 0.

C++20 will standardize signed integer representations. Let's hope we'll get standardized floating points eventually.

Noted. This chapter is next on my rewrite list, so I'll note this and amend when I do the rewrite shortly.

HELLO (^+^)

Does it overflow ?

No

Thank you for replying,

But what are the following errors for ?

For mixing types.

Line 7 converts a float to double.

Line 8 converts a long double to double and a double to a float.

Increasing floating point precision isn't a problem, but you should use casts when you do so (Casts are covered later).

You'll get problems when you decrease precision (eg. double to float).

Thanks again,

Do you mean 'static_cast()' ?

Is one of the problems when I decrease the precision overflow or it has a another problems ?

The problem is the loss of precision.

@db can store the given number, but @fl can't. It will be converted to the closest number that a float can store, which can be way off.

On my system, the above code outputs

We're off by 100 billion.

The same happens with the fractional part.

Hello there,

Between double and float data type, Is the limitation in decimal fraction number or on the piece of integer ?

double is guaranteed to have at least as much precision as float. In practice it has more, so the fractional part is more precise.

Other than that there are no requirements to the types.

@std::numeric_limits gives you all kind of information about numeric types.

http://www.cplusplus.com/reference/limits/numeric_limits/

Hello!

Thanks for this tutorial.

I have a problem, I'm using Qt Creator 4.7 and When I write following program I get an error and I don't know why it occurs.

And I get this following error, but the above program is compiled and run as well but the following error is a warning:

When I initialize the variable dv as an integer, there's no problem:

Thank you

* Line 13, 14: Initialize your variables with uniform initialization. You used copy initialization.

1.88888888888888 is a double. @dv is a long double.

Use 1.88888888888888l. The 'l' in the end means that it's a long double.

Quick question on the rounding of floating point numbers.

As we know that rounding errors occur as well as a cut off point on the number of decimal places depending on the data type used

How does the computer know that (1 / 3) * 3 = 1?

Since 1/3 is 0.3333333... and there is a cut off point on those decimal places, how does it calculate it precisely back to 1 if we multiply that result by 3 instead of ending up with something like 0.999999999?

Hi Jeremy!

The result is so close to 1, that because of the error caused by the multiplication, the result is 1.

Example with 1/3:

operation = real = computer

1.0 / 3.0 = 0.3... = 0.333333333333333333342

0.333333333333333333342 * 3.0 = 1.000000000000000000026 = 1.0

1.000000000000000000026 is so close to 1.0, that for your computer, it is 1.0

Once you get to bigger numbers, and bigger errors, you won't be that lucky.

Example with 1/41:

operation = real = computer

1.0 / 41.0 = ... = 0.0243902439024390243894

0.0243902439024390243894 * 41.0 = 0.9999999999999999999654 = 0.999999999999999999946

We don't get 1.0 as a result.

The internal representation of floating point values is implementation defined, you might experience different results.

how to print exactly 12.56464564896465465465498465465461 value same as

C++ doesn't have a built-in type that supports numbers that precise. You'd have to write your own type (Covered later).

If you just want to print that value and don't do any processing with it, you can print it as a string.

Hi

Please to explain why we do literals for f value? You said that the compiler should treat f as float value! but we already declared it as float. so why we need to confirm to the complier to treat it as float ? I read about literals but did not fully get it. Thanks

    float f;
    f = 9.87654321f; // f suffix means this number should be treated as a float

- The type of @f can be deduced off of the value, this is covered later.

- You might want to do parts of a calculation in higher/lower precision than another part.

- The type of the value is determined before the assignment happens. ie. The type of the variable is ignored until after the calculation has finished.

I'm having a problem with floating-point numbers in this code:

The specific problem is with idiot-proofing the entry of fractional amounts of -1 < x < 1. Larger fractions are truncated just fine but fractions more than -1 and less than 1 cause an infinite loop. Please advise.

Your program doesn't support floating point numbers. You're using long, which is an integer type.

Your program encounters an infinite loop, for numbers like 0.1, because

line extracts the 0

-> line 47 fails

-> ".1" is left in the input stream

-> '.' cannot be extracted into an int

-> the stream enters a failed state (covered later) in which it is now stuck

You need to use double or float

Thank you for your reply and your assistance.

I'm specifically using longs instead of floating point numbers because there is an infinite number of floating-point divisors for every floating point number.

I managed to solve the problem after reading lesson 4.4 on strings, as follows:

Here is the entire program revised after reading lesson 4.4 (I tested it and it works fine):

That partially fixes the problem. The proper solution is covered in lesson 5.10.

Suggestions

* Unnecessary forward declarations. Move @main above other functions.

* Same name style for functions and variables. This can lead to confusion.

* Avoid global variables

* Line 49, 64: Initialize your variables with uniform initialization

* Line 66: Don't pass 32767 to @std::cin.ignore. Pass @std::numeric_limits<std::streamsize>::max().

* Line 67 doesn't do anything

* Line 81-86: Should be a for-loop

Here's the cleaned-up source code. I didn't initialize the variable on line 61 (old line 49) because of the console input.

> I didn't initialize the variable on line 61 (old line 49) because of the console input.

You mean 31? Although @std::cin overrides variables on failed extraction, you shouldn't assume any function can handle uninitialized variables. Initializing all variables helps preventing you from writing unpredictable code, which is terrible to debug. Line 98, 99 should be merged.

The backslash in line 36 is more confusing than helpful. Non-string lines can be split without using a backslash.

Line 62-66 should be a for-loop (I'm assuming you have previous programming knowledge or read ahead).

You can use raw strings to preserve formatting

that way you don't have to add \n\ everywhere

I cleaned up the code again, but am not sure what value I should initialize

to. I tried

and that didn't work. Instead I've initialized the variables to 0, as that's a value that gets thrown out by the console input getter.

0 is correct.

Line 105, 106:

So far this has been the only topic I've found it difficult to understand. Do we use float if we want less precision (like small decimal numbers, 3.4) and to use up less memory? And double to give us more precision such as 99.67462775848, but it'll use up more of the memory?

I tried to scientific notations quiz:

scientificnotation.cpp

I'm still not sure when to use the f suffix. Whenever I used it for "double h(666.262002f)", it printed out the same value without the f suffix. Also for

    double f(0.0000000008);
    std::cout << std::setprecision(11);
    f = 8e-10;
    std::cout << "f: " << f << std::endl;

, it prints out 8e-10 and not the value in the brackets. What did I do wrong?

Also I'm not sure how to keep the trailing zeros!

Lastly I keep getting this warning : scientificnotation.cpp(24): warning C4244: '=': conversion from 'double' to 'int', possible loss of data

Any help would be appreciated and sorry for any inconvenience. I just want to perfect my knowledge on every topic before I move on.

You're initializing your variables but overriding their values before using them. That's pointless. Initialize them with their desired value.

Initialize your variables with uniform initialization.

> I'm still not sure when to use the f suffix

Whenever you write a number that's supposed to be a float. Not for doubles, not for ints, only for floats.

> it prints out 8e-10 and not the value in the brackets

You initialized @f to 0.0000000008 but changed its value to 8e-10 before printing it.

> I'm not sure how to keep the trailing zeros

Use @std::fixed to print exactly the precision set via @std::setprecision

> I keep getting this warning

float and double are different types. Don't mix them.

Hi. I've made improvement to the code and got the exact values in the quiz. Thanks for all your help! Could you check for any further improvement? Thanks for taking your time to help beginners! Idk how you have the time to do it :)

* Initialize your variables with uniform initialization. (Lesson 2.1)

Other than that your code looks fine now.

Regarding question number 6 for those who got it wrong because they thought that the answer should have had six significant digits.

Trailing zeros in a whole number with no decimal shown are not significant, which is why the answer has three significant digits.

Had the number included a decimal at the end, the number would have had six significant digits:

EX: 146000.

1.46000E5

I was not able to understand how you Inf is equal to 5.0, Also, if you only used cout for the double value, how come you got #Inf with it.

In IEEE 754 floating point arithmetic, any positive number divided by 0 is infinity. inf is not equal to 5.0; rather, 5.0 / 0.0 = inf. The inf that gets printed is how the floating point representation is telling you that you did something to generate an infinite value.

When I have this program:

case 1)

float f = 12.345678901f;

std::cout << f << std::endl;

output is :

12.3457

case 2)

std::cout << std::setprecision(8) << std::endl;

float f = 12.345678901f;

std::cout << std::fixed << f << std::endl;

the output is:

12.34567928

case 3)

when I change the program to:

std::cout << std::setprecision(8) << std::endl;

float f = 123.45678901f;

std::cout << std::fixed << f << std::endl;

the output is:

123.45678711

As per case 1) output, it seems like float has 6 digits of precision on my compiler.

as per case 2) output, it seems like float has 8 digits of precision on my compiler.

I have 2 questions:

question 1 :

I don't understand what's happening in case 3), how many digits of precision is being seen in case 3?

question 2:

I don't understand why I'm getting 2 different digits of precision for float in case 1) and case 2)

Please help.

The definition of precision is how many significant digits the variable can hold without error. Floats have between 6 and 9 significant digits of precision, and std::cout defaults to 6 digits of precision (being conservative).

The answer to both questions is the same: when you use std::fixed, the number is printed with as many digits to the right of the decimal point as you've requested via std::setprecision. This may exceed the actual precision of the underlying number itself.

Got it! Thanks for replying!

I now understand that in case 3), although the output is 123.45678711, it does not mean that the precision for float changed from 6 (as seen in case 1) to 8; it's rather that by chance the output was 123.45678711 when it could easily have been 123.45600000.

Please correct me if I'm wrong!

If I understand what you're getting at, I think so. Precision governs how many significant digits are guaranteed to be preserved. Anything beyond the precision may get rounded to the nearest representable number. Essentially, the end of your number is garbage.

Got it, thanks again :)

Can someone help me do this in C, please? Pick 3 food items (main meal, side, drink, for example)

Display them for the user, like this example:

Dear Alex,

What do you recommend, C or C++, and why? I recently checked the TIOBE index and it says Java and C are the most popular languages to learn. But I want to understand malware and maybe later do something with AI or gamehacking.

Thanks

Hi Bjorn!

Java is pretty much only used in programming courses and on Android.

C is used on some embedded systems and regular programs/libraries.

C++ is C (almost), but with more features, allowing you to spend your time more efficiently.

> malware

C++ (Or JS if you're targeting the web)

> AI

Python is pretty popular there I think.

> gamehacking

You're best off using the language the game was written in. In most cases, it's C or C++, but you can use C++ for a C game.

My 2c: I recommend C++. I find object oriented programming results in better organization and maintainability of large-scale programs. It's much easier to learn C (if needed) than vice-versa. These days, I'd argue C is only better for niche use cases: embedded code, when you need to interact with other languages that only have C bindings, etc...

Dear Teacher, please let me comment on the scientific notation of the number 0.0000000008. The Scientific Notation Converter at

https://www.calculatorsoup.com/calculators/math/scientific-notation-converter.php

returns 0.000000000 for 8e-10 (the C++ output), 0.00000000080 for 8.0e-10, and 0.0000000080 for 8.0e-9 (one zero fewer to the left of the 8). Only for 8.e-10 does it return 0.0000000008, the correct value. My conclusion is that scientific notation is not universal; there are slight differences between software packages. Regards.

Dear Teacher, please let me send you the following strange program, with its strange result as output by

https://www.onlinegdb.com/online_c++_compiler

Result:

-2147483648

-2147483648

0

Regards.

Hi Georges!

You're dividing by zero. Since your dividend is a double, your result will be a double too. x / 0.0 returns std::numeric_limits<double>::infinity(). Converting infinity to an int is undefined behavior; on x86 you'll typically get -2147483648 (only the sign bit set in binary).

I assume 0 / 0 works because your computer stops evaluating when it notices that the dividend is zero. Division by 0 with ints will typically cause your program to crash.

Dear nascardriver, please let me two comments:

1. Program

outputs

5

It means the machine keeps only the integer part of the number, because the type is int.

2. Strange is that in my first message first value of b is positive and second negative, but in output both results are negative.

5.0 / a is a floating point division. int b = 5.0 / a is a floating point division whose result is implicitly converted to an integer when it is assigned to int b.

a / a is an integer division because a is an int.

Dear Teacher, please accept my many thanks for your reply and your helpful answer. The fact is that C++ does not agree with math: in the latter, division by zero is undefined. In the former, the situation is more complicated and depends on whether the numbers are integers or floating point values.

So basically he is trying to use a decimal in an integer (to my understanding illegal in terms of cpp)

When I need to know why on something, I can't move forward until I do. This is driving me nuts.

int x(5); // 5 means integer

double y(5.0); // 5.0 is a floating point literal (no suffix means double type by default)

float z(5.0f); // 5.0 is a floating point literal, f suffix means float type

In the second definition, we use the word double to declare y. Why does having no suffix mean it is a double by default even matter? We used double. It declares it that way.

Following that question, in the third definition, we use float to define. WHY do we need to use the f suffix if we're already declaring it as a float type of a floating literal? Which begs the question, why is there even a default type or suffix needed in either case, if we are using the appropriate descriptor in the initial definition in the first place? It seems to me like we're specifying what type it is twice. What am I missing?

A follow up thought, to better explain my confusion, would be thus:

If no f suffix means double type as default, would then float y(5.0); // mean that it is a double type by default since the f suffix wasn't used?

Or that double z(5.0f); // is a float type, even though we declared with double?

(I think) Obviously not, but see how I am getting confused? I read further in the lesson, and googled. I don't think I know enough to know what I need to google, to figure out what the hell I'm confused on. Ha.

No. If you declare a variable as float that variable will always be a float. You can initialize/assign it other types, but you shouldn't.

There is currently no way to tell the compiler "make this literal the same type as the variable it is initializing".

However, the inverse is possible: you can tell C++ to make the type of a variable the same type as the initializer using the auto keyword:

This is one way to address the redundancy you're noticing.

Mister, please add this info in the main text! It looks very relevant. :)

Hi Zane!

A little example:

You'll learn about function overloading where you have two functions with the same name but different parameters.

I'm sure there are more reasons for having different numbers, just use the suffix that corresponds to your data type.

Annotation, reference, and identification. Good enough reasons as any. Thank you for the explanation and answer to both of my comments. I appreciate the time you take out of your day to answer my questions.

Though I am curious as to how we'll manage to get our compilers to not bitch about functions with the same name. Unless it is that it is really just the same function with different parameters.

Are function(int a) and function (int b) considered different functions entirely? I understood them to be the same function with different parameters.

Thanks again!

The name of the parameters doesn't matter. The compiler cares about

* function name

* parameter types

* parameter order

* things that are covered in future lessons

If all these things match for two functions you'll get a compilation error, because you can't have two definitions of the same function. If at least one thing is different, it is considered an entirely different function. The compiler knows which function you want to call based on the arguments passed.

Thank you!

I must admit, I was confused at this point too.

Now I understand this:

Declaring a variable as float does not automatically SET the value assigned to it to the same type (float). A float variable must be assigned a float value ("f" suffix).

> Declaring a variable as float does not automatically SET the value assigned to it to the same type (float).

This is not correct. Float variables can ONLY hold float values. If you try to assign or initialize a non-float value to a float variable, the compiler will attempt to convert that value to a float. If the compiler is successful in doing so, the code will compile (possibly with a warning). If the compiler is unsuccessful, you will get a compile error.

Since you mention "f" suffix, let's take a look at this using some literal values:

In the f1 case, 5.0f is a float literal, so the float value 5.0 is initialized to float variable f1. Since they are the same type, no conversion is necessary.

In the f2 case, 5.0 is a double literal. So when double value 5.0 is initialized to float variable f2, the double value 5.0 must be converted to a float value 5.0 before it can be assigned to f2. This may result in a loss of precision or rounding errors.

Can you give me the coding solution of how to convert the quiz problems into scientific notation?