Real and Complex Floating Types
Three types are defined to represent non-integer real numbers: float, double, and long double. These three types are called the real floating types.
The storage size and the internal representation of these types are not specified in the C standard, and may vary from one compiler to another. Most compilers, however, follow the IEEE 754-1985 standard for binary floating-point arithmetic. Table 1-5 is also based on the IEEE representation.
| Type | Storage size | Value range (decimal, unsigned) | Precision (decimal) |
|---|---|---|---|
| float | 4 bytes | 1.2E-38 to 3.4E+38 | 6 decimal places |
| double | 8 bytes | 2.3E-308 to 1.7E+308 | 15 decimal places |
| long double | 10 bytes | 3.4E-4932 to 1.1E+4932 | 19 decimal places |
The header file float.h defines symbolic constants that describe all aspects of the given representation (see Section 1.17).
Internal representation of a real floating-point number
The representation of a floating-point number x is always composed of a sign s, a mantissa m, and an exponent exp to base 2:
x = s * m * 2^exp, where 1.0 <= m < 2 or m = 0
The precision of a floating type is determined by the number of bits used to store the mantissa. The value range is determined by the number of bits used for the exponent.
Figure 1-2 shows the storage format for the float type (32-bit) in IEEE representation.