IEEE754 Floating point > Special values and Ranges
As I mentioned in my previous post, we shall now discuss the range of values for IEEE 754 floating-point numbers, denormalized forms, NaN, and simple algorithms for converting between IEEE 754 hexadecimal representations and decimal floating-point numbers.
Let’s take a look at the special values in IEEE754 floating point representation.
Special values:
Exponent field values of all 0s and all 1s are used to denote special values in this scheme.
Zero
Zero is represented with an exponent field of zero and a mantissa field of zero. Depending on the sign bit, it can be a positive zero or a negative zero. Thus, -0 and +0 are distinct bit patterns, though they compare as equal.
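The behaviour of the two zeros can be checked with a small sketch like the one below. The helper names are made up for illustration, and the code assumes the platform's float follows IEEE 754.

```cpp
#include <cmath>

// Hypothetical helper: true when +0 and -0 compare equal,
// which IEEE 754 comparison rules require.
bool zeros_compare_equal() {
    float positive_zero = 0.0f;
    float negative_zero = -0.0f;
    return positive_zero == negative_zero;
}

// Hypothetical helper: true when the sign bit of x is set.
// std::signbit looks at the bit itself, so it can tell -0 from +0.
bool has_sign_bit(float x) {
    return std::signbit(x);
}
```

Here the equality test cannot distinguish the two zeros, while std::signbit can.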
Infinity
Infinity is represented with an exponent of all 1s and a mantissa of all 0s. Depending on the sign bit, it can be positive infinity (+∞) or negative infinity (-∞). Infinity is the result when a computation overflows past the largest representable number, so that the computation can continue.
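As a rough illustration of that saturation, the sketch below doubles the largest finite float. The function name is hypothetical, and the result assumes IEEE 754 arithmetic in the default rounding mode (no fast-math style compiler options).

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: doubling the largest finite float overflows,
// and the result saturates to +infinity instead of stopping the program.
float overflow_to_infinity() {
    float largest = std::numeric_limits<float>::max();
    float result = largest * 2.0f;
    return result;
}
```

Once a value has become infinity it stays there under further additions, which is what lets the computation carry on.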
NaN
The value NaN (Not a Number) represents a result that is not a real number. NaNs are produced by computations with undefined results; because operations on NaN are themselves defined, the computation can continue. NaNs are represented by a bit pattern with an exponent of all 1s and a nonzero mantissa. There are two categories of NaN: QNaN (Quiet NaN) and SNaN (Signalling NaN).
A QNaN is a NaN with the most significant fraction bit set (denotes indeterminate operations).
An SNaN is a NaN with the most significant fraction bit clear (denotes invalid operations).
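The two NaN categories can be told apart by looking at bit 22, the most significant mantissa bit. The following is a minimal sketch working directly on the 32 storage bits; the function names are invented for this example.

```cpp
#include <cstdint>

// Hypothetical helper: any NaN has an exponent of all 1s and a nonzero mantissa.
bool is_nan_bits(std::uint32_t bits) {
    return (bits & 0x7f800000u) == 0x7f800000u && (bits & 0x007fffffu) != 0;
}

// Hypothetical helper: a quiet NaN has the most significant mantissa bit (bit 22) set.
bool is_quiet_nan_bits(std::uint32_t bits) {
    return is_nan_bits(bits) && (bits & 0x00400000u) != 0;
}

// Hypothetical helper: a signalling NaN has that same bit clear.
bool is_signalling_nan_bits(std::uint32_t bits) {
    return is_nan_bits(bits) && (bits & 0x00400000u) == 0;
}
```

For instance, 0x7fc00000 is classified as quiet and 0x7f800001 as signalling, while 0x7f800000 (infinity) is not a NaN at all.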
Denormalized
If the exponent is all 0s and the mantissa is nonzero, the value is treated as a denormalized number. Denormalized numbers do not have an assumed leading 1 before the binary point. For single precision, this represents the number (-1)^{s} × 0.m × 2^{-126}, where s is the sign bit and m is the mantissa. For double precision, it represents (-1)^{s} × 0.m × 2^{-1022}.
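The single precision formula can be turned into code fairly directly. In the sketch below (function name hypothetical), the 23-bit mantissa is read as an integer m, so that 0.m × 2^{-126} becomes m × 2^{-23} × 2^{-126} = m × 2^{-149}.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: value of a single precision denormal, given its sign
// and its 23-bit mantissa field read as an integer.
// (-1)^s × 0.m × 2^{-126} equals (-1)^s × m × 2^{-149}.
float denormal_value(bool negative, std::uint32_t mantissa) {
    // std::ldexp(x, n) computes x × 2^n without rounding surprises here,
    // because every 23-bit integer is exactly representable as a float.
    float magnitude = std::ldexp(static_cast<float>(mantissa), -149);
    return negative ? -magnitude : magnitude;
}
```

A mantissa of 0x400000 (only the top bit set) gives 0.1 in binary times 2^{-126}, that is 2^{-127}.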
Thus, the following are the values corresponding to each representation:
(Note that b used in the table is the bias)
| Sign (s) | Exponent (e) | Mantissa (m) | Range for single precision values | Range name |
|---|---|---|---|---|
| 1 | 11..11 | 10..00 : 11..11 | __ | QNaN |
| 1 | 11..11 | 00..01 : 01..11 | __ | SNaN |
| 1 | 11..11 | 00..00 | < -(2 - 2^{-23}) × 2^{127} | -Infinity (Negative Overflow) |
| 1 | 11..10 : 00..01 | 11..11 : 00..00 | -(2 - 2^{-23}) × 2^{127} : -2^{-126} | Negative Normalized, -1.m × 2^{(e-b)} |
| 1 | 00..00 | 11..11 : 00..01 | -(1 - 2^{-23}) × 2^{-126} : -2^{-149} | Negative Denormalized, -0.m × 2^{(-b+1)} |
| __ | __ | __ | -2^{-150} : < -0 | Negative Underflow |
| 1 | 00..00 | 00..00 | -0 | -0 |
| 0 | 00..00 | 00..00 | +0 | +0 |
| __ | __ | __ | > +0 : 2^{-150} | Positive Underflow |
| 0 | 00..00 | 00..01 : 11..11 | 2^{-149} : (1 - 2^{-23}) × 2^{-126} | Positive Denormalized, 0.m × 2^{(-b+1)} |
| 0 | 00..01 : 11..10 | 00..00 : 11..11 | 2^{-126} : (2 - 2^{-23}) × 2^{127} | Positive Normalized, 1.m × 2^{(e-b)} |
| 0 | 11..11 | 00..00 | > (2 - 2^{-23}) × 2^{127} | +Infinity (Positive Overflow) |
| 0 | 11..11 | 00..01 : 01..11 | __ | SNaN |
| 0 | 11..11 | 10..00 : 11..11 | __ | QNaN |
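The table can be read as a decision procedure over the three bit fields. Here is a minimal sketch of that reading for single precision; the function name and the returned labels are choices made for this example, and the underflow rows are left out because they describe results that have no bit pattern of their own.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: names the table row that a 32-bit pattern falls into.
std::string classify_single(std::uint32_t bits) {
    const bool negative = (bits >> 31) != 0;               // sign, bit 31
    const std::uint32_t exponent = (bits >> 23) & 0xffu;   // exponent, bits 30-23
    const std::uint32_t mantissa = bits & 0x007fffffu;     // mantissa, bits 22-0
    const std::string sign = negative ? "Negative " : "Positive ";

    if (exponent == 0xffu) {
        // Exponent all 1s: infinity when the mantissa is zero, NaN otherwise.
        if (mantissa == 0) return negative ? "-Infinity" : "+Infinity";
        return (mantissa & 0x00400000u) ? "QNaN" : "SNaN";
    }
    if (exponent == 0) {
        // Exponent all 0s: zero when the mantissa is zero, denormalized otherwise.
        if (mantissa == 0) return negative ? "-0" : "+0";
        return sign + "Denormalized";
    }
    // Any other exponent is a normalized number.
    return sign + "Normalized";
}
```

For example, 0x3f800000 (the pattern of 1.0) lands in the Positive Normalized row, and 0x80000001 in the Negative Denormalized row.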
Range:
As mentioned in the table above, the range of positive normalized numbers for a single precision float is 2^{-126} to (2 - 2^{-23}) × 2^{127}. Note that the bias (b) here is 127.
Let’s see how we arrive at these ranges. As the table shows, the positive normalized form is represented as 1.m × 2^{(e-b)}, where m is the mantissa, e is the exponent and b is the bias.
Thus, the smallest normalized number for single precision is 1.0…0 (all 0s after the binary point) × 2^{1-127}, where the mantissa is 0 and the exponent is 1:
1.0 × 2^{1-127} → 2^{-126}
Now, the largest normalized number for single precision is 1.1…1 (23 ones after the binary point) × 2^{254-127}, where the mantissa is all 1s and the exponent is all 1s except the least significant bit (254). This number equals:
(1…1 (24 ones) / 2^{23}) × 2^{254-127} → ((2^{24} - 1) / 2^{23}) × 2^{127} → (2 - 2^{-23}) × 2^{127}
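These two endpoints can be cross-checked against the limits the C++ standard library reports. The sketch below uses hypothetical function names and assumes float is an IEEE 754 single.

```cpp
#include <cmath>
#include <limits>

// Smallest positive normalized float: 1.0 × 2^{1-127} = 2^{-126}.
float smallest_normalized() {
    return std::ldexp(1.0f, 1 - 127);
}

// Largest normalized float: (2 - 2^{-23}) × 2^{254-127}.
float largest_normalized() {
    // 2 - 2^{-23} is 1.11...1 in binary (23 ones after the point),
    // which fits exactly in the 24 significant bits of a float.
    const float significand = 2.0f - std::ldexp(1.0f, -23);
    return std::ldexp(significand, 254 - 127);
}
```

On such a platform, smallest_normalized() should equal std::numeric_limits<float>::min() and largest_normalized() should equal std::numeric_limits<float>::max().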
Again as mentioned in the table above, the range of positive denormalized numbers for a single precision float is 2^{-149} to (1 - 2^{-23}) × 2^{-126}. Note that the bias (b) here is 127 and the denormalized form is represented as 0.m × 2^{(-b+1)}.
Thus, the smallest denormalized number for single precision is 0.00…1 × 2^{-127+1}, where the mantissa has all bits 0 except the least significant bit and the exponent is 0:
0.00…1 × 2^{-127+1} → 2^{-23} × 2^{-126} → 2^{-149}
And the largest denormalized number for single precision is 0.11…1 × 2^{-127+1}, where the mantissa has all bits 1 and the exponent is 0:
(1…1 (23 ones) / 2^{23}) × 2^{-127+1} → ((2^{23} - 1) / 2^{23}) × 2^{-126} → (1 - 2^{-23}) × 2^{-126}
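The denormalized endpoints can be checked in the same spirit. This is a sketch with hypothetical function names, assuming IEEE 754 single precision with denormals enabled (no flush-to-zero mode).

```cpp
#include <cmath>
#include <limits>

// Smallest positive denormal: 2^{-23} × 2^{-126} = 2^{-149}.
float smallest_denormal() {
    return std::ldexp(1.0f, -149);
}

// Largest denormal: (1 - 2^{-23}) × 2^{-126}.
float largest_denormal() {
    // 1 - 2^{-23} is 0.11...1 in binary (23 ones after the point).
    const float fraction = 1.0f - std::ldexp(1.0f, -23);
    return std::ldexp(fraction, -126);
}
```

The next float above largest_denormal() should be the smallest normalized number 2^{-126}, so the two ranges meet with no gap between them.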
Similarly, you can derive the ranges for double precision floats as well. The following table shows the ranges for single as well as double precision floats, for both positive and negative values.
| | Single Precision | Double Precision |
|---|---|---|
| Normalized form | ± 2^{-126} to (2 - 2^{-23}) × 2^{127} | ± 2^{-1022} to (2 - 2^{-52}) × 2^{1023} |
| Denormalized form | ± 2^{-149} to (1 - 2^{-23}) × 2^{-126} | ± 2^{-1074} to (1 - 2^{-52}) × 2^{-1022} |
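The double precision column can be verified the same way as the single precision one. The function names below are hypothetical, and the sketch assumes double is an IEEE 754 double (52-bit mantissa, bias 1023).

```cpp
#include <cmath>
#include <limits>

// Smallest positive normalized double: 2^{-1022}.
double smallest_normalized_double() {
    return std::ldexp(1.0, -1022);
}

// Largest normalized double: (2 - 2^{-52}) × 2^{1023}.
double largest_normalized_double() {
    return std::ldexp(2.0 - std::ldexp(1.0, -52), 1023);
}

// Smallest positive denormalized double: 2^{-1074}.
double smallest_denormal_double() {
    return std::ldexp(1.0, -1074);
}

// Largest denormalized double: (1 - 2^{-52}) × 2^{-1022}.
double largest_denormal_double() {
    return std::ldexp(1.0 - std::ldexp(1.0, -52), -1022);
}
```

Under that assumption these should match std::numeric_limits<double>::min(), max() and denorm_min() respectively, with the largest denormal sitting one step below min().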
Algorithms:
Let’s take a look at an algorithm (written in C++) that takes a 32-bit integer (containing the raw bit representation of a single precision float) and returns the equivalent float value.
    #include <cmath>
    #include <cstdint>

    float single_float_from_storage_bits(std::uint32_t storagebits)
    {
        // Check the sign bit: +1 when it is clear, -1 when it is set
        int sign = ((storagebits & 0x80000000u) == 0) ? 1 : -1;
        // Get the exponent value, bit positions 30 - 23
        int exponent = static_cast<int>((storagebits & 0x7f800000u) >> 23);
        // Get the mantissa value, bit positions 22 - 0
        int mantissa = static_cast<int>(storagebits & 0x007fffffu);

        // If the exponent is 0, the value is either 0 or in denormalized form
        if (exponent == 0)
        {
            // If the mantissa is also 0, this is definitely a 0
            if (mantissa == 0)
            {
                // The sign decides between +0 and -0
                return sign * 0.0f;
            }
            // Otherwise return the denormalized value:
            // 0.m × 2^{-126} = m × 2^{-149}
            return static_cast<float>(sign * mantissa * std::pow(2.0, -149));
        }
        // If the exponent is all 1s, the value is either infinity or NaN
        else if (exponent == 0xff)
        {
            // If the mantissa is 0, it is +infinity or -infinity
            if (mantissa == 0)
            {
                // Use the sign to decide between +infinity and -infinity
                return (sign < 0) ? -INFINITY : +INFINITY;
            }
            // Otherwise it is a NaN; you could also check for SNaN or QNaN here
            return NAN;
        }
        // Now we are sure this is a normalized number
        else
        {
            // Add the implied 24th bit of the mantissa
            mantissa |= 0x00800000;
            // Return the normalized value:
            // 1.m × 2^{e-127} = (m with implied bit) × 2^{e-150}
            return static_cast<float>(sign * mantissa * std::pow(2.0, exponent - 150));
        }
    }
In a similar manner, you can go the other way: given a single precision float, you can find its storage bits representation.
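One possible sketch of that reverse direction is shown below. Rather than rebuilding the fields arithmetically, it copies the float's bytes into an integer with std::memcpy; this is a swapped-in shortcut, not the mirror image of the algorithm above. The function name is hypothetical, and the code assumes float is a 32-bit IEEE 754 single.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the 32 storage bits of a single precision float.
std::uint32_t storage_bits_from_single_float(float value)
{
    // Assumption: float occupies exactly 32 bits on this platform.
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "float must be 32 bits wide");
    std::uint32_t bits = 0;
    // memcpy copies the object representation without violating aliasing rules.
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

For example, 1.0f yields 0x3f800000 (sign 0, exponent 127, mantissa 0) and -0.0f yields 0x80000000 (only the sign bit set). The sign, exponent and mantissa fields can then be extracted with the same masks used earlier: 0x80000000, 0x7f800000 and 0x007fffff.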