Saurabh Sharma blogs: IEEE 754 explained

At the end there is a very interesting practical example, but read from the start for a full understanding.

First we will start with some basics, and then we will move on to the topic of its range.

IEEE 754 Floating-Point Representation

IEEE 754 is the standard for storing floating-point numbers.

There are two representations:
Single precision: 1 + 8 + 23 bits (32 bits in total)
Double precision: 1 + 11 + 52 bits (64 bits in total)

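The bits are arranged as:
S + E + M
where S = sign, E = exponent and M = mantissa.
Bias = 2^(k-1) - 1, where k is the number of exponent bits (8 for single precision, 11 for double precision).
The exponent is adjusted so that the leading bit (MSB) of the mantissa is 1 (this is called normalization).
Since the MSB of the mantissa is then always 1, there is no need to store it; it is the implied (hidden) bit.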

The stored exponent is an unsigned field, so to represent negative exponents as well, a bias value is subtracted from the stored exponent to yield the actual exponent. The single-precision bias is 127, and the double-precision bias is 1023.

More bits for the significand (mantissa) give more precision, and more bits for the exponent increase the range.

Example
Represent 100.25 in the IEEE 754 single-precision floating-point format.

• Step 1:
Convert the decimal number to binary:
100 = 1100100
0.25 = 0.01
100.25 = 1100100.01

• Step 2
Normalise the binary number obtained
+1100100.01 = +1.10010001 x 2^6 (the binary point moves 6 places to the left)

• Step 3
Convert the exponent by adding the bias:
The single-precision bias is 127, or 7Fh.
Exponent: 127 + 6 = 133
or in hex: 7Fh + 6h = 85h
133 = 85h = 10000101

• Step 4
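(a) Zero-pad the exponent on the left to 8 bits (10000101 already has 8 bits).
(b) Drop the implied leading 1 and zero-pad the mantissa on the right to 23 bits: 10010001000000000000000.
(c) Combine sign (0 for positive), exponent and mantissa:
0 10000101 10010001000000000000000
= 0100 0010 1100 1000 1000 0000 0000 0000

Result is 42C88000h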

Range of float variables

So now let's come to the range of float variables. Internally, float variables are stored in the IEEE 754 single-precision format.

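We know that the bias 127 must be subtracted from the exponent x stored in the hardware register:
x - 127 = max
The largest value an 8-bit field can hold is x = 255, which gives the largest exponent:
max = 255 - 127 = 128
With the largest mantissa, 1.111...1 (23 ones after the point) = 1 + (1 - 2^-23), the maximum number would be
2^128 * [1 + (1 - 2^-23)]
= 6.8 * 10^38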

Similarly, for the minimum:
x - 127 = min
The smallest value x can take is 0, which gives
min = 0 - 127 = -127
so the minimum number (with mantissa 1.0) would be 2^-127 * 1 = 5.88 * 10^-39

So these appear to be the minimum and maximum values of float variables.
But don't jump to conclusions; this is not the end of the problem.
As you may have seen in some C books, the maximum float value is 3.4 * 10^38.

The reason is that the IEEE standard reserves certain bit patterns for values with special meanings, which slightly reduces the range we calculated above.

These special values are:
1. +/-0: E = 0 and M = 0

2. +/-∞: E = 255 and M = 0

Wait a minute.
If E = 255 is reserved for infinity, then I can't use x = 255 to find the maximum.
I must take x = 254, so that max = 254 - 127 = 127:
2^127 * [1 + (1 - 2^-23)]
= 3.4 * 10^38

3. Denormalized (subnormal) numbers: very small numbers for which the implied leading 1 of the mantissa does not apply.
E = 0, M ≠ 0
Then Number = (-1)^S * 2^-126 * (0.M)
Since items 1 and 3 reserve the exponent value 0, we can't use x = 0; the smallest exponent available for normalized numbers is x = 1.
So min = 1 - 127 = -126,
and the minimum (normalized) number is 2^-126 * 1 = 1.18 * 10^-38

4. NaN (Not a Number): E = 255 and M ≠ 0

So we have found the minimum (normalized) and maximum values of float variables.

Tuesday, July 7, 2009
