8

Floating Point Operations

Floating point operations are widely used in scientific computation. Given a limited number of digits, a floating point representation improves the range and precision of the numbers that can be represented. In this chapter, we introduce floating point addition, subtraction, multiplication, and division.

8.1 FLOATING POINT ADDITION/SUBTRACTION

Let $X_1 = (M_1, E_1)$ and $X_2 = (M_2, E_2)$ be two numbers in floating point representation, where $M_i = S_i|M_i|$ and $X_i = (-1)^{S_i} \cdot |M_i| \cdot r^{E_i - \text{bias}}$. We are to find $X_{\text{out}} = X_1 \pm X_2$.

Two floating point numbers cannot be added or subtracted unless their exponents are equal. If the exponents of the two given numbers differ, an alignment is needed. Usually, the larger exponent is left unchanged and the smaller exponent is adjusted to match it. When a number's exponent is enlarged, its mantissa must be reduced so that the value of the number stays the same as before; that is, the mantissa is shifted right. The exponent is increased by $|E_1 - E_2|$, which enlarges the number by a factor of $r^{|E_1 - E_2|}$. The mantissa must therefore be shifted right by $|E_1 - E_2|$ digit positions as well, which reduces it by a factor of $r^{|E_1 - E_2|}$ (indicated by the factor $r^{-(|E_1 - E_2|)}$ below).
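As a small worked illustration of the alignment, consider two decimal operands. The radix $r = 10$, the four-digit mantissas, and the specific values are assumptions chosen for this example, and the bias is taken as 0 for simplicity.

```latex
\[
X_1 = 0.5000 \times 10^{3}, \qquad X_2 = 0.8200 \times 10^{2}, \qquad |E_1 - E_2| = 1 .
\]
% The smaller exponent E_2 is raised by 1, and M_2 is shifted right by one digit:
\[
X_2 = 0.8200 \times 10^{2} = \left(0.8200 \times 10^{-1}\right) \times 10^{3} = 0.0820 \times 10^{3} .
\]
```

The value of $X_2$ is still 82 after the shift; only its representation has changed, so both operands now share the exponent 3.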

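Let $X_{\text{out}} = X_1 \pm X_2 = (M_{\text{out}}, E_{\text{out}})$. We have

\[
E_{\text{out}} = \max\{E_1, E_2\},
\]

and

\[
M_{\text{out}} =
\begin{cases}
M_1 \pm M_2 \cdot r^{-(E_1 - E_2)} & \text{if } E_1 \ge E_2, \\
M_1 \cdot r^{-(E_2 - E_1)} \pm M_2 & \text{if } E_1 < E_2.
\end{cases}
\]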
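The alignment rule and the choice of the larger exponent can be sketched in Python as follows. This is a minimal illustration, not the book's implementation: the function names `add_aligned` and `value` are hypothetical, mantissas are held as exact `Fraction` values (so no digits are lost in the right shift, unlike a fixed-width register), and the result is left unnormalized.

```python
from fractions import Fraction


def add_aligned(x1, x2, r=2, subtract=False):
    """Return the unnormalized pair (M_out, E_out) for x1 + x2 or x1 - x2.

    Each operand is a pair (M, E): a signed mantissa and an exponent.
    Assumption: mantissas are exact rationals, so the right shift
    never discards digits as real hardware would.
    """
    m1, e1 = Fraction(x1[0]), x1[1]
    m2, e2 = Fraction(x2[0]), x2[1]
    if subtract:
        m2 = -m2

    # Scaling by r^{-|E1 - E2|} models a right shift of |E1 - E2| digits.
    scale = Fraction(1, r ** abs(e1 - e2))
    if e1 >= e2:
        m2 *= scale  # the operand with the smaller exponent is shifted
    else:
        m1 *= scale

    return m1 + m2, max(e1, e2)


def value(x, r=2, bias=0):
    """Numeric value M * r^(E - bias) of a (mantissa, exponent) pair."""
    m, e = x
    return Fraction(m) * Fraction(r) ** (e - bias)


x1 = (Fraction(3, 4), 5)  # 0.75 * 2^5 = 24
x2 = (Fraction(1, 2), 3)  # 0.50 * 2^3 = 4
print(add_aligned(x1, x2))  # (Fraction(7, 8), 5)
```

Because both operands share the same bias, the bias cancels during alignment and only matters when the final value is interpreted, which is why `add_aligned` does not take it as a parameter.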

The addition/subtraction procedure includes the following steps.

  1. Alignment. ...
