Floating Point Operations

Floating point operations are widely used in scientific computation. For a limited number of digits, a floating point system can improve the range and precision of the numbers it represents. In this chapter, we introduce floating point addition, subtraction, multiplication, and division.


Let $X_1 = (M_1, E_1)$ and $X_2 = (M_2, E_2)$ be two numbers in floating point representation, where $M_i = S_i|M_i|$ (with sign $S_i$ and magnitude $|M_i|$) and $X_i = (-1)^{S_i} \cdot |M_i| \cdot r^{E_i - \text{bias}}$. We are to find $X_{out} = X_1 \pm X_2$.

Two floating point numbers cannot be added or subtracted unless their exponents are equal. If the exponents differ, an alignment is needed. Usually, the larger exponent is left unchanged and the smaller exponent is adjusted to match it. When a number's exponent is enlarged, its mantissa must be reduced to keep the value of the number the same as before; that is, the mantissa is shifted right. The exponent is increased by $|E_1 - E_2|$, which on its own would enlarge the value by a factor of $r^{|E_1 - E_2|}$. The mantissa is therefore shifted right by $|E_1 - E_2|$ digit positions as well, reducing it by the same factor, that is, multiplying it by $r^{-|E_1 - E_2|}$.
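The alignment step can be illustrated with a small Python sketch. It is not taken from the text: the function name `align` is hypothetical, mantissas are held as exact `Fraction` values (so the right shift drops no digits, unlike fixed-width hardware), and both exponents are assumed to use the same bias, so only their difference matters.

```python
from fractions import Fraction

def align(m1, e1, m2, e2, r=2):
    """Align two (mantissa, exponent) pairs to the larger exponent.

    Illustrative only: mantissas are exact Fractions, so no digits are
    lost when shifting right; real hardware keeps a fixed digit count.
    """
    shift = abs(e1 - e2)               # digit positions to shift right
    scale = Fraction(1, r ** shift)    # the factor r^-(|E1 - E2|)
    if e1 >= e2:
        # X2 has the smaller exponent: shift its mantissa right.
        return m1, m2 * scale, e1
    # X1 has the smaller exponent: shift its mantissa right.
    return m1 * scale, m2, e2

# Radix-10 example: 0.5 * 10^3 and 0.25 * 10^1
m1, m2, e = align(Fraction(1, 2), 3, Fraction(1, 4), 1, r=10)
print(m1, m2, e)  # 1/2 1/400 3
```

After alignment, the second operand is 1/400 · 10^3 = 2.5, the same value as 0.25 · 10^1, which is the point of shifting the mantissa right while raising the exponent.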

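Let $X_{out} = X_1 \pm X_2 = (M_{out}, E_{out})$. We have

$$E_{out} = \max\{E_1, E_2\},$$

$$M_{out} = \begin{cases} M_1 \pm M_2 \cdot r^{-(E_1 - E_2)} & \text{if } E_1 \ge E_2, \\ M_1 \cdot r^{-(E_2 - E_1)} \pm M_2 & \text{if } E_1 < E_2. \end{cases}$$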
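As a rough sketch of the rule described here (keep the larger exponent and scale the mantissa of the other operand by $r^{-|E_1 - E_2|}$ before adding or subtracting), the following Python function computes an unnormalized result. The name `fp_add_sub` and its parameters are hypothetical, mantissas are signed exact fractions, and normalization, rounding, and overflow handling are deliberately left out.

```python
from fractions import Fraction

def fp_add_sub(x1, x2, r=2, subtract=False):
    """Return (M_out, E_out) for X1 + X2 or X1 - X2, without normalization.

    x1 and x2 are (mantissa, exponent) pairs with signed mantissas.
    Both exponents are assumed to share the same bias.
    """
    m1, e1 = x1
    m2, e2 = x2
    if subtract:
        m2 = -m2                       # X1 - X2 is X1 + (-X2)
    e_out = max(e1, e2)                # E_out = max{E1, E2}
    # Shift each mantissa right by (E_out - E_i) digit positions;
    # the operand that already has the larger exponent is shifted by 0.
    m1_aligned = Fraction(m1) / r ** (e_out - e1)
    m2_aligned = Fraction(m2) / r ** (e_out - e2)
    return m1_aligned + m2_aligned, e_out

# Radix-2 example: 0.5 * 2^3 = 4 and 0.25 * 2^1 = 0.5
print(fp_add_sub((Fraction(1, 2), 3), (Fraction(1, 4), 1)))
# (Fraction(9, 16), 3), i.e. 9/16 * 2^3 = 4.5
```

Using exact fractions keeps the sketch focused on alignment; a fixed-width implementation would also have to decide what happens to the digits shifted out on the right.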



The addition/subtraction procedure includes the following steps.

  1. Alignment. ...
