# 8 Floating Point Operations

Floating point operations are widely used in scientific computation. For a given, limited number of digits, a floating point system improves the range and precision of the numbers that can be represented. In this chapter, we introduce floating point addition, subtraction, multiplication, and division.

## 8.1 FLOATING POINT ADDITION/SUBTRACTION

Let *X*_{1} = (*M*_{1}, *E*_{1}) and *X*_{2} = (*M*_{2}, *E*_{2}) be two numbers in floating point representation, where *M*_{i} = *S*_{i}|*M*_{i}| and *X*_{i} = (−1)^{*S*_{i}} · |*M*_{i}| · *r*^{*E*_{i} − bias}. We want to find *X*_{out} = *X*_{1} ± *X*_{2}.
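As a quick check on the notation, the value formula can be evaluated directly. The sketch below is only illustrative: the function name `fp_value`, the defaults *r* = 2 and bias = 127, and the sample operand are assumptions, not values taken from the text.

```python
from fractions import Fraction

def fp_value(sign, magnitude, exponent, r=2, bias=127):
    """Evaluate (-1)^sign * magnitude * r^(exponent - bias) exactly."""
    return (-1) ** sign * Fraction(magnitude) * Fraction(r) ** (exponent - bias)

# Hypothetical operand: S = 1, |M| = 1.5, E = 129 (assumed r = 2, bias = 127)
x = fp_value(1, Fraction(3, 2), 129)
print(x)  # -6
```

Exact fractions are used here so that the result is not itself affected by floating point rounding.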

Two floating point numbers cannot be added or subtracted unless their exponents are equal. If the exponents differ, an *alignment* is needed. Usually, the larger exponent is left unchanged and the smaller exponent is raised to match it. When a number's exponent is increased, its mantissa must be reduced so that the value of the number stays the same; that is, the mantissa is shifted right. Increasing the exponent by |*E*_{1} − *E*_{2}| multiplies the value by *r*^{|*E*_{1} − *E*_{2}|}, so the mantissa is shifted right by |*E*_{1} − *E*_{2}| digit positions as well, which divides it by the same amount (indicated by a factor of *r*^{−|*E*_{1} − *E*_{2}|} below).
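The right shift described above can be sketched for the radix-2 case with integer mantissas. This is a minimal illustration under stated assumptions: the name `align_mantissas`, the 8-bit mantissa width, and the sample values are hypothetical, and the bits shifted out are simply reported, whereas a real design would typically keep some of them for rounding.

```python
def align_mantissas(m1, e1, m2, e2):
    """Align two radix-2 integer mantissas.

    Returns (m1, m2, exponent, lost_bits): the operand with the smaller
    exponent is shifted right by |e1 - e2| bit positions, and the bits
    that fall off its low end are returned separately.
    """
    shift = abs(e1 - e2)
    mask = (1 << shift) - 1          # selects the bits that will be shifted out
    if e1 >= e2:
        return m1, m2 >> shift, e1, m2 & mask
    return m1 >> shift, m2, e2, m1 & mask

# Hypothetical 8-bit mantissas whose exponents differ by 2
m1, m2, e, lost = align_mantissas(0b11000000, 5, 0b10100010, 3)
print(f"{m2:08b}", e, f"{lost:02b}")  # 00101000 5 10
```

The two low-order bits of the second mantissa no longer fit after the shift, which is why alignment can lose precision when the mantissa width is fixed.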

Let *X*_{out} = *X*_{1} ± *X*_{2} = (*M*_{out}, *E*_{out}). We have

*E*_{out} = *max*{*E*_{1}, *E*_{2}},

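and

*M*_{out} = *M*_{1} ± *M*_{2} · *r*^{−(*E*_{1} − *E*_{2})} if *E*_{1} ≥ *E*_{2},

*M*_{out} = *M*_{1} · *r*^{−(*E*_{2} − *E*_{1})} ± *M*_{2} if *E*_{1} < *E*_{2}.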
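The relation between the output and the two operands can be checked numerically: the operand with the smaller exponent is scaled by *r*^{−|*E*_{1} − *E*_{2}|} before the mantissas are combined, and the larger exponent is kept. The sketch below assumes signed mantissas held as exact fractions and performs no normalization or rounding; the name `fp_add_sub` and the sample operands are hypothetical.

```python
from fractions import Fraction

def fp_add_sub(m1, e1, m2, e2, subtract=False, r=2):
    """Return (M_out, E_out) for X1 + X2 or X1 - X2, before normalization."""
    if subtract:
        m2 = -m2                                   # X1 - X2 = X1 + (-X2)
    e_out = max(e1, e2)                            # the larger exponent is kept
    if e1 >= e2:
        m_out = m1 + m2 * Fraction(r) ** (-(e1 - e2))   # scale down M2
    else:
        m_out = m1 * Fraction(r) ** (-(e2 - e1)) + m2   # scale down M1
    return m_out, e_out

# Hypothetical operands: 1.5 * 2^5 = 48 and 1.25 * 2^3 = 10
m_out, e_out = fp_add_sub(Fraction(3, 2), 5, Fraction(5, 4), 3)
print(m_out, e_out)               # 29/16 5
print(m_out * Fraction(2) ** e_out)  # 58
```

The bias is omitted because it is the same for both operands and cancels in the exponent difference.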

The addition/subtraction procedure includes the following steps.

*Alignment. ...*

