Fundamental rule: a floating-point operation must approximate the corresponding real number arithmetic operation by rounding any result that is not a floating-point number to the nearest floating-point number.
In short: a fl(op) b = fl(a op b), where op is one of +, -, *, /.
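The rule can be checked directly. The sketch below is a minimal illustration, assuming Python floats are IEEE 754 double-precision numbers with round-to-nearest (true on common platforms). The helper name is_correctly_rounded is hypothetical. It computes a op b exactly with rational arithmetic, rounds that exact value to the nearest float, and compares it with the hardware result.

```python
from fractions import Fraction
import operator


def is_correctly_rounded(a, b, op):
    """Check that a fl(op) b equals fl(a op b) for two floats a and b.

    Fraction(a) and Fraction(b) hold the stored floats exactly, so
    op(Fraction(a), Fraction(b)) is the exact real-number result.
    float(...) then rounds that exact value to the nearest float.
    """
    exact = op(Fraction(a), Fraction(b))
    return op(a, b) == float(exact)


# The four basic operations all satisfy the rule for these operands.
for op in (operator.add, operator.sub, operator.mul, operator.truediv):
    print(op.__name__, is_correctly_rounded(0.1, 0.2, op))

# The rounding is still visible: the computed sum is not the float 0.3,
# because 0.1, 0.2 and 0.3 are themselves rounded when stored.
print(0.1 + 0.2 == 0.3)
```

The comparison is made against the exact sum of the stored operands, not of the decimal literals, which is why the rule holds even though 0.1 + 0.2 differs from 0.3.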
A simple formula can be used to estimate the error resulting from floating point arithmetic.
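Therefore: a fl(op) b = (a op b)(1 + ε) = a op b + ε (a op b), where |ε| ≤ u, and:
- u is called the unit roundoff.
- u = 2^-24 ≈ 6.0e-8 in single precision, and u = 2^-53 ≈ 1.1e-16 in double precision (IEEE 754 formats, rounding to nearest, no overflow or underflow).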
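The relative error bound can be tested numerically. The following is a rough sketch, assuming IEEE 754 double precision with round-to-nearest, so that the unit roundoff is taken as 2^-53. The names U, relative_error and worst_relative_error are hypothetical, and the operand range is chosen to stay clear of overflow and underflow.

```python
from fractions import Fraction
import operator
import random

# Assumed unit roundoff for IEEE 754 double precision, round-to-nearest.
U = Fraction(1, 2**53)

OPS = (operator.add, operator.sub, operator.mul, operator.truediv)


def relative_error(a, b, op):
    """Return |(a fl(op) b) - (a op b)| / |a op b| as an exact Fraction."""
    exact = op(Fraction(a), Fraction(b))   # exact real-number result
    computed = Fraction(op(a, b))          # floating-point result, held exactly
    if exact == 0:
        return Fraction(0)                 # e.g. a - a, which is computed exactly
    return abs(computed - exact) / abs(exact)


def worst_relative_error(trials=1000, seed=0):
    """Largest relative error seen over random operands and all four operations."""
    rng = random.Random(seed)
    worst = Fraction(0)
    for _ in range(trials):
        # Operands in [0.5, 1000] avoid division by zero and underflow.
        a = rng.uniform(0.5, 1000.0)
        b = rng.uniform(0.5, 1000.0)
        for op in OPS:
            worst = max(worst, relative_error(a, b, op))
    return worst


worst = worst_relative_error()
print("worst relative error:", float(worst))
print("unit roundoff u:     ", float(U))
print("bound holds:", worst <= U)
```

Exact rational arithmetic is used for the reference value so that the measured error is not itself contaminated by rounding.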
Floating-point numbers: floating-point arithmetic is different from regular (real-number) arithmetic.