Fundamental rule: a floating-point operation must approximate the corresponding real number arithmetic operation by rounding any result that is not a floating-point number to the nearest floating-point number.

In short: a fl(op) b = fl(a op b), where op is one of +, -, *, / and fl(x) denotes the floating-point number nearest to the real number x.
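
The rule above can be checked numerically. The following is a minimal sketch, assuming IEEE 754 double precision with round-to-nearest (the usual behaviour of Python floats on current hardware). The helper names `rounded_exact` and `obeys_rule` and the sample values are illustrative, not part of the original notes. Each operation is computed exactly over the rationals, rounded once to a double, and compared with the hardware result.

```python
from fractions import Fraction
import operator

# The four basic operations covered by the rule.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

def rounded_exact(a, b, op):
    """Compute a op b exactly over the rationals, then round once to a double."""
    exact = OPS[op](Fraction(a), Fraction(b))  # Fraction(a) is the exact value of the float a
    return float(exact)                        # a single rounding to the nearest double

def obeys_rule(a, b, op):
    """True if the floating-point result equals fl(a op b)."""
    return OPS[op](a, b) == rounded_exact(a, b, op)

# Hypothetical sample operands chosen for illustration.
samples = [(0.1, 0.2), (1.0, 3.0), (1e16, 1.0), (2.5, -7.75)]
all_ok = all(obeys_rule(a, b, op) for a, b in samples for op in OPS)
print(all_ok)
```

The check relies on `Fraction` holding the exact value of each operand, so the only rounding happens in the final conversion back to `float`.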

A simple formula can be used to bound the relative error of a single floating-point operation.

Therefore: a fl(op) b = (a op b) + δ (a op b) = (a op b)(1 + δ), where |δ| ≤ u, and:

  • u is called the unit roundoff.
  • u = 2^-24 ≈ 6.0 × 10^-8 in single precision, and u = 2^-53 ≈ 1.1 × 10^-16 in double precision.
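
As an illustration of the error bound, the sketch below measures the relative error δ of each basic operation on a few sample operands and compares it with the unit roundoff. It assumes IEEE 754 double precision with round-to-nearest, for which the unit roundoff is 2^-53. The names `U_DOUBLE`, `U_SINGLE` and `relative_error` and the operand values are hypothetical choices for this example.

```python
from fractions import Fraction
import sys

U_DOUBLE = Fraction(1, 2**53)  # unit roundoff in double precision (assumed round-to-nearest)
U_SINGLE = Fraction(1, 2**24)  # unit roundoff in single precision

def relative_error(computed, exact):
    """Return |computed - exact| / |exact| as an exact rational number."""
    return abs((Fraction(computed) - exact) / exact)

# Hypothetical sample operands; every exact result below is nonzero.
pairs = [(0.1, 0.2), (1.0, 3.0), (1e16, 1.0), (2.5, -7.75)]

deltas = []
for a, b in pairs:
    fa, fb = Fraction(a), Fraction(b)  # exact values of the operands
    deltas.append(relative_error(a + b, fa + fb))
    deltas.append(relative_error(a - b, fa - fb))
    deltas.append(relative_error(a * b, fa * fb))
    deltas.append(relative_error(a / b, fa / fb))

worst = max(deltas)
print(float(worst), float(U_DOUBLE))
print(worst <= U_DOUBLE)

# Machine epsilon as reported by Python is the gap between 1.0 and the
# next double, which is twice the unit roundoff.
print(sys.float_info.epsilon == 2 * float(U_DOUBLE))
```

Note that the bound applies to results in the normal range. Results that overflow or fall into the subnormal range are not covered by this sketch.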

Floating-point numbers: floating-point arithmetic is different from regular arithmetic
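
One way to see the difference is that familiar identities of real arithmetic, such as associativity of addition, can fail once every intermediate result is rounded. The short sketch below assumes IEEE 754 double precision; the variable names and the operands 0.1, 0.2 and 0.3 are illustrative choices.

```python
# Addition of real numbers is associative, but each floating-point
# addition rounds its result, so the grouping can change the answer.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)     # the two groupings disagree in double precision
print(0.1 + 0.2 == 0.3)  # the rounded sum differs from the double nearest 0.3
print(left - right)      # the discrepancy is tiny relative to the result
```

The discrepancy is small compared with the result itself, which is why comparisons between floating-point results are usually made with a tolerance rather than with exact equality.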