#### bagofwater

##### New member

"For each corresponding test, the L1 loss is computed between the target and actual outputs produced by the deep learning models."

Ideally, when computing the error of a [rounded] result (with whatever metric you like), it should be compared to an exact result, not to a result that could have similar [rounding] errors. That usually isn't feasible, but comparing against a reference whose rounding error (relative to exact) should be significantly smaller than that of the actual outputs is likely good enough.

The results (forward pass/inference) from training also have error. Even with IEEE-754 compliance, using the same types on different platforms can produce different results. Section 11 of IEEE-754-2019 covers reproducibility, which typically isn't feasible across platforms for ML operations. The main problem is that floating-point addition isn't associative: dot-product/summation results (from a matrix multiply or convolutional layer) are sensitive to order and grouping.
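To make the non-associativity point concrete, here is a minimal sketch in plain Python (which uses FP64). The values are picked purely for illustration and are not from the benchmark being discussed.

```python
import math

# Grouping changes the rounded result, even in FP64.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # False

# The same four terms summed in two different orders.
terms = [1e17, 1.0, -1e17, 1.0]
sequential = 0.0
for t in terms:
    sequential += t  # 1e17 + 1.0 rounds back to 1e17, so one of the 1.0 terms is lost
pairwise = (terms[0] + terms[2]) + (terms[1] + terms[3])
exact = math.fsum(terms)  # correctly rounded sum of the exact values

print(sequential, pairwise, exact)  # 1.0 2.0 2.0
```

A platform that splits a dot product across lanes or threads is effectively choosing a different grouping, which is why two compliant implementations can disagree.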

For <=16-bit inference, I suggest producing reference results that use at least FP32 intermediates/outputs. And if you are trying to determine the error of FP32 inference, you'd need FP64 as a reference. Training could still happen in FP16, but a final inference pass using the more accurate FP32 would be best for the reference. Maybe an FP16 reference is good enough for int8, but FP16 only has 11 bits of significand precision (not counting the sign), so it may be only marginally more accurate.
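As a rough sketch of what I mean, the snippet below measures L1 error against a higher-precision reference using only the Python standard library. Everything in it is an assumption for illustration: the layer size, the random inputs, and the helper names `round_to`, `dot`, and `l1` are hypothetical. FP16 and FP32 arithmetic are emulated by rounding FP64 results after every operation.

```python
import math
import random
import struct

def round_to(fmt, x):
    """Round an FP64 value to the struct format fmt: 'e' is FP16, 'f' is FP32."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def dot(xs, ws, fmt):
    """Dot product with each product and partial sum rounded to fmt."""
    acc = 0.0
    for x, w in zip(xs, ws):
        acc = round_to(fmt, acc + round_to(fmt, x * w))
    return acc

def l1(outputs, reference):
    """Mean absolute difference between two equal-length lists."""
    return sum(abs(o - r) for o, r in zip(outputs, reference)) / len(reference)

random.seed(0)  # arbitrary seed, only for repeatability
n_out, n_in = 32, 128  # hypothetical layer size
x = [round_to("e", random.uniform(-1, 1)) for _ in range(n_in)]
w = [[round_to("e", random.uniform(-1, 1)) for _ in range(n_in)] for _ in range(n_out)]

# FP64 reference: products of FP16 inputs are exact in FP64, and fsum rounds once.
ref = [math.fsum(a * b for a, b in zip(x, row)) for row in w]
out16 = [dot(x, row, "e") for row in w]
out32 = [dot(x, row, "f") for row in w]

print("L1 of FP16 outputs vs FP64 reference:", l1(out16, ref))
print("L1 of FP32 outputs vs FP64 reference:", l1(out32, ref))

# 11-bit significand: integers above 2048 are no longer all representable in FP16.
print(round_to("e", 2048.0), round_to("e", 2049.0))
```

If `out16` were used as the reference instead of `ref`, the reported error for another FP16 implementation would mix the reference's own rounding error into the measurement.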

If FP16 references are used, maybe this is only a problem for FP16 inference?