You need to update to TensorFlow 2.4 first.
Almost every TF version contains a number of critical model quantization issues. Since the TF / TFLite teams have struggled to settle on a stable strategy here, the quantization logic gets reworked in almost every TF release. As a result, the same conversion code can produce a model with floating-point input / output nodes, with inserted quantize / dequantize ops, or even a completely corrupted network, depending on which TF build you use. The provided code was developed and tested with TF 2.4, as this is the latest official build supporting the majority of key TFLite ops, and it can produce an almost fully quantized model. If updating your main environment to this version is a problem, just create a separate Python virtual environment with tensorflow-cpu 2.4 and use it only for model conversion.
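For reference, here is a minimal sketch of the full-integer post-training quantization flow as it works under TF 2.4. The toy Keras model and the random representative dataset are hypothetical stand-ins for your own network and calibration data:

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy model standing in for your own network.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1),
])

# Calibration data generator; replace with samples from your real dataset.
def representative_dataset():
    for _ in range(10):
        yield [np.random.rand(1, 8).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Request full-integer quantization, including the input / output tensors,
# so no floating-point nodes are left at the model boundaries.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
```

After conversion you can check in Netron (or via the TFLite interpreter) that the input and output tensors are actually int8 rather than float32.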
As mentioned in the tutorial, you should try converting your model with both the new MLIR converter and the old TOCO converter, since the resulting models differ between the two for the majority of architectures. Switching to the old TOCO converter (`experimental_new_converter = False`) might also be a solution to this problem.
One additional tip: you can use Netron to visualize your TFLite model and check the types of all its nodes. It's really convenient.