Real-Time Image Super-Resolution Challenge

Andrey Ignatov

Administrator
Staff member
What can I do?

Either rename your model to model.tflite and put it into the Download folder, or select the Show internal storage option in the prompted file explorer and then go to your model location.

How can I generate a dynamic-input TFLite model for full-image evaluation, given that the conversion requires a fixed input shape?

You do not need to do this - the model used for runtime evaluation should have a fixed input tensor size.

Why is this?

Is your TFLite model running fine with the NNAPI acceleration option? If not, there is an op in your model that is not supported by NNAPI, and that is why this error happens. The full list of NNAPI-1.2 compatible ops is provided here.
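If you are not sure which ops ended up in your converted model, one way to inspect it (available in recent TensorFlow versions that ship the TFLite analyzer) is, for example:

Python:
import tensorflow as tf

# Prints every op contained in the .tflite file, so you can compare
# the list against the NNAPI-1.2 supported operations.
tf.lite.experimental.Analyzer.analyze(model_path="model.tflite")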
 

stevenlau

New member
Is it using the model before quantization to generate the full-resolution image for submission? Which model should I use to generate the full-resolution images for quality evaluation, since I know there is some performance degradation after quantization? And how can I generate a dynamic-resolution TFLite model if the quantized TFLite model is used for quality evaluation, given that the validation images all have different resolutions?
 

Andrey Ignatov

Administrator
Staff member
Is it using the model before quantization to generate the full-resolution image for submission

No! You should use the model obtained after quantization to generate the full-resolution images - during the final test phase, we will use only the quantized model to evaluate its fidelity and runtime results.

there is some performance degradation after quantization

Obviously, this is the cost of quantization: you are reducing the bit-width of the model weights from 32-bit to 8-bit to get a faster runtime. The results obtained after this operation will always be worse than those of the original model, even when using quantization-aware training.

How can I generate a dynamic-resolution TFLite model

Set the second and third input dimensions to None and set the experimental_new_converter option to True.
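For example, a minimal conversion sketch (assuming your trained network is exported as a SavedModel; the paths and file names below are just placeholders):

Python:
import tensorflow as tf

saved_model_dir = "./TFModel"  # placeholder path to your exported model
model = tf.saved_model.load(saved_model_dir)
concrete_func = model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
concrete_func.inputs[0].set_shape([1, None, None, 3])  # height and width left dynamic

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
tflite_model = converter.convert()

with open("model_none.tflite", "wb") as f:
    f.write(tflite_model)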
 

Zhang9678

New member
Is your TFLite model running fine with the NNAPI acceleration option? If not, there is an op in your model that is not supported by NNAPI, and that is why this error happens. The full list of NNAPI-1.2 compatible ops is provided here.
My TFLite model runs fine with the NNAPI acceleration option... I cannot find the problem.
 

JetDZC

New member
Should we fully quantize the model, or can some layers be left unquantized for better accuracy?
Hello, can I ask you a question? How much does PSNR decrease when using a fully quantized model? I use quantization-aware training but the performance still drops by up to 5 dB. I am wondering whether there is something wrong...
 

deepernewbie

New member
Even when I use quantization-aware training, when I convert a model achieving around 30 dB it drops to 22-23 dB. What am I doing wrong while quantizing? Can anybody comment on this?

Here is a minimal working example with the problem

Python:
import os
import cv2
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot
# NoOpQuantizeConfig is taken from a tfmot-internal module; the exact path may differ between tfmot versions
from tensorflow_model_optimization.python.core.quantization.keras.default_8bit import default_8bit_quantize_configs

kl = tf.keras.layers
km = tf.keras.models
ko = tf.keras.optimizers


class KerasLite:
    def __init__(self,interpreter: tf.lite.Interpreter,float_type=np.float32):
        self.interpreter = interpreter
        self.input_details = interpreter.get_input_details()
        self.output_details = interpreter.get_output_details()
        self.output_height = self.output_details[0]['shape'][1]
        self.output_width = self.output_details[0]['shape'][2]

    def predict(self,input_tensor: np.array):
        self.interpreter.resize_tensor_input(self.input_details[0]["index"], input_tensor.shape)
        self.interpreter.allocate_tensors()
        self.interpreter.set_tensor(self.input_details[0]["index"],input_tensor)
        self.interpreter.invoke()
        output_data = self.interpreter.get_tensor(self.output_details[0]["index"])
        return output_data

def imread_RGB_norm(filename,float_type=np.float32):
    img = cv2.imread(filename)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = img.astype(float_type)
    img = img / 255.0
    return img

def representative_dataset():
    dataset_folder_lr = "../Test/DIV2K_train_LR_bicubic_X3"
    files = sorted(os.listdir(dataset_folder_lr))
    for file in files:
        data = imread_RGB_norm(dataset_folder_lr + "/" + file)
        data = data[None, :, :, :]
        yield [data]

def network():
    lr_image_input = kl.Input(shape=(None, None, 3), name="lr_image")
    x_i = tfmot.quantization.keras.quantize_annotate_layer(kl.Conv2D(27, (3, 3), strides=1, padding="same"))(lr_image_input)
    hr_img_reconstructed_diff = tfmot.quantization.keras.quantize_annotate_layer(kl.Lambda(lambda x: tf.nn.depth_to_space(x, 3)),quantize_config=default_8bit_quantize_configs.NoOpQuantizeConfig())(x_i)

    hr_img_reconstructed_bic = tfmot.quantization.keras.quantize_annotate_layer(kl.UpSampling2D((3,3)),quantize_config=default_8bit_quantize_configs.NoOpQuantizeConfig())(lr_image_input)

    hr_img_reconstructed = tfmot.quantization.keras.quantize_annotate_layer(kl.Add())((hr_img_reconstructed_bic,hr_img_reconstructed_diff))

    model = km.Model(lr_image_input, hr_img_reconstructed, name="TestQAT")
    return model


simple_sr_model = network()
with tfmot.quantization.keras.quantize_scope():
    q_aware_model = tfmot.quantization.keras.quantize_apply(simple_sr_model)

q_aware_model.summary()
optim = ko.Adam(lr=0.001)
q_aware_model.compile(optimizer=optim, loss="mse")
q_aware_model.fit(training_images, epochs=1) #here training_images is just a keras.utils.Sequence class for loading images
q_aware_model.save("qat.h5")


validation_dataset_folder_hr = "../Test/DIV2K_train_HR"
validation_dataset_folder_lr = "../Test/DIV2K_train_LR_bicubic_X3"
# ignore the name ValidationDataset - it is just a generator over the folders above that
# returns [lr_image, hr_image, bicubic_image], hr_image
validation_images = u.ValidationDataset(validation_dataset_folder_hr, validation_dataset_folder_lr, downscale)

test_images = validation_images[0]  # get an image pair from the folders above
lr_im = test_images[0][0]  # the LR image
hr_im = test_images[1]  # the HR image
bic_im = test_images[0][2]  # the bicubic-interpolated image
sr_im = q_aware_model.predict(lr_im)

mse_sr = np.mean(np.square(sr_im-hr_im))
mse_bic = np.mean(np.square(bic_im-hr_im))

psnr_sr = 10*np.log10(1/mse_sr)
psnr_bic = 10*np.log10(1/mse_bic)

print("psnr sr %f" %psnr_sr)
print("psnr bic %f" %psnr_bic)

#sample output
#psnr sr 34.226449   <<<<<<<<<<<<RESULT OF SIMPLE SR MODEL
#psnr bic 35.126754   <<<<<<<<<<<<RESULT OF BICUBIC INTERPOLATION

print("Quantazition Started!")

#quantize model
q_aware_model.save("./TFModel", overwrite=True, include_optimizer=False, save_format='tf')
q_aware_model = tf.saved_model.load("./TFModel")
concrete_func = q_aware_model.signatures[tf.saved_model.DEFAULT_SERVING_SIGNATURE_DEF_KEY]
concrete_func.inputs[0].set_shape([1, None, None, 3])
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.experimental_new_quantizer = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

converter.inference_type = tf.uint8
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

model_byte = converter.convert()
with open("model_test.tflite", "wb") as file:
    file.write(model_byte)

model_int = tf.lite.Interpreter(model_path="model_test.tflite")
model_lite = KerasLite(model_int) #this is my wrapper for prediction as defined above

sr_im_int = model_lite.predict((255*lr_im).astype(np.uint8))/255.

mse_sr_int = np.mean(np.square(sr_im_int-hr_im))
psnr_sr_int = 10*np.log10(1/mse_sr_int)

print("psnr quantized sr %f" %psnr_sr_int)

#sample output
#psnr quantized sr 23.165068  <<<<<<<<<<< a reduction of about 10dB!!! ?

#HERE the image sr_im_int has very dull colors
 

deepernewbie

New member
Hello, can I ask you a question? How much does PSNR decrease when using a fully quantized model? I use quantization-aware training but the performance still drops by up to 5 dB. I am wondering whether there is something wrong...
I am in the same boat; I hope someone can help.
 

Andrey Ignatov

Administrator
Staff member
Should we fully quantize the model, or can some layers be left unquantized for better accuracy?

No. The reason is that the Synaptics NPU can run only quantized models, same as Qualcomm's Hexagon NN DSP or Google's Edge TPU. If your model contains floating-point ops, it won't run on them - they simply don't have silicon for this.

I use quantization-aware training but the performance still drops by up to 5 dB
when I convert a model achieving around 30 dB it drops to 22-23 dB
I am in the same boat; I hope someone can help

First of all, this is normal - please check the above response. Secondly, you should forget all the information and accuracy numbers related to quantized networks that you have seen before. In particular, forget all claims that "the results are nearly the same" or "the accuracy loss is negligible". If you are quantizing neural networks to 8 bits, the loss will almost certainly be severe. How severe depends on your architecture and quantization method, and the goal of this challenge is to get results that are as good as possible.

Some sample network quantization results can be found, e.g., in this paper from MediaTek: for an image deblurring network, they got an accuracy drop of 2.2 dB PSNR when using post-training quantization and around a 1 dB loss when using their own quantization-aware training tools. So, you should expect similar or lower values when quantizing your models in this challenge.
 

loser

New member
Make sure you have model.tflite in the Downloads folder - it seems it could not find the file.
Thanks, you have solved many of my questions, but I still have something to resolve. Now I want to practice by trying EDSR with quantization-aware training. I implement upsample like this:
Python:
def upsample(x, num_filters):
    x = Conv2D(num_filters * (3 ** 2), 3, padding='same')(x)
    Depth2Space = tf.keras.layers.Lambda(lambda x: tf.nn.depth_to_space(x, 3))
    x = tfmot.quantization.keras.quantize_annotate_layer(
        Depth2Space, quantize_config=default_8bit_quantize_configs.NoOpQuantizeConfig())(x)
    return x

Is this right? After the training step, I use:

Python:
_, pretrained_weights = tempfile.mkstemp('.tf')

model.save_weights(pretrained_weights)

base_model = edsr()
base_model.load_weights(pretrained_weights)  # optional but recommended for model accuracy
quant_aware_model = tfmot.quantization.keras.quantize_model(base_model)

# Save or checkpoint the model.
_, keras_model_file = tempfile.mkstemp('.h5')
quant_aware_model.save(keras_model_file)

# `quantize_scope` is needed for deserializing HDF5 models.
with tfmot.quantization.keras.quantize_scope():
    loaded_model = tf.keras.models.load_model(keras_model_file)
but it gives me this error:

ValueError: Unable to clone model. This generally happens if you used custom Keras layers or objects in your model. Please specify them via `quantize_scope` for your calls to `quantize_model` and `quantize_apply`.

I am about to collapse, because there is no template on GitHub 0.0
 

deepernewbie

New member
Don't use quantize_model - use quantize_annotate_layer for the individual layers. The problem is that you first use quantize_annotate_layer in upsample and then try to quantize the entire model, so you already have a quantize-annotated layer, but quantize_model tries to annotate all of the layers from scratch.
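A minimal sketch of what I mean (here edsr_annotated() is a placeholder for your EDSR builder that already wraps its layers with quantize_annotate_layer, as in your upsample function):

Python:
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# edsr_annotated() is hypothetical - it builds the model with the layers
# already wrapped by quantize_annotate_layer, so do NOT call quantize_model on it.
base_model = edsr_annotated()

with tfmot.quantization.keras.quantize_scope():
    # quantize_apply converts only the layers that carry an annotation,
    # instead of re-annotating the whole model from scratch.
    quant_aware_model = tfmot.quantization.keras.quantize_apply(base_model)

quant_aware_model.compile(optimizer='adam', loss='mse')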
 

deepernewbie

New member
No. The reason is that the Synaptics NPU can run only quantized models, same as Qualcomm's Hexagon NN DSP or Google's Edge TPU. If your model contains floating-point ops, it won't run on them - they simply don't have silicon for this.
Thanks this was helpful!
 

loser

New member
Don't use quantize_model - use quantize_annotate_layer for the individual layers. The problem is that you first use quantize_annotate_layer in upsample and then try to quantize the entire model, so you already have a quantize-annotated layer, but quantize_model tries to annotate all of the layers from scratch.
Do you know whether I can quantize a sigmoid layer?
 

richlaji

New member
Hi,
the email said
If your model performs any image pre-processing (rescaling, normalization, etc.) - it should be integrated directly into it, no additional scripts are accepted.
I run inference with my int8 TFLite model using the following code:

Python:
import cv2
import numpy as np
import tensorflow as tf

# load the int8 model and get the quantization info (zero point & scale for input and output)
interpreter = tf.lite.Interpreter(model_path)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
input_scale, input_zero_point = input_details[0]["quantization"]
output_scale, output_zero_point = output_details[0]["quantization"]
#read image
raw_img = cv2.imread(imgName).astype(np.float32)
raw_img = raw_img[np.newaxis, :] / 255
# quantize the float32 input to uint8 using the input scale / zero point
raw_img = raw_img / input_scale + input_zero_point
raw_img = raw_img.astype(np.uint8)
#inference
interpreter.resize_tensor_input(input_details[0]['index'], raw_img.shape)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]['index'], raw_img)
interpreter.invoke()
#get output
sr = interpreter.get_tensor(output_details[0]['index'])
sr = sr.astype(np.float32)
# de-quantize the output and convert it back to a uint8 image
sr = (sr - output_zero_point) * output_scale
sr_img = (np.squeeze(sr).clip(0, 1) * 255).astype(np.uint8)

There are 4 processing steps before or after inference:
(1) divide by 255 after reading the image (raw_img[np.newaxis, :] / 255)
(2) quantize the input image to uint8 (raw_img / input_scale + input_zero_point)
(3) de-quantize the output ((sr - output_zero_point) * output_scale)
(4) multiply by 255 after de-quantizing (sr_img = (np.squeeze(sr).clip(0, 1) * 255).astype(np.uint8))
and I can't get a reasonable result with this code.

I have two questions:
(1) Which of these steps should be integrated directly into my model? Can (2) & (3), which include floating-point ops, be integrated into the int8 model?
(2) Can you provide example code for inference with an int8 model (in particular, how to use the zero point & scale for the int8-quantized input and output)? The code here looks like inference with float32.

Thank you~
 

deepernewbie

New member
My understanding is that

raw_img = cv2.imread(imgName)

is the input, so anything below that should be integrated into the model itself,

especially these parts:

raw_img = raw_img / input_scale + input_zero_point

and

sr = (sr - output_zero_point) * output_scale

So basically these are all, unfortunately, elementwise operations.
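For example, a minimal sketch of folding that scaling into the Keras graph before conversion (the wrapper function and the clipping are my own assumptions, not challenge code):

Python:
import tensorflow as tf

def wrap_with_io_scaling(core_model):
    # core_model: the float SR network trained on inputs/outputs in [0, 1]
    inp = tf.keras.Input(shape=(None, None, 3), name="lr_image")  # raw 0..255 values
    x = tf.keras.layers.Lambda(lambda t: t / 255.0)(inp)          # pre-processing inside the graph
    y = core_model(x)
    out = tf.keras.layers.Lambda(
        lambda t: tf.clip_by_value(t, 0.0, 1.0) * 255.0)(y)       # post-processing inside the graph
    return tf.keras.Model(inp, out)

This way the representative dataset and the converted TFLite model work directly on 0..255 images, and no extra Python code is needed around the interpreter.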
 

richlaji

New member
Thank you for your reply~
Can the quantization part be integrated into the int8 model? The scale and zero point are floats and are only produced after converting the TF model to a TFLite model.
 

deepernewbie

New member
Thank you for your reply~
Can the quantization part be integrated into the int8 model? The scale and zero point are floats and are only produced after converting the TF model to a TFLite model.
You should do the scaling inside the model and let the TFLite converter do its job for the quantization part (scale, zero point, etc.). It should look like this:

lr_image = cv2.imread(filename)
#no extra code here
sr_image = super_duper_model(lr_image) #here sr_image is 0-255 uint8
#no extra code here
cv2.imshow(sr_image) #voila!
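In converter terms, that means something like this (a sketch only; it assumes a concrete_func and a representative_dataset generator set up as earlier in this thread):

Python:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset  # yields images in the model's input range
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8   # the TFLite model takes uint8 directly
converter.inference_output_type = tf.uint8  # and returns uint8 directly
tflite_model = converter.convert()

The input/output scale and zero point are then stored inside the .tflite file, so you never have to apply them by hand.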
 

fengtao.xie

New member
Do we need to include the SR images in the final zip? We get a failure when we submit a zip file that includes all the files required for the testing phase - it fails with an "Expected 100 .png images" error.

CodaLab - Competition
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Traceback (most recent call last):
File "/tmp/codalab/tmpaefcVn/run/program/evaluation.py", line 95, in
raise Exception('Expected %d .png images'%len(ref_pngs))
Exception: Expected 100 .png images
 

richlaji

New member
You should do the scaling inside the model and let the TFLite converter do its job for the quantization part (scale, zero point, etc.). It should look like this:

lr_image = cv2.imread(filename)
#no extra code here
sr_image = super_duper_model(lr_image) #here sr_image is 0-255 uint8
#no extra code here
cv2.imshow(sr_image) #voila!
OK~ I understand what you mean for the inference part, but I am confused: both the zero point and scale are only generated after converting the TF model to a quantized TFLite model (is that right?), so how can these ops be added to the TFLite model?
Thank you~
 

wuwuwuwuwu

New member
Please don't confuse model runtime and fidelity evaluation:

1. For evaluating model runtime, you need to get a TFLite model processing the images of resolution 360 x 640 pixels.
2. To check the quality of the reconstructed visual results, your model should process the images of original size (actually, not original - downscaled by a factor of 3).
1. For evaluating model runtime, should the input shape be [1, 360, 640, 1] or [1, 360, 640, 3]?
2. To check the quality of the reconstructed visual results, should I convert the results from RGB to YUV?
 

Andrey Ignatov

Administrator
Staff member
Do we need to include the SR images in the final zip? We get a failure when we submit a zip file that includes all the files required for the testing phase - it fails with an "Expected 100 .png images" error.

No, you don't need to include the SR images. There is indeed a problem with the Codalab submission server right now; it should be fixed tomorrow. For now, you can just ignore this error as it doesn't affect your submission.
 

hansi

New member
Excuse me, what should the format of the submission be? I submitted it according to the requirements, but I get the following errors. The tflite file works normally in the app. Hope to get an answer.
This is the error I encountered:

[screenshot of the error]

Here is my submission:

[screenshot of the submission]
 

Andrey Ignatov

Administrator
Staff member
I submitted it according to the requirements, but I get the following errors.

You did everything correctly - the problem is with Codalab. It should be fixed tomorrow; for now you can just ignore these errors as they will not be taken into account - we will be checking the last submission you uploaded.
 
If you are making your final submission - the results on the final test subset will be released after the end of the competition.
If I choose Development, I upload 100 PNGs and the tflite, and then I can see the result in the Google doc, right?
If I choose Testing, I upload the final format, and after the end of the competition I can see the result, right?
 

Andrey Ignatov

Administrator
Staff member
If I choose Development, I upload 100 PNGs and the tflite, and then I can see the result in the Google doc, right?
If I choose Testing, I upload the final format, and after the end of the competition I can see the result, right?

Yes.

What does "finished" mean - does it mean the zip file is OK?

This means that your final submission archive was successfully uploaded to the server; the script does not check its contents.
 

richlaji

New member
Hi,
Can you provide Python code for inference with model_none.tflite? I want to know whether the converted model is correct.

Code:
import numpy as np
import tensorflow as tf
from PIL import Image

# read model
interpreter = tf.lite.Interpreter(model_path=model_path)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# read image; raw_img is a uint8 np array
raw_img = np.array(Image.open(imgName).convert('RGB'))
raw_img = raw_img[np.newaxis, :]
# allocate tensors
interpreter.resize_tensor_input(input_details[0]['index'], raw_img.shape)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]['index'], raw_img)
# inference
interpreter.invoke()
# sr_img is a uint8 np array
sr_img = interpreter.get_tensor(output_details[0]['index'])[0]

I use the code above for inference. Am I right?
Thank you~
 

Finn

New member
Hi,
Can you provide Python code for inference with model_none.tflite? I want to know whether the converted model is correct.
Hi, I am running into a problem when converting my model to model_none.tflite in another challenge.
Can you tell me how you managed to convert it successfully?

Code:
converter = tf.lite.TFLiteConverter.from_keras_model(model_tf.build(input_shape=(1,None,None,30)))
converter.experimental_new_converter = True
tflite_model = converter.convert()


 

JetDZC

New member
OK~ I understand what you mean for the inference part, but I am confused: both the zero point and scale are only generated after converting the TF model to a quantized TFLite model (is that right?), so how can these ops be added to the TFLite model?
Thank you~
How can we deal with this post-scaling using the zero point and scale? Thanks a lot!
 

hansi

New member
Hello, I chose Development. I submitted a zip (name: model(cea).zip). The zip has one tflite (360x640), 100 pictures, and a readme, but it gives me this error:

WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
Traceback (most recent call last):
File "/tmp/codalab/tmpb9X7xN/run/program/evaluation.py", line 95, in
raise Exception('Expected %d .png images'%len(ref_pngs))
Exception: Expected 100 .png images

I have tried for many days, about 20 times. Can you tell me what I did wrong? I am in a hurry - please respond, thanks.
 

xindongzhang

New member
OK~ I understand what you mean for the inference part, but I am confused: both the zero point and scale are only generated after converting the TF model to a quantized TFLite model (is that right?), so how can these ops be added to the TFLite model?
Thank you~
I am also confused by this - have you solved the problem? Thanks.
 

Andrey Ignatov

Administrator
Staff member
I wonder if the output should be rescaled by "output_scale & output_zero_point" to match the correct range

Your output is already uint8 by definition (since a quantized model can output only these values), therefore no rescaling is needed here if the quantization is done correctly.
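For instance, if the converted model has uint8 input and output tensors, inference can be as simple as the following sketch (the file names are placeholders):

Python:
import cv2
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model_none.tflite")
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

lr = cv2.cvtColor(cv2.imread("lr.png"), cv2.COLOR_BGR2RGB)[None]  # uint8 batch, values 0..255
interpreter.resize_tensor_input(input_details[0]["index"], lr.shape)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]["index"], lr)
interpreter.invoke()

sr = interpreter.get_tensor(output_details[0]["index"])[0]        # already uint8, values 0..255
cv2.imwrite("sr.png", cv2.cvtColor(sr, cv2.COLOR_RGB2BGR))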
 

xindongzhang

New member
Your output is already uint8 by definition (since a quantized model can output only these values), therefore no rescaling is needed here if the quantization is done correctly.
Thanks, I understand what you mean. But uint8 is only the range of the quantized op - it is not the actual space of the fp32 output. Only if we de-quantize the output of the network do we get the correct numerical range and result.
 