Much higher latencies with NNAPI on Snapdragon 888

noodles

New member
| Model | CPU int8 (ms) | CPU FP16 (ms) | CPU FP32 (ms) | GPU int8 (ms) | GPU FP16 (ms) | GPU FP32 (ms) | NNAPI int8 (ms) | NNAPI FP16 (ms) | NNAPI FP32 (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TF-Lite (float) | Unsupported | 218 | 220 | Unsupported | 30 | 31 | Unsupported | 22 | 41 |
| TF-Lite (weight quantized) | Unsupported | 169 | 163 | Unsupported | 19 | 32 | Unsupported | 194 | 156 |
| TF-Lite (full quantized) | 149 | Unsupported | Unsupported | 25 | Unsupported | Unsupported | 8710 | Unsupported | Unsupported |

I ran a very simple model, ESPCN, for super resolution; it is just a 3-layer convolutional network. The results are very strange. Is there something wrong with the quantization, or with something else? If quantization is the reason, how do I deal with it?
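For reference, the fully quantized variant in the last row would normally be produced with post-training full-integer quantization along these lines (a rough sketch only; the saved-model path, file name, and random calibration data are placeholders, not the exact script used):

```python
import numpy as np
import tensorflow as tf

# Sketch of full-integer post-training quantization for a small SR model.
# "espcn_saved_model" is an assumed path to the trained float model.
def representative_dataset():
    # A few calibration inputs with the same shape/range as real frames
    # (random values here only to keep the sketch self-contained).
    for _ in range(100):
        yield [np.random.rand(1, 245, 530, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("espcn_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict to int8 kernels so weights and activations are both quantized.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("espcn_int8.tflite", "wb") as f:
    f.write(converter.convert())
```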
 
Last edited:

noodles

New member
| Model | CPU int8 (µs) | CPU FP16 (µs) | CPU FP32 (µs) | GPU int8 (µs) | GPU FP16 (µs) | GPU FP32 (µs) | NNAPI int8 (µs) | NNAPI FP16 (µs) | NNAPI FP32 (µs) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TF-Lite (float) | Unsupported | 218000 | 220000 | Unsupported | 30000 | 31400 | Unsupported | 22600 | 41600 |
| TF-Lite (weight quantized) | Unsupported | 169000 | 163000 | Unsupported | 19200 | 32400 | Unsupported | 194000 | 156000 |
| TF-Lite (full quantized) | 149000 | Unsupported | Unsupported | 25900 | Unsupported | Unsupported | 8710000 | Unsupported | Unsupported |

I ran a very simple model, ESPCN, for super resolution; it is just a 3-layer convolutional network. The results are very strange. Is there something wrong with the quantization, or with something else? If quantization is the reason, how do I deal with it?
By the way, my input is large, a 245 × 530 image, and the output is a 980 × 2120 image.
I also used Netron to inspect the model and confirm that the input, output, and weights are int8.
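In addition to Netron, the tensor types can be checked programmatically; a small sketch (assuming the converted file is named espcn_int8.tflite):

```python
import tensorflow as tf

# Sketch: inspect the converted model's I/O types and quantization parameters.
# "espcn_int8.tflite" is an assumed file name for the fully quantized model.
interpreter = tf.lite.Interpreter(model_path="espcn_int8.tflite")
interpreter.allocate_tensors()

for detail in interpreter.get_input_details() + interpreter.get_output_details():
    scale, zero_point = detail["quantization"]
    print(detail["name"], detail["shape"], detail["dtype"], scale, zero_point)
    # For a fully quantized model, dtype should be int8 with a non-zero scale.
```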
 
Last edited:

noodles

New member
I also ran the quantized FSRCNN from MAI-2021, and its latency with NNAPI is still much higher than on the CPU or GPU. However, the models that come with AI Benchmark run very fast with NNAPI-int8.
 

Andrey Ignatov

Administrator
Staff member
Hi @noodles,

Is there something wrong with the quantization, or with something else? If quantization is the reason, how do I deal with it?

There are basically two issues:

1. The first issue is that the ESPCN model contains sigmoid and tanh activations, which are not well supported by the TFLite quantizer and NNAPI (see the sketch below this list).

2. The second issue is the same as the one described in this thread; the solution can be found in this post.
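
For the first point, an illustrative sketch of the same 3-layer topology with the sigmoid/tanh activations swapped for ReLU, which quantizes and maps to NNAPI more reliably (the layer widths and the 4x scale factor are assumptions, not the exact ESPCN configuration used above):

```python
import tensorflow as tf

# Sketch only: 3-layer super-resolution network without sigmoid/tanh.
inp = tf.keras.Input(shape=(245, 530, 1))
x = tf.keras.layers.Conv2D(64, 5, padding="same", activation="relu")(inp)
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
x = tf.keras.layers.Conv2D(16, 3, padding="same")(x)   # 4*4 channels for 4x upscaling
out = tf.nn.depth_to_space(x, 4)                        # 245x530 -> 980x2120
nnapi_friendly_espcn = tf.keras.Model(inp, out)
nnapi_friendly_espcn.summary()
```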

Finally, in the case of the TFLite GPU delegate, it mostly dequantizes the provided INT8 model to FP16, so there is almost no runtime difference between these two modes.
 