I wanted to benchmark my own custom model on an Android 9 phone (675 SoC) using only NNAPI, so I used the TFLite benchmark app, and I consistently get an average latency of around 42-44 ms per frame. However, when I benchmark the same model with the command-line tool provided by TensorFlow, I get roughly double the average latency per frame for the same number of iterations. I kept the same settings for both the app and the command-line tool, so I don't understand why this is happening. For the other accelerators the results are about the same; only for NNAPI do they differ. This is the command I used to benchmark:
adb shell am start -S \
  -n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
  --es args '"--graph=/data/local/tmp/tflite_benchmarking_two_layer_sepconv_conv_dummy_fp16.tflite --num_threads=4 --num_runs=200 --use_nnapi=true --nnapi_accelerator_name="" --disable_nnapi_cpu=false --nnapi_allow_fp16=true --use_gpu=false --gpu_backend="" --use_hexagon=false --use_xnnpack=false --nnapi_execution_preference="undefined" --enable_op_profiling=true"'
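For reference, this is how I read the timing summary back from the device after each run (the exact wording of the log line can vary between TFLite versions, so the grep pattern below is just what worked for me, not a guaranteed format):

```shell
# The benchmark activity writes its results to logcat rather than stdout.
# Dump the log and filter for the timing summary lines.
adb logcat -d | grep -E "Inference timings|Average inference"
```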
In the app I enabled only the first two options.
Please let me know if you know why there is a difference between these two.