Benchmarking the A311D / VIM3 NPU

endian

New member
I have a VIM3, which includes an Amlogic A311D chip.

This chip has an NPU built in, but it does not seem to be used in the current benchmark results: the AI Benchmark score is the same as on the S922X chip, which does not have an NPU.

What is required to be able to use the npu when running this benchmark?

What would you imagine might be missing for the npu to be used?
 

Andrey Ignatov

Administrator
Staff member
Hi @endian,

What is required to be able to use the npu when running this benchmark?

What would you imagine might be missing for the npu to be used?

The situation with the A311D chipset is quite complex. First of all, there is no way to access its NPU through Android: it doesn't support the Android NN API (the NN HAL is missing), and there are no custom TensorFlow Lite delegates or proprietary SDKs for this SoC.

Secondly, even under Linux you cannot run standard TF / TFLite models on this platform: you need to compile them using Amlogic's NPU SDK, which is provided upon request. It also looks like this NPU supports only a limited set of TFLite ops and can accelerate INT8 inference only, which means that just some standard quantized image classification models can be executed on it.
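
For reference, a minimal sketch of how a model could be prepared for such an INT8-only accelerator using standard TensorFlow post-training quantization. This is not Amlogic's SDK flow (the resulting .tflite file would still have to be converted with their tools), and the calibration data here is only a placeholder:

```python
# Sketch: full-integer (INT8) post-training quantization of MobileNetV2,
# the kind of quantized image classification model an INT8-only NPU can run.
# NOTE: this only produces a quantized .tflite file; deploying it on the
# A311D NPU would still require Amlogic's NPU SDK conversion step.
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights="imagenet")

def representative_data():
    # Placeholder calibration data; replace with ~100 real preprocessed images.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("mobilenet_v2_int8.tflite", "wb") as f:
    f.write(converter.convert())
```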
 

chro

New member
I found out that the NPU vendor (VeriSilicon) has published a custom TFLite delegate on their GitHub repository.
After running the TFLite benchmark, I got 6.5 ms for single-threaded MobileNet v2 with the NPU delegate.
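
In case it helps others reproduce this, here is a rough sketch of timing a quantized MobileNet v2 through an external TFLite delegate from Python. The delegate library path (libvx_delegate.so) and the model filename are assumptions; adjust them to match how the VeriSilicon delegate was built for your board:

```python
# Sketch: measuring single-threaded latency with an external TFLite delegate.
# Assumes the VeriSilicon vx-delegate was built for the board and installed
# as /usr/lib/libvx_delegate.so (path and model name are placeholders).
import time
import numpy as np
import tflite_runtime.interpreter as tflite

delegate = tflite.load_delegate("/usr/lib/libvx_delegate.so")
interpreter = tflite.Interpreter(
    model_path="mobilenet_v2_1.0_224_quant.tflite",
    experimental_delegates=[delegate],
    num_threads=1,
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"],
                       np.zeros(inp["shape"], dtype=inp["dtype"]))

# Warm-up run (the first invocation typically includes graph compilation).
interpreter.invoke()

runs = 50
start = time.perf_counter()
for _ in range(runs):
    interpreter.invoke()
print("avg latency: %.2f ms" % ((time.perf_counter() - start) / runs * 1e3))
```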
 

Andrey Ignatov

Administrator
Staff member
Hi @chro,

Thanks for the info.

The NPU vendor (VeriSilicon) has published a custom TFLite delegate on their GitHub repository.

Yes, we have some internal plans to include this delegate in one of our next releases, though we do not have a concrete timeline for this yet.

I got 6.5 ms for single-threaded MobileNet v2 with the NPU delegate.

That looks reasonable. You can find the results for another board with a VeriSilicon NPU (VideoSmart VS680) here: https://ai-benchmark.com/ranking_IoT
 

micha

New member
Hi! I just ran the AI Benchmark app 4.0.4 on my development board, a "Zora P1" with an Amlogic A311D. Its result was 9.69, pretty close to the VideoSmart VS680 in the linked ranking table. Does that mean the benchmark app uses the NPU, or did I miss something?

Thanks in advance for any additional information!
 

micha

New member
I found out that the NPU vendor (VeriSilicon) has published a custom TFLite delegate on their GitHub repository.
After running the TFLite benchmark, I got 6.5 ms for single-threaded MobileNet v2 with the NPU delegate.
Hi! Did you get that delegate to work on Android? If so, can you possibly share some documentation on how you set it up, and what version of Android you were targeting?

Thanks in advance!
 

chro

New member
Hi! I just ran the AI Benchmark app 4.0.4 on my development board, a "Zora P1" with an Amlogic A311D. Its result was 9.69, pretty close to the VideoSmart VS680 in the linked ranking table. Does that mean the benchmark app uses the NPU, or did I miss something?

Thanks in advance for any additional information!
Could you provide the detailed results, including the INT8 MobileNet result? It is more meaningful than the generic score.
 

micha

New member
I could not find an easy way to export the stats. Here are some screenshots; I hope they help.
 

Attachments

  • device-2022-01-19-133510.png (1 MB)
  • device-2022-01-19-133708.png (510.3 KB)
  • device-2022-01-19-133746.png (494.6 KB)
  • device-2022-01-19-133817.png (484.7 KB)