Benchmarking the A311D / VIM3 NPU

endian

New member
I have a VIM3, which includes an Amlogic A311D chip.

This chip has an NPU built in, but it does not seem to be used during benchmarking: the AI Benchmark score is the same as on the S922X chip, which has no NPU.

What is required to be able to use the npu when running this benchmark?

What would you imagine might be missing for the npu to be used?
 

Andrey Ignatov

Administrator
Staff member
Hi @endian,

What is required to be able to use the npu when running this benchmark?

What would you imagine might be missing for the npu to be used?

The situation with the A311D chipset is quite complex. First of all, there is no way to access its NPU from Android: it does not support the Android NN API (the NN HAL is missing), and there are neither custom TensorFlow Lite delegates for this SoC nor any proprietary SDKs.

Secondly, even when using Linux, you cannot run standard TF / TFLite models on this platform: you need to compile them with Amlogic's NPU SDK, which is provided upon request. It also appears that this NPU supports only a limited set of TFLite ops and can accelerate INT8 inference only, which means that just some standard quantized image classification models can be executed on it.
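To illustrate why only quantized models qualify: INT8 TFLite models store each tensor as 8-bit codes plus a scale and zero point, using the affine mapping real ≈ scale * (q - zero_point). Below is a minimal sketch of that scheme in plain NumPy; the scale and zero-point values are illustrative, not taken from any real model.

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Affine quantization as used by INT8 TFLite models:
    q = round(x / scale) + zero_point, clamped to the int8 range."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Inverse mapping: x ~= scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)

# Illustrative parameters (hypothetical, not from a real model)
scale, zero_point = 0.02, 5
x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)

q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
print(q)      # int8 codes
print(x_hat)  # reconstruction, within half a quantization step of x
```

An NPU that only implements INT8 arithmetic can run the `q` side of this mapping but has no datapath for float32 tensors, which is why unquantized models cannot be offloaded to it.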
 

chro

New member
I found out that the NPU vendor (VeriSilicon) has published a custom TFLite delegate in their GitHub repository.
After running the TFLite benchmark, I got 6.5 ms for single-threaded MobileNet v2 with the NPU delegate.
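For anyone who wants to reproduce a run like this: the standard TFLite `benchmark_model` tool can load a delegate shared library via its `--external_delegate_path` flag. The sketch below just assembles such an invocation; the binary and library paths are hypothetical examples, so adjust them to wherever you built `benchmark_model` and the VeriSilicon vx-delegate on your board.

```python
import subprocess

def build_benchmark_cmd(model_path, delegate_path, num_threads=1):
    """Return the argv list for TFLite's benchmark_model tool.
    --external_delegate_path tells the tool to load the delegate .so."""
    return [
        "./benchmark_model",
        f"--graph={model_path}",
        f"--num_threads={num_threads}",
        f"--external_delegate_path={delegate_path}",
    ]

# Hypothetical paths for illustration only
cmd = build_benchmark_cmd("mobilenet_v2_1.0_224_quant.tflite",
                          "/usr/lib/libvx_delegate.so")
print(" ".join(cmd))
# On the device you would then run it with, e.g.:
# subprocess.run(cmd, check=True)
```

The tool reports average inference latency, which should match the single-threaded numbers quoted above when the delegate actually offloads the graph to the NPU.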
 

Andrey Ignatov

Administrator
Staff member
Hi @chro,

Thanks for the info.

The NPU vendor (VeriSilicon) created a custom TFLite delegate on their GitHub repository.

Yes, we have some internal plans to include this delegate in one of our next releases, though we do not have a concrete timeline for this yet.

I got 6.5 ms for single-threaded MobileNet v2 with the NPU delegate

That looks reasonable. You can find the results for another board with a VeriSilicon NPU (VideoSmart VS680) here: https://ai-benchmark.com/ranking_IoT
 