No, that's not a typo - this model was adapted to the text completion task with some small modifications: it is applied to word embeddings instead of images.
What's New:
Updated Qualcomm QNN and MediaTek Neuron delegates.
Enhanced stability and accuracy of the power consumption test.
Various bug fixes and performance improvements.
Download this release from the official website or from the Google Play store.
Feel free to discuss AI Benchmark...
Detailed AI Benchmark V5 results were released for over 50 IoT, smart TV and automotive platforms:
https://ai-benchmark.com/ranking_IoT
https://ai-benchmark.com/ranking_IoT_detailed
The results of the recently presented mobile chipsets including the Snapdragon 8 Gen 3, Dimensity 9300, Google...
What's New:
Added new NPU power consumption test.
Updated TFLite runtime.
Updated TFLite GPU, NNAPI, Qualcomm QNN, Hexagon NN and Samsung ENN delegates.
Updated in-app ranking table.
Various bug fixes and performance improvements.
Download this release from the official website or from the...
This is a very brief answer, but the general idea is as follows:
HTP = rebranded compute DSP (since Snapdragon 888 / Hexagon v68): contains HVX and HMX co-processors / modules.
Note that both HVX and HMX modules are also present in other Hexagon DSPs without HTP.
HTA = additional co-processor...
One can potentially extract all models directly from the benchmark APK file.
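Since an APK is just a ZIP archive, listing the bundled model files takes only a few lines. A minimal sketch (the asset names and the `.tflite` extension are assumptions for illustration, not the benchmark's actual layout):

```python
import io
import zipfile

def list_tflite_models(apk_bytes):
    """An APK is a ZIP archive; return entries that look like TFLite models."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return [name for name in apk.namelist() if name.endswith(".tflite")]

# Build a tiny stand-in "APK" in memory for demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("assets/mobilenet_v2.tflite", b"\x00")  # hypothetical asset name
    z.writestr("classes.dex", b"\x00")

models = list_tflite_models(buf.getvalue())
print(models)  # -> ['assets/mobilenet_v2.tflite']
```

In practice you would pass the bytes of the real APK file and then copy the matching entries out with `ZipFile.extract`.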
Feel free to use this forum for sharing or comparing the results; such posts will not be deleted, nor will their authors be banned.
Yes, average or median of the results after removing the outliers.
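The exact outlier filter is not specified in the answer above; a minimal sketch of one common approach (IQR-based filtering followed by the median, assumed here purely for illustration):

```python
import statistics

def aggregate_scores(scores, k=1.5):
    """Drop IQR outliers, then return the median of the remaining scores."""
    s = sorted(scores)
    q1 = s[len(s) // 4]          # rough lower quartile
    q3 = s[(3 * len(s)) // 4]    # rough upper quartile
    iqr = q3 - q1
    kept = [x for x in s if q1 - k * iqr <= x <= q3 + k * iqr]
    return statistics.median(kept)

# The run at 2500 falls outside the IQR fence and is discarded.
print(aggregate_scores([980, 1000, 1010, 1020, 2500]))  # -> 1005.0
```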
For the majority of SoCs, the results are obtained from phone measurements, but in some cases development kits are also used (e.g., when no actual devices have been released yet).
No, the SoC ranking is not taking into...
Yes, partly: the Hexagon 6xx family is denoted as DSPs in QNN, while the Hexagon 7xx family is denoted as HTPs. Here you can find the full list of Hexagon processors.
There are also large architectural differences between these two families - the latest HTPs, for instance, are able to accelerate both...
INT8 models were run with the TFLite GPU delegate.
No, these are plain NPU/GPU runtime results.
Because of a bug in the iOS TFLite implementation.
In the standard benchmark mode, only INT8 inference is tested. However, one can also check the results of FP16 inference in the PRO mode.
You can switch between different NPU inference profiles in the settings; the sustained speed profile is used by default.
Hi @bagofwater,
Thank you for your suggestions.
For FP16 inference, the targets are generated in FP32 mode, which provides an accuracy of 7-8 digits after the decimal point, so there are no issues here.
Yes, your model should have only one input layer in order to be executed successfully. The easiest workaround here would be to stack two input tensors together into a single input layer and then unstack them during inference.
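The stack/unstack workaround can be sketched with plain NumPy to show the shapes involved (the tensor shapes below are hypothetical; in a real model the stacking would happen before the single input layer and the unstacking inside the graph):

```python
import numpy as np

# Two hypothetical input tensors of identical shape (batch, features).
a = np.arange(4, dtype=np.float32).reshape(1, 4)
b = np.arange(4, 8, dtype=np.float32).reshape(1, 4)

# Stack them along a new leading axis -> one input of shape (2, 1, 4).
stacked = np.stack([a, b], axis=0)
print(stacked.shape)  # -> (2, 1, 4)

# Inside the model, unstack back into the two original tensors.
a2, b2 = stacked[0], stacked[1]
assert np.array_equal(a, a2) and np.array_equal(b, b2)
```

In a TFLite model the same idea applies: feed the stacked tensor to the single input layer and split it (e.g., with an unstack/split op) as the first operation of the graph.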