I have a 60 GFLOPs model, a 4 GFLOPs model, and a one-layer model with 3×3 kernels. I ran them offline on a 64-bit Qualcomm Snapdragon 845 (octa-core, 2.8 GHz).
Why is the runtime of the 60G model only about 2× that of the 4G model on the target platform, while the runtime ratio on my own device is about 10×?
I use depthwise convolutions in the 4G model, but I don't think they are the reason it is so slow, because there is still a big gap between the online and offline results even for the one-layer network, which uses only vanilla convolutions.
|         | 60G     | 4G      | one-layer |
|---------|---------|---------|-----------|
| online  | 0.8~0.9 | 0.3~0.4 | 0.01      |
| offline | 1.3     | 0.13    | 0.001     |
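As a quick sanity check, the ratios implied by the numbers above can be computed directly (a small sketch; the midpoints of the online ranges are my own assumption):

```python
# Runtimes from the table above; midpoints used for the online ranges.
online = {"60G": (0.8 + 0.9) / 2, "4G": (0.3 + 0.4) / 2, "one-layer": 0.01}
offline = {"60G": 1.3, "4G": 0.13, "one-layer": 0.001}

online_ratio = online["60G"] / online["4G"]     # measured on-device ratio
offline_ratio = offline["60G"] / offline["4G"]  # ratio on my own device
flops_ratio = 60 / 4                            # ratio expected from FLOPs alone

print(f"online 60G/4G:  {online_ratio:.2f}x")
print(f"offline 60G/4G: {offline_ratio:.2f}x")
print(f"FLOPs ratio:    {flops_ratio:.2f}x")
```

The online ratio (~2.4×) is far below both the offline ratio (10×) and the FLOPs ratio (15×), which is the gap the question is about.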