A weird runtime

chengshen

New member
I have a 60-GFLOPs model, a 4-GFLOPs model, and a one-layer model with 3×3 kernels. I ran them offline on a 64-bit Qualcomm Snapdragon 845 (octa-core, 2.8 GHz):

Runtime     60G        4G         one-layer
online      0.8~0.9    0.3~0.4    0.01
offline     1.3        0.13       0.001
Why is the runtime of the 60G model only about 2x that of the 4G model on the target platform, while the runtime ratio on my own device is 10x?

I use depthwise convolutions in the 4G model, but I don't think that is why it is so slow, because there is still a big gap between the online and offline results for the one-layer network, which contains only vanilla convolutions.

Andrey Ignatov

Administrator
Staff member
Hi @chengshen,

Why is the runtime of the 60G model only about 2x that of the 4G model on the target platform, while the runtime ratio on my own device is 10x?

Well, that's not very surprising. When you run your models on mobile devices, GFLOPs mean almost nothing: some ML ops have a more efficient implementation at the expense of larger RAM consumption, some ops are optimized for NPUs / DSPs, while others might not be supported by the corresponding hardware at all or are executed very inefficiently. So, two models with the same GFLOPs count can differ in real on-device runtime by more than 10 times, depending on their exact architecture.
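To make this concrete, here is a minimal Python sketch of the theoretical cost of a standard vs. a depthwise 3×3 convolution (the layer shapes are hypothetical, chosen only for illustration):

```python
# Back-of-the-envelope MAC counts for two conv variants (hypothetical shapes).

def conv2d_macs(h, w, c_in, c_out, k):
    """Multiply-accumulates for a standard KxK conv, stride 1, same padding."""
    return h * w * c_in * c_out * k * k

def depthwise_macs(h, w, c, k):
    """Multiply-accumulates for a KxK depthwise conv, stride 1, same padding."""
    return h * w * c * k * k

h, w, c = 56, 56, 128
print(f"standard 3x3 conv:  {conv2d_macs(h, w, c, c, 3) / 1e9:.3f} GMACs")   # 0.462
print(f"depthwise 3x3 conv: {depthwise_macs(h, w, c, 3) / 1e9:.4f} GMACs")   # 0.0036
# On paper the depthwise layer is ~128x cheaper, yet its low arithmetic
# intensity often makes it run far slower per FLOP on mobile NPUs / GPUs,
# so two nets with a 15x FLOP gap can easily land within 2-3x in runtime.
```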

I use depthwise convolutions in the 4G model

This is actually a very good example of an op that is cheap in terms of FLOPs but very expensive for mobile NPUs / GPUs to execute, as its low arithmetic intensity leaves the hardware underutilized. Thus, you should try to avoid it when developing NN models for mobile devices.
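For reference, here is a minimal Keras sketch of the two alternatives (assuming TensorFlow; the input shape and filter counts are placeholders):

```python
import tensorflow as tf

def separable_block(x, filters):
    # Depthwise 3x3 + pointwise 1x1: few FLOPs, but often poorly
    # accelerated on mobile NPUs / GPUs.
    x = tf.keras.layers.DepthwiseConv2D(3, padding="same")(x)
    return tf.keras.layers.Conv2D(filters, 1, padding="same")(x)

def vanilla_block(x, filters):
    # A single standard 3x3 conv: more FLOPs, but usually maps much
    # better onto mobile accelerators.
    return tf.keras.layers.Conv2D(filters, 3, padding="same")(x)

inputs = tf.keras.Input(shape=(224, 224, 32))
outputs = vanilla_block(inputs, 64)   # swap in separable_block to compare
model = tf.keras.Model(inputs, outputs)
```

The only reliable way to decide between the two variants is to benchmark both on the target device itself; FLOP counts alone won't tell you.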