I have a 60 GFLOPs model, a 4 GFLOPs model, and a one-layer model with 3×3 kernels. I ran them offline on a 64-bit Qualcomm Snapdragon 845 (octa-core, 2.8 GHz).
Why is the runtime of the 60G model only about 2× that of the 4G model on the target platform, while the runtime ratio on my own device is about 10×?
I use depthwise convolutions in the 4G model, but I don't think they are the reason it is so slow, because there is still a big gap between the online and offline results even for the one-layer network, which uses only vanilla convolutions.
|         | 60G     | 4G      | one-layer |
|---------|---------|---------|-----------|
| online  | 0.8~0.9 | 0.3~0.4 | 0.01      |
| offline | 1.3     | 0.13    | 0.001     |
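As a quick sanity check, the ratios implied by the numbers above can be computed directly (a small sketch; the midpoints of the online ranges are my own assumption):

```python
# Runtimes from the table above; midpoints used for the online ranges.
online = {"60G": (0.8 + 0.9) / 2, "4G": (0.3 + 0.4) / 2, "one-layer": 0.01}
offline = {"60G": 1.3, "4G": 0.13, "one-layer": 0.001}

online_ratio = online["60G"] / online["4G"]     # measured on-device ratio
offline_ratio = offline["60G"] / offline["4G"]  # ratio on my own device
flops_ratio = 60 / 4                            # ratio expected from FLOPs alone

print(f"online 60G/4G:  {online_ratio:.2f}x")
print(f"offline 60G/4G: {offline_ratio:.2f}x")
print(f"FLOPs ratio:    {flops_ratio:.2f}x")
```

The online ratio (~2.4×) is far below both the offline ratio (10×) and the FLOPs ratio (15×), which is the gap the question is about.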