Speed and Memory Usage on Mobile

#2
by yunfengwang - opened

Hi All, Thanks for sharing this work!
Is there any summary of memory usage and speed when running this ONNX model on a mobile device?

Hi, the peak memory consumption with the mobile models is about 2.7 GB. Performance varies across devices: the more powerful the device, the faster the speed. On a Samsung Galaxy S21 with the cpu-int4-rtn-block-32-acc-level-4 model, we get about 8.6 tokens/s for prompt processing and 6.2 tokens/s for token generation. More optimizations are coming soon for even higher performance, so stay tuned. And if you prefer better performance with a slight trade-off in accuracy, we recommend using the acc-level-4 model.
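As a rough illustration of what those throughput figures mean in practice, here is a back-of-the-envelope latency estimate. The 8.6 and 6.2 tokens/s numbers are the Galaxy S21 measurements quoted above; the prompt and output lengths are arbitrary examples, not measured values:

```python
# Back-of-the-envelope latency estimate from the throughput figures above.
# The throughput constants come from the Galaxy S21 measurement;
# the prompt/output lengths passed in are illustrative only.
PROMPT_TPS = 8.6  # prompt-processing throughput (tokens/s)
GEN_TPS = 6.2     # token-generation throughput (tokens/s)

def estimated_latency_s(prompt_tokens: int, output_tokens: int) -> float:
    """Estimate end-to-end time for one request, ignoring model load time."""
    return prompt_tokens / PROMPT_TPS + output_tokens / GEN_TPS

# e.g. a 100-token prompt with a 50-token reply:
print(round(estimated_latency_s(100, 50), 1))  # ~19.7 s
```

So on this device, a short chat turn lands in the tens-of-seconds range on CPU, which is why the acc-level-4 variant (faster, slightly less accurate) is the recommended default for mobile.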

Hi Emma, could you share some guidance on how to run this on my Android phone? Thanks!

I would also like to know how to use it on Android, please.

Here is a sample of running Phi-3 on Android; the PR should be merged very soon: microsoft/onnxruntime-inference-examples#420.
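For reference, the samples drive the model through the onnxruntime-genai API. A minimal Python sketch of the same generation flow looks roughly like this; the model path is a placeholder, and the API names follow early onnxruntime-genai releases, so check the current docs before relying on them:

```python
# Sketch of an onnxruntime-genai generation loop (assumed early-release API).
# "path/to/phi3-model-dir" is a placeholder for the downloaded ONNX model folder.
import onnxruntime_genai as og

model = og.Model("path/to/phi3-model-dir")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode("<|user|>\nHello!<|end|>\n<|assistant|>\n")

generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()        # run one forward pass
    generator.generate_next_token()   # sample/pick the next token

print(tokenizer.decode(generator.get_sequence(0)))
```

The Android sample in the linked PR wraps this same loop behind the mobile bindings; the structure (encode prompt, step the generator until done, decode) is the same.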

Thank you very much! I have successfully got it running on the Android device!

kvaishnavi changed discussion status to closed
