Colab demo doesn't use GPU
#2 opened by aaronshenhao
I'm not very familiar with Hugging Face's transformers library, so I'm not sure what's going on. The inference demo doesn't use the T4 GPU at all. Instead it takes up 12 GB of system RAM, which is unexpected for such a small model, and inference takes a very long time to complete. The demo also tries to load the base model at the same time, which crashes Colab because it exhausts all available system RAM.
Inference demo link: https://huggingface.co/venkycs/phi-2-instruct/blob/main/inference_phi_2_instruct.ipynb
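A rough back-of-the-envelope check suggests the 12 GB figure fits the weights being loaded in full 32-bit precision into system RAM rather than onto the GPU. The sketch below is only an estimate. It assumes phi-2-instruct keeps the roughly 2.7 billion parameters of Phi-2, and it counts the weights alone, ignoring activations and framework overhead. `weight_memory_gb` and `PHI2_PARAMS` are illustrative names, not part of the demo notebook.

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# Assumption: phi-2-instruct has about the same ~2.7B parameters as Phi-2.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}

PHI2_PARAMS = 2.7e9  # approximate parameter count (assumption)


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate size of the model weights in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("float32", "float16"):
        one = weight_memory_gb(PHI2_PARAMS, dtype)
        # The demo also loads the base model, so two copies are in memory at once.
        print(f"{dtype}: one model ~{one:.1f} GB, two models ~{2 * one:.1f} GB")
```

Under these assumptions a single float32 copy is about 10.8 GB, so loading the base model alongside it would need roughly twice that, which would explain the crash. In float16 a single copy would be about 5.4 GB, small enough to fit in the T4's 16 GB of VRAM.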