Using a laptop with a 4GB GPU for inference - does it make sense to try at all?

#6
by grzenkom

I have a laptop with 32 GB of RAM and a Quadro T1000 card with 4 GB of VRAM. I can load the model successfully, but text generation is extremely slow (I have yet to see a single generated token).

Is there any chance of getting output in a reasonable time? Let's say 15-20 minutes would be acceptable for the experiments I am conducting.

I am wondering if there is anything I could improve in my setup, or whether the only way forward is to switch to a more powerful machine right away.
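In case it helps with suggestions: one thing I have not tried yet is quantized loading with CPU offload. Below is a minimal sketch of what I believe that would look like with transformers and bitsandbytes. I am assuming this is a standard causal LM, and the model ID is just a placeholder for the one in this repo:

```python
# Sketch only: 4-bit quantized loading with CPU offload, assuming a
# standard transformers causal LM. The model ID is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "org/model-name"  # placeholder for the model in this repo

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # fp16 compute to save memory
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # offload layers that don't fit in the 4 GB GPU to CPU RAM
)

inputs = tokenizer("Test prompt", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Would something like this be expected to produce tokens within my 15-20 minute budget, or is 4 GB of VRAM simply too little regardless of quantization?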
