I'm getting 0.4 tokens/s on a 4090

by androtester - opened

Is this expected? Simple messages take 350-400s for a reply on a 4090.

I get 5-6 t/s on a 3090, so that's abnormal. I'm going to need more info: your specs, what code you're running, all that.

I'm using Oobabooga. I have a 4090, a 5800X3D, 32 GB RAM, and a 2 TB NVMe drive.

All I did was use the Oobabooga Windows installer, which supposedly handled the dependencies for me.
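
For anyone hitting the same thing, here is a minimal sketch of a script that collects the kind of info asked for above. One thing worth ruling out (an assumption on my part, nothing in this thread confirms it) is that the installed PyTorch build is not seeing the GPU. The helper names `collect_env_report` and `tokens_per_second` are hypothetical and not part of Oobabooga. Run it with the same Python environment the installer created.

```python
import importlib.util
import platform


def collect_env_report() -> dict:
    """Gather basic details that help diagnose slow generation."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if report["torch_installed"]:
        try:
            # Imported lazily so the script still runs without PyTorch.
            import torch
        except (ImportError, OSError) as exc:
            # A broken install (for example a missing DLL) ends up here.
            report["torch_import_error"] = str(exc)
            return report
        report["torch_version"] = torch.__version__
        report["cuda_available"] = torch.cuda.is_available()
        if report["cuda_available"]:
            report["gpu_name"] = torch.cuda.get_device_name(0)
    return report


def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Throughput as generated tokens divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back `False`, or `torch_installed` is `False`, that would point at the install rather than the model. `tokens_per_second` is just the arithmetic behind the numbers in this thread: at 0.4 t/s, a 400 s reply works out to roughly 160 tokens.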
