Can it run faster than 2 tokens/second on one A100?

by aibarito-ua - opened

I am trying to run this model on one A100, but it is quite slow: about 2 tokens/sec. Does anybody know how to make it faster?
I have tried 8-bit mode, and it uses half as much GPU memory, but the speed does not increase.
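To compare settings such as 8-bit vs. default precision on equal terms, it helps to measure throughput the same way each time. Below is a minimal sketch, assuming `generate` is a hypothetical zero-argument function you write yourself that runs one generation and returns the newly generated token ids. Warm-up runs are excluded so one-time costs (CUDA initialization, kernel setup) do not drag the number down.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[], Sequence[int]],
    warmup: int = 1,
    runs: int = 3,
) -> float:
    """Average decoding throughput of `generate` (hypothetical user-supplied callable).

    `generate` must finish all GPU work before returning, e.g. by returning
    the new token ids as a CPU list, so the timing covers the real work.
    """
    # Warm-up calls are not timed.
    for _ in range(warmup):
        generate()

    total_tokens = 0
    total_time = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        new_tokens = generate()
        total_time += time.perf_counter() - start
        total_tokens += len(new_tokens)

    # Tokens generated per wall-clock second, averaged over all timed runs.
    return total_tokens / total_time
```

Running this once per configuration with the same prompt and the same number of new tokens gives numbers that can be compared directly.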
