Token Generation Speeds
#5, opened by HellishGaming
Just curious what kind of T/s (tokens per second) everyone is getting when sending a 32K context. I have the 4.0bpw EXL2 quant loaded on an A100 80GB SXM4 and I get about 1.5 T/s.
Back-end: Text Gen Webui
Front-end: SillyTavern
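
For anyone posting numbers to compare: a T/s figure can mean different things depending on whether the time spent processing the prompt is counted. Below is a minimal sketch of the arithmetic, assuming you can read the total wall-clock time and the prompt-processing time from your back-end's console output. The function name `tokens_per_second` and all timings are hypothetical and only there to illustrate the calculation; they are not measurements from my setup.

```python
def tokens_per_second(new_tokens: int, total_seconds: float,
                      prompt_seconds: float = 0.0) -> float:
    """Return generated tokens per second.

    With prompt_seconds left at 0 this is the end-to-end rate
    (prompt processing included in the elapsed time). Passing the
    prompt-processing time gives the generation-only rate instead.
    """
    generation_seconds = total_seconds - prompt_seconds
    if generation_seconds <= 0:
        raise ValueError("prompt_seconds must be smaller than total_seconds")
    return new_tokens / generation_seconds


# Illustrative, made-up timings (NOT real measurements):
# 300 new tokens, 200 s total, of which 170 s was prompt processing.
end_to_end = tokens_per_second(300, 200.0)
generation_only = tokens_per_second(300, 200.0, prompt_seconds=170.0)

print(f"end-to-end:      {end_to_end:.2f} T/s")
print(f"generation only: {generation_only:.2f} T/s")
```

If you reply with your speeds, saying which of the two rates you are quoting makes the numbers easier to compare.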