Token Generation Speeds

#5
by HellishGaming - opened

Just curious what kind of T/s (tokens per second) everyone is getting when sending 32K of context. I have the 4.0bpw EXL2 quant loaded on an A100 80GB SXM4 and I get about 1.5 T/s.

Back-end: Text Gen Webui

Front-end: SillyTavern
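
To make numbers comparable between replies, here is a minimal sketch of the T/s arithmetic. It assumes T/s means generated tokens divided by total wall-clock time for the request, which may or may not include prompt processing depending on how the back-end reports it. The function name and the sample values are hypothetical, chosen for illustration; they are not measurements from this setup.

```python
def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Return throughput in tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


# Hypothetical example values (not measured): 300 generated tokens in 200 s.
generated_tokens = 300
total_seconds = 200.0
print(f"{tokens_per_second(generated_tokens, total_seconds):.2f} T/s")
```

With a 32K prompt, it helps to say whether the reported time includes prompt ingestion, since that can dominate the total.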
