correct decoding time
README.md
CHANGED
@@ -19,8 +19,8 @@ This is an <a href="https://github.com/mobiusml/hqq/">HQQ</a> all 4-bit (group-s
 ## Model Decoding Speed
 | Models | fp16| HQQ 4-bit/gs-64|
 |:-------------------:|:--------:|:----------------:|
-| Decoding - short seq (tokens/sec)| 10.5 (tokens/sec)** |
-| Decoding - long seq (tokens/sec)| 9.5 (tokens/sec)** |
+| Decoding - short seq (tokens/sec)| 10.5 (tokens/sec)** | 23 (tokens/sec)* |
+| Decoding - long seq (tokens/sec)| 9.5 (tokens/sec)** | 19 (tokens/sec)*|

 **: 2xA100 80GB<br>
 *: 1xA100 80GB