Update README.md
README.md
CHANGED
@@ -19,10 +19,11 @@ This is an <a href="https://github.com/mobiusml/hqq/">HQQ</a> all 4-bit (group-s
 ## Model Decoding Speed
 | Models | fp16| HQQ 4-bit/gs-64|
 |:-------------------:|:--------:|:----------------:|
-| Decoding
-| Decoding
+| Decoding - short seq (tokens/sec)| 10.5 (tokens/sec)** | 10.7 (tokens/sec)* |
+| Decoding - long seq (tokens/sec)| 9.5 (tokens/sec)** | 9.7 (tokens/sec)*|
 
-
+**: 2xA100 80GB<br>
+*: 1xA100 80GB
 
 ## Performance
 