Update README.md
README.md CHANGED
@@ -38,7 +38,7 @@ Scaling model size significantly challenges the deployment and inference of Larg
 ## Early Results from Tech Report
 VPTQ achieves better accuracy and higher throughput with lower quantization overhead across models of different sizes. The following experimental results are for reference only; with appropriately chosen parameters, VPTQ can achieve even better results, especially in terms of model accuracy and inference speed.
 
-
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66a73179315d9b5c32e06967/SVvZJuDNmDut2XRsqI3Wo.png)
 
 | Model | bitwidth | W2↓ | C4↓ | AvgQA↑ | tok/s↑ | mem(GB) | cost/h↓ |
 | ----------- | -------- | ---- | ---- | ------ | ------ | ------- | ------- |