Update README.md
README.md CHANGED
@@ -38,7 +38,7 @@ Scaling model size significantly challenges the deployment and inference of Larg
 ## Early Results from Tech Report
 VPTQ achieves better accuracy and higher throughput with lower quantization overhead across models of different sizes. The following experimental results are for reference only; with appropriately chosen parameters, VPTQ can achieve even better results, especially in terms of model accuracy and inference speed.
 
-
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/66a73179315d9b5c32e06967/SVvZJuDNmDut2XRsqI3Wo.png)
 
 | Model | bitwidth | W2↓ | C4↓ | AvgQA↑ | tok/s↑ | mem(GB) | cost/h↓ |
 | ----------- | -------- | ---- | ---- | ------ | ------ | ------- | ------- |