OpenSourceRonin committed on
Commit 46e0941 · verified · 1 Parent(s): 3b6eb87

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -38,7 +38,7 @@ Scaling model size significantly challenges the deployment and inference of Larg
  ## Early Results from Tech Report
  VPTQ achieves better accuracy and higher throughput with lower quantization overhead across models of different sizes. The following experimental results are for reference only; VPTQ can achieve better outcomes under reasonable parameters, especially in terms of model accuracy and inference speed.
 
- <img src="assets/vptq.png" width="500">
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/66a73179315d9b5c32e06967/SVvZJuDNmDut2XRsqI3Wo.png)
 
  | Model | bitwidth | W2↓ | C4↓ | AvgQA↑ | tok/s↑ | mem(GB) | cost/h↓ |
  | ----------- | -------- | ---- | ---- | ------ | ------ | ------- | ------- |