tctsung committed
Commit 0e43d15
Parent: 35c24a3

Update README.md

Files changed (1): README.md (+1 −3)
README.md CHANGED
@@ -7,15 +7,13 @@ This model is quantized by autoawq package using `tctsung/chat_restaurant_recomm
 
 Reference model: [TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)
 
-For more details, see github repo [tctsung/LLM_quantize](https://github.com/tctsung/LLM_quantize.git)
-
 ## Key results:
 
 1. AWQ quantization resulted in a **1.62x improvement** in inference speed, generating **140.47 new tokens per second**.
 2. The model size was compressed from 4.4GB to 0.78GB, representing a reduction in memory footprint to only **17.57%** of the original model.
 3. I used 6 different LLM tasks to demonstrate that the quantized model maintains similar accuracy, with a maximum accuracy degradation of only ~1%
 
-<Gallery />
+For more details, see github repo [tctsung/LLM_quantize](https://github.com/tctsung/LLM_quantize.git)
 
 ## Inference tutorial
 
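Key result 1 in the README is a throughput ratio: new tokens per second for the quantized model versus the fp16 baseline. A minimal, generic sketch of how such a figure can be measured — `measure_throughput`, `speedup`, and the `generate` callable are illustrative names assumed here, not code from the tctsung/LLM_quantize repo:

```python
import time


def measure_throughput(generate, prompt: str, n_runs: int = 3) -> float:
    """Average new-tokens-per-second over n_runs calls to generate().

    `generate` is any callable that takes a prompt and returns the number
    of newly generated tokens (e.g. a thin wrapper around model.generate
    that counts output tokens beyond the prompt length).
    """
    rates = []
    for _ in range(n_runs):
        start = time.perf_counter()
        new_tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(new_tokens / elapsed)
    return sum(rates) / len(rates)


def speedup(quantized_rate: float, baseline_rate: float) -> float:
    """Ratio of quantized to baseline throughput, e.g. a 1.62x-style figure."""
    return quantized_rate / baseline_rate
```

Timing several runs and averaging smooths out warm-up effects (first-call graph compilation, cache population), which otherwise skew a single measurement.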