JustinLin610 committed on
Commit
05d0727
1 Parent(s): f077bf1

Update README.md

Files changed (1)
  1. README.md +8 -8
README.md CHANGED
@@ -168,10 +168,10 @@ response, history = model.chat(tokenizer, "你好", history=None)
 
  We illustrate the zero-shot performance of both BF16 and Int4 models on the benchmark, and we find that the quantized model does not suffer from significant performance degradation. Results are shown below:
 
- | Quantization | MMLU | CEval (val) | GSM8K | Humaneval |
- | ------------- | :--------: | :----------: | :----: | :--------: |
- | BF16 | 55.8 | 59.7 | 50.3 | 37.2 |
- | Int4 | 55.1 | 59.2 | 49.7 | 35.4 |
 
  ### Inference Speed
 
@@ -179,10 +179,10 @@ We illustrate the zero-shot performance of both BF16 and Int4 models on the benc
 
  We measured the average inference speed of generating 2048 and 8192 tokens under BF16 precision and Int4 quantization level, respectively.
 
- | Quantization | Speed (2048 tokens) | Speed (8192 tokens) |
- | ------------- | :------------------:| :------------------:|
- | BF16 | 30.53 | 28.51 |
- | Int4 | 45.60 | 33.83 |
 
  Specifically, we record the performance of generating 8192 tokens given a context of length 1. The evaluation runs on a single A100-SXM4-80G GPU with PyTorch 2.0.1 and CUDA 11.4. The inference speed is the average speed over the 8192 generated tokens.
 
 
 
  We illustrate the zero-shot performance of both BF16 and Int4 models on the benchmark, and we find that the quantized model does not suffer from significant performance degradation. Results are shown below:
 
+ | Quantization | MMLU | CEval (val) | GSM8K | Humaneval |
+ |--------------|:----:|:-----------:|:-----:|:---------:|
+ | BF16 | 55.8 | 59.7 | 50.3 | 37.2 |
+ | Int4 | 55.1 | 59.2 | 49.7 | 35.4 |
 
  ### Inference Speed
 
 
 
  We measured the average inference speed of generating 2048 and 8192 tokens under BF16 precision and Int4 quantization level, respectively.
 
+ | Quantization | Speed (2048 tokens) | Speed (8192 tokens) |
+ |--------------|:-------------------:|:-------------------:|
+ | BF16 | 30.53 | 28.51 |
+ | Int4 | 45.60 | 33.83 |
 
  Specifically, we record the performance of generating 8192 tokens given a context of length 1. The evaluation runs on a single A100-SXM4-80G GPU with PyTorch 2.0.1 and CUDA 11.4. The inference speed is the average speed over the 8192 generated tokens.
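
The commit only reformats the two tables; the figures themselves are unchanged. As a reading aid, the following minimal sketch works out what those figures imply: the per-benchmark score gap between BF16 and Int4, and the ratio of Int4 to BF16 speed. The values are copied from the tables in this diff. The variable names are illustrative, and the speed unit is not stated in the diff, so only the ratio is computed.

```python
# Figures copied from the two tables in this diff (the commit does not change them).
scores = {
    "BF16": {"MMLU": 55.8, "CEval (val)": 59.7, "GSM8K": 50.3, "Humaneval": 37.2},
    "Int4": {"MMLU": 55.1, "CEval (val)": 59.2, "GSM8K": 49.7, "Humaneval": 35.4},
}
# Speed unit is not given in the diff, so we only compare ratios.
speeds = {
    "BF16": {2048: 30.53, 8192: 28.51},
    "Int4": {2048: 45.60, 8192: 33.83},
}

# Points lost by Int4 relative to BF16 on each benchmark.
score_drop = {
    bench: round(scores["BF16"][bench] - scores["Int4"][bench], 1)
    for bench in scores["BF16"]
}

# Int4 speed divided by BF16 speed for each generation length.
speedup = {
    n_tokens: round(speeds["Int4"][n_tokens] / speeds["BF16"][n_tokens], 2)
    for n_tokens in speeds["BF16"]
}

for bench, drop in score_drop.items():
    print(f"{bench}: Int4 is {drop} points below BF16")
for n_tokens, ratio in speedup.items():
    print(f"{n_tokens} tokens: Int4 runs at {ratio}x the BF16 speed")
```

On these numbers the Int4 advantage is smaller at 8192 generated tokens than at 2048, while the largest score gap is on Humaneval.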