renillhuang committed
Commit 1205968
1 Parent(s): 94f9d42

readme: Modify image size

Signed-off-by: eric <renillhuang@163.com>

Files changed (2)
  1. README.md +5 -4
  2. README_zh.md +5 -4
README.md CHANGED
@@ -77,7 +77,7 @@ tags:
  - Model pretrain data distribution
  - The training dataset is primarily composed of English, Chinese, and other languages, accounting for 50%, 25%, and 12% of the data, respectively. Additionally, code makes up 9%, while mathematical text accounts for 4%. The distribution by topics is detailed in the table below.
  <div align="center">
- <img src="./assets/imgs/data_src_dist.png" alt="logo" width="70%" />
+ <img src="./assets/imgs/data_src_dist.png" alt="logo" width="50%" />
  </div>


@@ -159,7 +159,8 @@ Test code: https://github.com/nishiwen1214/Benchmark-leakage-detection.
  |CMMLU | 0.38 | 0.39 | 0.23 | 0.27 | 0.22 |

  ### 3.1.6. Inference speed
- Setup inference server on 8x Nvidia RTX3090, and get results from client in unit of tokens per second.
+ Setup inference server on 8x Nvidia RTX3090, and get results from client in unit of tokens per second.<br>
+ We found that the inference speed results vary based on the number of concurrent requests and the length of output. To facilitate horizontal comparisons, we conducted multiple sets of tests. Each set of test data has a specific format: \<n>para_out\<m>. For example, "4para_out220" indicates the inference speed when there are 4 concurrent requests from the client and the average output token length is 220.

  |OrionLLM_V2.4.6.1|1para_out62|1para_out85|1para_out125|1para_out210|
  |---------|-------|-------|-------|-------|
@@ -176,10 +177,10 @@ Setup inference server on 8x Nvidia RTX3090, and get results from client in un
  |OrionMOE | 25.71 | 27.13 | 28.89 | 29.70 |
  |Qwen32 | 21.16 | 21.92 | 23.14 | 23.56 |

- We found that the inference speed results vary based on the number of concurrent requests and the length of output. To facilitate horizontal comparisons, we conducted multiple sets of tests. Each set of test data has a specific format: \<n>para_out\<m>. For example, "4para_out220" indicates the inference speed when there are 4 concurrent requests from the client and the average output token length is 220.
+

  <div align="center">
- <img src="./assets/imgs/inf_spd.png" alt="inf_speed" width="100%" />
+ <img src="./assets/imgs/inf_spd.png" alt="inf_speed" width="60%" />
  </div>
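
For context on the `<n>para_out<m>` measurements described in the hunk above, here is a minimal sketch of a client-side throughput probe. It is hypothetical, not the benchmark client behind these tables, and it assumes an OpenAI-compatible HTTP completions endpoint; the URL, model name, and `usage.completion_tokens` response field are all assumptions.

```python
# Hypothetical client-side tokens/second probe; not the repo's actual benchmark code.
# Assumes an OpenAI-compatible completions endpoint at BASE_URL.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

BASE_URL = "http://localhost:8000/v1/completions"  # assumed server address

def one_request(prompt: str, max_tokens: int) -> float:
    """Send one completion request and return its tokens-per-second rate."""
    start = time.perf_counter()
    resp = requests.post(
        BASE_URL,
        json={"model": "orion-moe8x7b",  # placeholder model name
              "prompt": prompt,
              "max_tokens": max_tokens},
        timeout=600,
    )
    resp.raise_for_status()
    elapsed = time.perf_counter() - start
    out_tokens = resp.json()["usage"]["completion_tokens"]  # assumed schema
    return out_tokens / elapsed

def run_group(concurrency: int, max_tokens: int, prompt: str = "Hello") -> float:
    """One '<n>para_out<m>' group: n concurrent requests, ~m output tokens each.
    Returns the mean per-request tokens/second."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        speeds = list(pool.map(lambda _: one_request(prompt, max_tokens),
                               range(concurrency)))
    return sum(speeds) / len(speeds)

if __name__ == "__main__":
    # e.g. "4para_out220": 4 concurrent requests, ~220 output tokens each
    print(f"4para_out220: {run_group(4, 220):.2f} tok/s")
```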
README_zh.md CHANGED
@@ -69,7 +69,7 @@
  - Orion-MOE8x7B-Base training data composition
  - By language, the pretraining data consists mainly of English, Chinese, and other languages, accounting for 50%, 25%, and 12%, respectively. By category, code accounts for 9% and mathematical text for 4%; see the figure below for the distribution.
  <div align="center">
- <img src="./assets/imgs/data_src_dist.png" alt="logo" width="70%" />
+ <img src="./assets/imgs/data_src_dist.png" alt="logo" width="50%" />
  </div>


@@ -152,7 +152,8 @@
  |CMMLU | 0.38 | 0.39 | 0.23 | 0.27 | 0.22 |

  ### 3.1.6. Inference speed
- The inference server was set up on 8x Nvidia RTX3090 GPUs; test results were collected from the client in tokens per second.
+ The inference server was set up on 8x Nvidia RTX3090 GPUs; test results were collected from the client in tokens per second.<br>
+ We found that inference speed varies with the number of concurrent requests and the model's output length. To make horizontal comparison easier, we ran multiple groups of tests. Each group's label has the format <client concurrency>para_out<output tokens per request>; for example, "4para_out220" denotes the inference speed with 4 concurrent client requests and an average output length of 220 tokens.

  |OrionLLM_V2.4.6.1|1para_out62|1para_out85|1para_out125|1para_out210|
  |---------|-------|-------|-------|-------|
@@ -169,10 +170,10 @@
  |OrionMOE | 25.71 | 27.13 | 28.89 | 29.70 |
  |Qwen32 | 21.16 | 21.92 | 23.14 | 23.56 |

- We found that inference speed varies with the number of concurrent requests and the model's output length. To make horizontal comparison easier, we ran multiple groups of tests. Each group's label has the format <client concurrency>para_out<output tokens per request>; for example, "4para_out220" denotes the inference speed with 4 concurrent client requests and an average output length of 220 tokens.
+

  <div align="center">
- <img src="./assets/imgs/inf_spd.png" alt="inf_speed" width="100%" />
+ <img src="./assets/imgs/inf_spd.png" alt="inf_speed" width="60%" />
  </div>

