liuyongq committed on
Commit
c7fe9d9
1 Parent(s): c9c7d61

Update README.md

Files changed (1):
  1. README.md +13 -7
README.md CHANGED
@@ -51,9 +51,16 @@ pipeline_tag: text-generation
 - The fine-tuned models demonstrate strong adaptability, excelling in human-annotated blind tests.
 - The long-chat version supports extremely long texts, extending up to 200K tokens.
 - The quantized versions reduce model size by 70%, improve inference speed by 30%, with performance loss less than 1%.
-<div align="center">
-  <img src="./assets/imgs/model_cap_en.png" alt="model_cap_en" width="50%" />
-</div>
+<table style="border-collapse: collapse; width: 100%;">
+  <tr>
+    <td style="border: none; padding: 10px; box-sizing: border-box;">
+      <img src="./assets/imgs/opencompass_en.png" alt="opencompass" style="width: 100%; height: auto;">
+    </td>
+    <td style="border: none; padding: 10px; box-sizing: border-box;">
+      <img src="./assets/imgs/model_cap_en.png" alt="modelcap" style="width: 100%; height: auto;">
+    </td>
+  </tr>
+</table>
 
 - Orion-14B series models including:
   - **Orion-14B-Base:** A multilingual large language foundational model with 14 billion parameters, pretrained on a diverse dataset of 2.5 trillion tokens.
@@ -99,7 +106,7 @@ Model release and download links are provided in the table below:
 | Baichuan 2-13B | 68.9 | 67.2 | 70.8 | 78.1 | 74.1 | 66.3 |
 | QWEN-14B | 93.0 | 90.3 | **80.2** | 79.8 | 71.4 | 66.3 |
 | InternLM-20B | 86.4 | 83.3 | 78.1 | **80.3** | 71.8 | 68.3 |
-| **Orion-14B-Base** | **93.3** | **91.3** | 78.5 | 79.5 | **78.9** | **70.2** |
+| **Orion-14B-Base** | **93.2** | **91.3** | 78.5 | 79.5 | **78.8** | **70.2** |
 
 ### 3.1.3. LLM evaluation results of OpenCompass testsets
 | Model | Average | Examination | Language | Knowledge | Understanding | Reasoning |
@@ -109,7 +116,7 @@ Model release and download links are provided in the table below:
 | Baichuan 2-13B | 49.4 | 51.8 | 47.5 | 48.9 | 58.1 | 44.2 |
 | QWEN-14B | 62.4 | 71.3 | 52.67 | 56.1 | 68.8 | 60.1 |
 | InternLM-20B | 59.4 | 62.5 | 55.0 | **60.1** | 67.3 | 54.9 |
-|**Orion-14B-Base**| **64.4** | **71.4** | **55.0** | 60.0 | **71.9** | **61.6** |
+|**Orion-14B-Base**| **64.3** | **71.4** | **55.0** | 60.0 | **71.9** | **61.6** |
 
 ### 3.1.4. Comparison of LLM performances on Japanese testsets
 | Model |**Average**| JCQA | JNLI | MARC | JSQD | JQK | XLS | XWN | MGSM |
@@ -170,8 +177,7 @@ Model release and download links are provided in the table below:
 | Llama2-13B-Chat | 3.05 | 3.79 | 5.43 | 4.40 | 6.76 | 6.63 | 6.99 | 5.65 | 4.70 |
 | InternLM-20B-Chat | 3.39 | 3.92 | 5.96 | 5.50 |**7.18**| 6.19 | 6.49 | 6.22 | 4.96 |
 | **Orion-14B-Chat** | 4.00 | 4.24 | 6.18 |**6.57**| 7.16 |**7.36**|**7.16**|**6.99**| 5.51 |
-
-\* use vllm for inference
+\* use vllm for inference
 
 ## 3.3. LongChat Model Orion-14B-LongChat Benchmarks
 ### 3.3.1. LongChat evaluation of LongBench