Update README.md
README.md CHANGED
@@ -51,9 +51,16 @@ pipeline_tag: text-generation
 - The fine-tuned models demonstrate strong adaptability, excelling in human-annotated blind tests.
 - The long-chat version supports extremely long texts of up to 200K tokens.
 - The quantized versions reduce model size by 70% and improve inference speed by 30%, with a performance loss of less than 1%.
-<
-
-
+<table style="border-collapse: collapse; width: 100%;">
+  <tr>
+    <td style="border: none; padding: 10px; box-sizing: border-box;">
+      <img src="./assets/imgs/opencompass_en.png" alt="opencompass" style="width: 100%; height: auto;">
+    </td>
+    <td style="border: none; padding: 10px; box-sizing: border-box;">
+      <img src="./assets/imgs/model_cap_en.png" alt="modelcap" style="width: 100%; height: auto;">
+    </td>
+  </tr>
+</table>
 
 - Orion-14B series models include:
   - **Orion-14B-Base:** A multilingual large language foundational model with 14 billion parameters, pretrained on a diverse dataset of 2.5 trillion tokens.
@@ -99,7 +106,7 @@ Model release and download links are provided in the table below:
 | Baichuan 2-13B | 68.9 | 67.2 | 70.8 | 78.1 | 74.1 | 66.3 |
 | QWEN-14B | 93.0 | 90.3 | **80.2** | 79.8 | 71.4 | 66.3 |
 | InternLM-20B | 86.4 | 83.3 | 78.1 | **80.3** | 71.8 | 68.3 |
-| **Orion-14B-Base** | **93.
+| **Orion-14B-Base** | **93.2** | **91.3** | 78.5 | 79.5 | **78.8** | **70.2** |
 
 ### 3.1.3. LLM evaluation results of OpenCompass testsets
 | Model | Average | Examination | Language | Knowledge | Understanding | Reasoning |
@@ -109,7 +116,7 @@ Model release and download links are provided in the table below:
 | Baichuan 2-13B | 49.4 | 51.8 | 47.5 | 48.9 | 58.1 | 44.2 |
 | QWEN-14B | 62.4 | 71.3 | 52.67 | 56.1 | 68.8 | 60.1 |
 | InternLM-20B | 59.4 | 62.5 | 55.0 | **60.1** | 67.3 | 54.9 |
-|**Orion-14B-Base**| **64.
+|**Orion-14B-Base**| **64.3** | **71.4** | **55.0** | 60.0 | **71.9** | **61.6** |
 
 ### 3.1.4. Comparison of LLM performances on Japanese testsets
 | Model |**Average**| JCQA | JNLI | MARC | JSQD | JQK | XLS | XWN | MGSM |
@@ -170,8 +177,7 @@ Model release and download links are provided in the table below:
 | Llama2-13B-Chat | 3.05 | 3.79 | 5.43 | 4.40 | 6.76 | 6.63 | 6.99 | 5.65 | 4.70 |
 | InternLM-20B-Chat | 3.39 | 3.92 | 5.96 | 5.50 |**7.18**| 6.19 | 6.49 | 6.22 | 4.96 |
 | **Orion-14B-Chat** | 4.00 | 4.24 | 6.18 |**6.57**| 7.16 |**7.36**|**7.16**|**6.99**| 5.51 |
-
-\* use vllm for inference
+\* use vllm for inference
 
 ## 3.3. LongChat Model Orion-14B-LongChat Benchmarks
 ### 3.3.1. LongChat evaluation of LongBench
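
The footnote "\* use vllm for inference" in the hunk above says the chat results were produced with vLLM but gives no command. A minimal sketch of vLLM-based inference follows; the repo id and sampling parameters are assumptions, not taken from this diff.

```python
# Sketch of vLLM inference, as hinted by the "* use vllm for inference"
# footnote. Repo id and sampling settings are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="OrionStarAI/Orion-14B-Chat", trust_remote_code=True)
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(["Explain what MT-Bench measures."], params)
for out in outputs:
    print(out.outputs[0].text)
```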