Update README.md
README.md CHANGED
@@ -51,9 +51,16 @@ pipeline_tag: text-generation
 - The fine-tuned models demonstrate strong adaptability, excelling in human-annotated blind tests.
 - The long-chat version supports extremely long texts of up to 200K tokens.
 - The quantized versions reduce model size by 70% and improve inference speed by 30%, with a performance loss of less than 1%.
-<
-
-
+<table style="border-collapse: collapse; width: 100%;">
+  <tr>
+    <td style="border: none; padding: 10px; box-sizing: border-box;">
+      <img src="./assets/imgs/opencompass_en.png" alt="opencompass" style="width: 100%; height: auto;">
+    </td>
+    <td style="border: none; padding: 10px; box-sizing: border-box;">
+      <img src="./assets/imgs/model_cap_en.png" alt="modelcap" style="width: 100%; height: auto;">
+    </td>
+  </tr>
+</table>
 
 - Orion-14B series models include:
   - **Orion-14B-Base:** A multilingual large language foundational model with 14 billion parameters, pretrained on a diverse dataset of 2.5 trillion tokens.
@@ -99,7 +106,7 @@ Model release and download links are provided in the table below:
 | Baichuan 2-13B | 68.9 | 67.2 | 70.8 | 78.1 | 74.1 | 66.3 |
 | QWEN-14B | 93.0 | 90.3 | **80.2** | 79.8 | 71.4 | 66.3 |
 | InternLM-20B | 86.4 | 83.3 | 78.1 | **80.3** | 71.8 | 68.3 |
-| **Orion-14B-Base** | **93.
+| **Orion-14B-Base** | **93.2** | **91.3** | 78.5 | 79.5 | **78.8** | **70.2** |
 
 ### 3.1.3. LLM evaluation results of OpenCompass testsets
 | Model | Average | Examination | Language | Knowledge | Understanding | Reasoning |
@@ -109,7 +116,7 @@ Model release and download links are provided in the table below:
 | Baichuan 2-13B | 49.4 | 51.8 | 47.5 | 48.9 | 58.1 | 44.2 |
 | QWEN-14B | 62.4 | 71.3 | 52.67 | 56.1 | 68.8 | 60.1 |
 | InternLM-20B | 59.4 | 62.5 | 55.0 | **60.1** | 67.3 | 54.9 |
-|**Orion-14B-Base**| **64.
+|**Orion-14B-Base**| **64.3** | **71.4** | **55.0** | 60.0 | **71.9** | **61.6** |
 
 ### 3.1.4. Comparison of LLM performances on Japanese testsets
 | Model |**Average**| JCQA | JNLI | MARC | JSQD | JQK | XLS | XWN | MGSM |
@@ -170,8 +177,7 @@ Model release and download links are provided in the table below:
 | Llama2-13B-Chat | 3.05 | 3.79 | 5.43 | 4.40 | 6.76 | 6.63 | 6.99 | 5.65 | 4.70 |
 | InternLM-20B-Chat | 3.39 | 3.92 | 5.96 | 5.50 |**7.18**| 6.19 | 6.49 | 6.22 | 4.96 |
 | **Orion-14B-Chat** | 4.00 | 4.24 | 6.18 |**6.57**| 7.16 |**7.36**|**7.16**|**6.99**| 5.51 |
-
-\* use vllm for inference
+\* use vllm for inference
 
 ## 3.3. LongChat Model Orion-14B-LongChat Benchmarks
 ### 3.3.1. LongChat evaluation of LongBench
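
The footnote "\* use vllm for inference" in the hunk above says the chat results were produced with vLLM but gives no command. A minimal sketch of vLLM-based inference follows; the repo id and sampling parameters are assumptions, not taken from this diff.

```python
# Sketch of vLLM inference, as hinted by the "* use vllm for inference"
# footnote. Repo id and sampling settings are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="OrionStarAI/Orion-14B-Chat", trust_remote_code=True)
params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=256)

outputs = llm.generate(["Explain what MT-Bench measures."], params)
for out in outputs:
    print(out.outputs[0].text)
```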