isakzhang committed
Commit c80d594 • 1 Parent(s): 2a63adf

Update README.md

Files changed (1)
  1. README.md +12 -6
README.md CHANGED
@@ -9,6 +9,9 @@ language:
 - vi
 - th
 - ms
+tags:
+- sea
+- multilingual
 ---
 
 # *SeaLLM3* - Large Language Models for Southeast Asia
@@ -17,17 +20,21 @@ language:
 <p align="center">
 <a href="https://damo-nlp-sg.github.io/SeaLLMs/" target="_blank" rel="noopener">Website</a>
 &nbsp;&nbsp;
-<a href="https://huggingface.co/SeaLLMs/SeaLLM-7B-v2.5" target="_blank" rel="noopener"> 🤗 Tech Memo</a>
+<a href="https://huggingface.co/SeaLLMs/SeaLLM3-7B-Chat" target="_blank" rel="noopener"> 🤗 Tech Memo</a>
 &nbsp;&nbsp;
-<a href="https://huggingface.co/spaces/SeaLLMs/SeaLLM-7B-v2.5" target="_blank" rel="noopener"> 🤗 DEMO</a>
+<a href="https://huggingface.co/spaces/SeaLLMs/SeaLLM-Chat" target="_blank" rel="noopener"> 🤗 DEMO</a>
 &nbsp;&nbsp;
 <a href="https://github.com/DAMO-NLP-SG/SeaLLMs" target="_blank" rel="noopener">Github</a>
 &nbsp;&nbsp;
 <a href="https://arxiv.org/pdf/2312.00738.pdf" target="_blank" rel="noopener">Technical Report</a>
 </p>
 
-We introduce **SeaLLM3**, the latest series of the SeaLLMs (Large Language Models for Southeast Asian languages) family. It achieves state-of-the-art performance among models with similar sizes, excelling across a diverse array of tasks such as world knowledge, mathematical reasoning, translation, and instruction following. In the meantime, it was specifically enhanced to be more trustworthy, exhibiting reduced hallucination and providing safe responses, particularly in queries posed in Southeast Asian languages.
+We introduce **SeaLLM3**, the latest series in the SeaLLMs (Large Language Models for Southeast Asian languages) family. It achieves state-of-the-art performance among models of similar size, excelling across a diverse array of tasks such as world knowledge, mathematical reasoning, translation, and instruction following. At the same time, it has been specifically enhanced to be more trustworthy, exhibiting reduced hallucination and providing safe responses, particularly for queries closely related to Southeast Asian culture.
 
+## 🔥 Highlights
+- State-of-the-art performance compared to open-source models of similar size, evaluated across dimensions such as human exam questions, instruction following, mathematics, and translation.
+- Significantly enhanced instruction-following capability, especially in multi-turn settings.
+- Safer usage, with significantly reduced hallucination and greater sensitivity to local contexts.
 
 ## Uses
 
@@ -149,7 +156,7 @@ By using our released weights, codes, and demos, you agree to and comply with th
 We conduct our evaluation along two dimensions:
 
 1. **Model Capability**: We assess the model's performance on human exam questions, its ability to follow instructions, its proficiency in mathematics, and its translation accuracy.
-2. **Model Trustworthiness**: We evaluate the model's safety and tendency to hallucinate, particularly in the context of Southeast Asia (SEA).
+2. **Model Trustworthiness**: We evaluate the model's safety and tendency to hallucinate, particularly in the context of Southeast Asia.
 
 ### Model Capability
 
@@ -172,9 +179,8 @@ We conduct our evaluation along two dimensions:
 #### Multilingual Instruction-following Capability - SeaBench
 SeaBench consists of multi-turn human instructions spanning various task types. It evaluates chat-based models on their ability to follow human instructions in both single- and multi-turn settings, and assesses their performance across those task types. The dataset and corresponding evaluation code will be released soon!
 
-| model | id turn-1 | id turn-2 | id avg | th turn-1 | th turn-2 | th avg | vi turn-1 | vi turn-2 | vi avg | avg |
+| model | id<br>turn-1 | id<br>turn-2 | id<br>avg | th<br>turn-1 | th<br>turn-2 | th<br>avg | vi<br>turn-1 | vi<br>turn-2 | vi<br>avg | avg |
 |:----------------|------------:|------------:|---------:|------------:|------------:|---------:|------------:|------------:|---------:|------:|
-| ChatGPT-0125 | 6.99 | 7.21 | 7.10 | 5.36 | 5.08 | 5.22 | 6.62 | 6.73 | 6.68 | 6.33 |
 | Qwen2-7B-Instruct| 5.93 | 5.84 | 5.89 | 5.47 | 5.20 | 5.34 | 6.17 | 5.60 | 5.89 | 5.70 |
 | SeaLLM-7B-v2.5 | 6.27 | 4.96 | 5.62 | 5.79 | 3.82 | 4.81 | 6.02 | 4.02 | 5.02 | 5.15 |
 | Sailor-14B-Chat | 5.26 | 5.53 | 5.40 | 4.62 | 4.36 | 4.49 | 5.31 | 4.74 | 5.03 | 4.97 |
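The `avg` columns above are consistent with simple means: each language's average is the mean of its two turn scores, and the final column is the mean of the three language averages. A short check (scores copied from the table; the half-up rounding to two decimals is an assumption inferred from the printed values):

```python
# Reproduce the "avg" columns of the SeaBench table (id = Indonesian,
# th = Thai, vi = Vietnamese). The rounding mode is assumed, not documented.
from decimal import Decimal, ROUND_HALF_UP

scores = {  # model -> {language: (turn-1, turn-2)}
    "Qwen2-7B-Instruct": {"id": ("5.93", "5.84"), "th": ("5.47", "5.20"), "vi": ("6.17", "5.60")},
    "SeaLLM-7B-v2.5": {"id": ("6.27", "4.96"), "th": ("5.79", "3.82"), "vi": ("6.02", "4.02")},
    "Sailor-14B-Chat": {"id": ("5.26", "5.53"), "th": ("4.62", "4.36"), "vi": ("5.31", "4.74")},
}

def round2(x: Decimal) -> Decimal:
    """Round half-up to two decimals, matching the table's apparent convention."""
    return x.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

for model, langs in scores.items():
    # Per-language average over the two turns (kept unrounded for the final mean).
    lang_avgs = {lang: (Decimal(t1) + Decimal(t2)) / 2 for lang, (t1, t2) in langs.items()}
    overall = sum(lang_avgs.values()) / 3
    print(model, {k: str(round2(v)) for k, v in lang_avgs.items()}, round2(overall))
# Qwen2-7B-Instruct {'id': '5.89', 'th': '5.34', 'vi': '5.89'} 5.70  -- matches the row above
```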
 
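For reference, a minimal inference sketch for the checkpoint this card update points to, `SeaLLMs/SeaLLM3-7B-Chat` (the Tech Memo link above). It assumes the standard 🤗 transformers chat-template workflow; the model card itself is the authoritative usage guide.

```python
# Minimal chat inference with transformers; the dtype and generation settings
# here are illustrative defaults, not the model authors' recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "SeaLLMs/SeaLLM3-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# One-turn chat in Indonesian ("Hello, how are you?").
messages = [{"role": "user", "content": "Halo, apa kabar?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```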