Minor fixes to README.md

#1
by weiqipedia - opened
Files changed (1)
  1. README.md +10 -11
README.md CHANGED
@@ -3,24 +3,23 @@ license: mit
 ---
 # SEA-LION

-SEA-LION is a collection of LLMs which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
-The models range from 3 billion to 7 billion parameters.
+SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
+The size of the models range from 3 billion to 7 billion parameters.
 This is the card for the SEA-LION 7B model.

-SEA-LION stands for <i>Southeast Asia Languages In One Network</i>.
+SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.


 ## Model Details

 ### Model Description

-The SEA-LION model is a significant leap forward in the field of natural language processing,
-specifically trained to understand Southeast Asia (SEA) regional context.
+The SEA-LION model is a significant leap forward in the field of Natural Language Processing,
+specifically trained to understand the SEA regional context.

-SEA-LION is built on the robust MPT architecture and utilize a vocabulary size of 256K.
+SEA-LION is built on the robust MPT architecture and has a vocabulary size of 256K.

-The model employs our custom SEABPETokenizer for tokenization.
-Our SEABPETokenizer is specially tailored for SEA languages, ensuring optimal model performance.
+For tokenization, the model employs our custom SEABPETokenizer, which is specially tailored for SEA languages, ensuring optimal model performance.

 The training data for SEA-LION encompasses 980B tokens.

@@ -44,7 +43,7 @@ SEA-LION was trained on 980B tokens of the following data:
 | mC4 - Indonesian | 14.7B | 1.50% |
 | mC4 - Malay | 2.9B | 0.29% |
 | mC4 - Filipino | 5.3B | 0.54% |
-| mC4 - Burmese | 1.2B | 0.49% |
+| mC4 - Burmese | 4.9B | 0.49% |
 | mC4 - Vietnamese | 63.4B | 6.46% |
 | mC4 - Thai | 21.6B | 2.20% |
 | mC4 - Lao | 1.1B | 0.12% |
@@ -108,14 +107,14 @@ The tokenizer type is Byte-Pair Encoding (BPE).
 ## The Team

 Lam Zhiwen Clarence<br>
-Leong Weiqi<br>
+Leong Wei Qi<br>
 Li Yier<br>
 Liu Darius<br>
 Lovenia Holy<br>
 Montalan Jann Railey<br>
 Ng Raymond<br>
 Ngui Jian Gang<br>
-Nguyen Ngan Thanh<br>
+Nguyen Thanh Ngan<br>
 Ong Tat-Wee David<br>
 Rengarajan Hamsawardhini<br>
 Susanto Yosephine<br>
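
For context, a minimal sketch of how the model and tokenizer described in this card are typically loaded. The repo id `aisingapore/sea-lion-7b`, the `trust_remote_code=True` flag, and the sample prompt are illustrative assumptions, not part of this PR; MPT-derived models and the custom SEABPETokenizer generally require remote code to load.

```python
# Hypothetical usage sketch for the SEA-LION 7B card above.
# Assumptions (not from this PR): the repo id "aisingapore/sea-lion-7b"
# and trust_remote_code=True, needed because the MPT-style model and the
# custom SEABPETokenizer ship their own modeling/tokenizer code.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aisingapore/sea-lion-7b"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# The card states a 256K BPE vocabulary tailored to SEA languages.
print(f"vocab size: {len(tokenizer)}")

prompt = "Selamat pagi, apa khabar?"  # short Malay prompt for illustration
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The 256K vocabulary is the point of the SEABPETokenizer: compared with a typical 32K-50K English-centric BPE vocabulary, it encodes SEA-language text in fewer tokens.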