Minor fixes to README.md

#1
by weiqipedia - opened
Files changed (1)
  1. README.md +10 -11
README.md CHANGED
@@ -3,24 +3,23 @@ license: mit
 ---
 # SEA-LION

-SEA-LION is a collection of LLMs which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
-The models range from 3 billion to 7 billion parameters.
+SEA-LION is a collection of Large Language Models (LLMs) which has been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
+The size of the models range from 3 billion to 7 billion parameters.
 This is the card for the SEA-LION 7B model.

-SEA-LION stands for <i>Southeast Asia Languages In One Network</i>.
+SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.


 ## Model Details

 ### Model Description

-The SEA-LION model is a significant leap forward in the field of natural language processing,
-specifically trained to understand Southeast Asia (SEA) regional context.
+The SEA-LION model is a significant leap forward in the field of Natural Language Processing,
+specifically trained to understand the SEA regional context.

-SEA-LION is built on the robust MPT architecture and utilize a vocabulary size of 256K.
+SEA-LION is built on the robust MPT architecture and has a vocabulary size of 256K.

-The model employs our custom SEABPETokenizer for tokenization.
-Our SEABPETokenizer is specially tailored for SEA languages, ensuring optimal model performance.
+For tokenization, the model employs our custom SEABPETokenizer, which is specially tailored for SEA languages, ensuring optimal model performance.

 The training data for SEA-LION encompasses 980B tokens.

@@ -44,7 +43,7 @@ SEA-LION was trained on 980B tokens of the following data:
 | mC4 - Indonesian | 14.7B | 1.50% |
 | mC4 - Malay | 2.9B | 0.29% |
 | mC4 - Filipino | 5.3B | 0.54% |
-| mC4 - Burmese | 1.2B | 0.49% |
+| mC4 - Burmese | 4.9B | 0.49% |
 | mC4 - Vietnamese | 63.4B | 6.46% |
 | mC4 - Thai | 21.6B | 2.20% |
 | mC4 - Lao | 1.1B | 0.12% |
@@ -108,14 +107,14 @@ The tokenizer type is Byte-Pair Encoding (BPE).
 ## The Team

 Lam Zhiwen Clarence<br>
-Leong Weiqi<br>
+Leong Wei Qi<br>
 Li Yier<br>
 Liu Darius<br>
 Lovenia Holy<br>
 Montalan Jann Railey<br>
 Ng Raymond<br>
 Ngui Jian Gang<br>
-Nguyen Ngan Thanh<br>
+Nguyen Thanh Ngan<br>
 Ong Tat-Wee David<br>
 Rengarajan Hamsawardhini<br>
 Susanto Yosephine<br>
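
For context, a minimal sketch of how the model and tokenizer described in this card are typically loaded. The repo id `aisingapore/sea-lion-7b`, the `trust_remote_code=True` flag, and the sample prompt are illustrative assumptions, not part of this PR; MPT-derived models and the custom SEABPETokenizer generally require remote code to load.

```python
# Hypothetical usage sketch for the SEA-LION 7B card above.
# Assumptions (not from this PR): the repo id "aisingapore/sea-lion-7b"
# and trust_remote_code=True, needed because the MPT-style model and the
# custom SEABPETokenizer ship their own modeling/tokenizer code.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "aisingapore/sea-lion-7b"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

# The card states a 256K BPE vocabulary tailored to SEA languages.
print(f"vocab size: {len(tokenizer)}")

prompt = "Selamat pagi, apa khabar?"  # short Malay prompt for illustration
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The 256K vocabulary is the point of the SEABPETokenizer: compared with a typical 32K-50K English-centric BPE vocabulary, it encodes SEA-language text in fewer tokens.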