RichardErkhov committed
Commit 01297b1
1 Parent(s): 9d8f9eb

uploaded readme

Files changed (1)
  1. README.md +226 -0
README.md ADDED

Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


sea-lion-7b - GGUF
- Model creator: https://huggingface.co/aisingapore/
- Original model: https://huggingface.co/aisingapore/sea-lion-7b/


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [sea-lion-7b.Q2_K.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q2_K.gguf) | Q2_K | 3.07GB |
| [sea-lion-7b.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.IQ3_XS.gguf) | IQ3_XS | 3.35GB |
| [sea-lion-7b.IQ3_S.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.IQ3_S.gguf) | IQ3_S | 3.42GB |
| [sea-lion-7b.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q3_K_S.gguf) | Q3_K_S | 3.42GB |
| [sea-lion-7b.IQ3_M.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.IQ3_M.gguf) | IQ3_M | 3.72GB |
| [sea-lion-7b.Q3_K.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q3_K.gguf) | Q3_K | 3.97GB |
| [sea-lion-7b.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q3_K_M.gguf) | Q3_K_M | 3.97GB |
| [sea-lion-7b.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q3_K_L.gguf) | Q3_K_L | 4.27GB |
| [sea-lion-7b.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.IQ4_XS.gguf) | IQ4_XS | 4.07GB |
| [sea-lion-7b.Q4_0.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q4_0.gguf) | Q4_0 | 4.22GB |
| [sea-lion-7b.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.IQ4_NL.gguf) | IQ4_NL | 4.25GB |
| [sea-lion-7b.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q4_K_S.gguf) | Q4_K_S | 4.25GB |
| [sea-lion-7b.Q4_K.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q4_K.gguf) | Q4_K | 4.67GB |
| [sea-lion-7b.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q4_K_M.gguf) | Q4_K_M | 4.67GB |
| [sea-lion-7b.Q4_1.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q4_1.gguf) | Q4_1 | 4.6GB |
| [sea-lion-7b.Q5_0.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q5_0.gguf) | Q5_0 | 4.97GB |
| [sea-lion-7b.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q5_K_S.gguf) | Q5_K_S | 4.97GB |
| [sea-lion-7b.Q5_K.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q5_K.gguf) | Q5_K | 5.3GB |
| [sea-lion-7b.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q5_K_M.gguf) | Q5_K_M | 5.3GB |
| [sea-lion-7b.Q5_1.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q5_1.gguf) | Q5_1 | 5.35GB |
| [sea-lion-7b.Q6_K.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q6_K.gguf) | Q6_K | 5.77GB |
| [sea-lion-7b.Q8_0.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q8_0.gguf) | Q8_0 | 7.46GB |
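
The quantized files above can be used with any GGUF-compatible runtime. As a rough illustration (not part of the original upload), the sketch below assumes the `huggingface_hub` and `llama-cpp-python` packages are installed; it downloads one of the quants listed above and runs a short completion. Context size and sampling settings are placeholders, not recommendations.

```python
# Minimal sketch, assuming `huggingface_hub` and `llama-cpp-python` are installed.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch one of the quantized files listed in the table above.
model_path = hf_hub_download(
    repo_id="RichardErkhov/aisingapore_-_sea-lion-7b-gguf",
    filename="sea-lion-7b.Q4_K_M.gguf",
)

# Load the GGUF file and run a short text completion (base model, no chat template).
llm = Llama(model_path=model_path, n_ctx=2048)
out = llm("Sea lion is", max_tokens=32)
print(out["choices"][0]["text"])
```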




Original model description:
---
license: mit
language:
- en
- zh
- id
- ms
- th
- vi
- fil
- ta
- my
- km
- lo
---
# SEA-LION

SEA-LION is a collection of Large Language Models (LLMs) which have been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
The models range in size from 3 billion to 7 billion parameters.
This is the card for the SEA-LION 7B base model.

SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.


## Model Details

### Model Description

The SEA-LION model is a significant leap forward in the field of Natural Language Processing,
specifically trained to understand the SEA regional context.

SEA-LION is built on the robust MPT architecture and has a vocabulary size of 256K.

For tokenization, the model employs our custom SEABPETokenizer, which is specially tailored for SEA languages, ensuring optimal model performance.

The training data for SEA-LION encompasses 980B tokens.

- **Developed by:** Products Pillar, AI Singapore
- **Funded by:** Singapore NRF
- **Model type:** Decoder
- **Languages:** English, Chinese, Indonesian, Malay, Thai, Vietnamese, Filipino, Tamil, Burmese, Khmer, Lao
- **License:** MIT License
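
Because the MPT-based architecture and the SEABPETokenizer ship as custom code, loading the original (non-quantized) checkpoint with Hugging Face `transformers` generally requires `trust_remote_code=True`. The snippet below is a minimal sketch under that assumption; the prompt and generation settings are illustrative only.

```python
# Minimal sketch: loading the original base model with transformers.
# Assumes `transformers` (and PyTorch) are installed and that executing the
# repository's custom modeling/tokenizer code is acceptable.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aisingapore/sea-lion-7b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("aisingapore/sea-lion-7b", trust_remote_code=True)

# Base model: plain text continuation, no instruction formatting.
inputs = tokenizer("Sea lion in the sea", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```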

### Performance Benchmarks

SEA-LION's average performance on general English tasks (as measured by Hugging Face's LLM Leaderboard) is shown below:

| Model | ARC | HellaSwag | MMLU | TruthfulQA | Average |
|-------------|:-----:|:---------:|:-----:|:----------:|:-------:|
| SEA-LION 7B | 39.93 | 68.51 | 26.87 | 35.09 | 42.60 |
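
Leaderboard scores of this kind are typically computed with EleutherAI's lm-evaluation-harness. The sketch below is a hypothetical way to produce comparable numbers locally; it assumes `lm-eval` (>= 0.4) is installed, and the task names, few-shot counts, and dtype chosen here may not match the leaderboard's exact configuration, so the resulting scores can differ.

```python
# Hypothetical sketch: evaluating the base model with lm-evaluation-harness.
# Task selection and settings are illustrative, not the leaderboard's exact setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=aisingapore/sea-lion-7b,trust_remote_code=True,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc2"],
    batch_size=8,
)
print(results["results"])
```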

## Training Details

### Data

SEA-LION was trained on 980B tokens of the following data:

| Data Source | Unique Tokens | Multiplier | Total Tokens | Percentage |
|---------------------------|:-------------:|:----------:|:------------:|:----------:|
| RefinedWeb - English | 571.3B | 1 | 571.3B | 58.20% |
| mC4 - Chinese | 91.2B | 1 | 91.2B | 9.29% |
| mC4 - Indonesian | 3.68B | 4 | 14.7B | 1.50% |
| mC4 - Malay | 0.72B | 4 | 2.9B | 0.29% |
| mC4 - Filipino | 1.32B | 4 | 5.3B | 0.54% |
| mC4 - Burmese | 1.2B | 4 | 4.9B | 0.49% |
| mC4 - Vietnamese | 63.4B | 1 | 63.4B | 6.46% |
| mC4 - Thai | 5.8B | 2 | 11.6B | 1.18% |
| WangChanBERTa - Thai | 5B | 2 | 10B | 1.02% |
| mC4 - Lao | 0.27B | 4 | 1.1B | 0.12% |
| mC4 - Khmer | 0.97B | 4 | 3.9B | 0.40% |
| mC4 - Tamil | 2.55B | 4 | 10.2B | 1.04% |
| the Stack - Python | 20.9B | 2 | 41.8B | 4.26% |
| the Stack - Javascript | 55.6B | 1 | 55.6B | 5.66% |
| the Stack - Shell | 1.25B | 2 | 2.5B | 0.26% |
| the Stack - SQL | 6.4B | 2 | 12.8B | 1.31% |
| the Stack - Markdown | 26.6B | 1 | 26.6B | 2.71% |
| RedPajama - StackExchange | 21.2B | 1 | 21.2B | 2.16% |
| RedPajama - ArXiv | 30.6B | 1 | 30.6B | 3.12% |
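
In this table, Total Tokens is Unique Tokens multiplied by the Multiplier (how many times that source is repeated during training), and Percentage is that total's share of the roughly 980B-token budget, up to rounding. A small sketch of the arithmetic for a few rows:

```python
# Sketch of the table's arithmetic: total = unique tokens x multiplier,
# percentage = total / overall training budget (~980B tokens).
sources = {
    # name: (unique tokens in billions, multiplier)
    "mC4 - Indonesian": (3.68, 4),
    "mC4 - Thai": (5.8, 2),
    "the Stack - Shell": (1.25, 2),
}

budget_b = 980.0  # total training tokens, in billions

for name, (unique_b, multiplier) in sources.items():
    total_b = unique_b * multiplier
    share = 100.0 * total_b / budget_b
    print(f"{name}: {total_b:.1f}B tokens ({share:.2f}%)")
```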

### Infrastructure

SEA-LION was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
on the following hardware:

| Training Details | SEA-LION 7B |
|----------------------|:------------:|
| AWS EC2 p4d.24xlarge | 32 instances |
| Nvidia A100 40GB GPU | 256 |
| Training Duration | 22 days |


### Configuration

| HyperParameter | SEA-LION 7B |
|-------------------|:------------------:|
| Precision | bfloat16 |
| Optimizer | decoupled_adamw |
| Scheduler | cosine_with_warmup |
| Learning Rate | 6.0e-5 |
| Global Batch Size | 2048 |
| Micro Batch Size | 4 |
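
For readers who want to mirror this recipe outside Composer, the sketch below approximates the listed settings in plain PyTorch: `decoupled_adamw` corresponds to AdamW-style decoupled weight decay, and `cosine_with_warmup` is a linear warmup followed by cosine decay. The warmup length and total step count are illustrative placeholders, not values from the original run.

```python
# Rough PyTorch approximation of the configuration above (a sketch, not the
# original Composer setup). Warmup and total step counts are placeholders.
import math
import torch

model = torch.nn.Linear(16, 16)  # stand-in for the actual model

optimizer = torch.optim.AdamW(model.parameters(), lr=6.0e-5)  # decoupled weight decay

warmup_steps, total_steps = 2000, 100_000  # placeholders

def warmup_cosine(step: int) -> float:
    """Linear warmup, then cosine decay to zero (returns a multiplier on the base LR)."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_cosine)

# Training would additionally use bfloat16 autocast and gradient accumulation to
# reach the global batch size of 2048 from a per-device micro batch size of 4.
```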


## Technical Specifications

### Model Architecture and Objective

SEA-LION is a decoder model using the MPT architecture.

| Parameter | SEA-LION 7B |
|-----------------|:-----------:|
| Layers | 32 |
| d_model | 4096 |
| head_dim | 32 |
| Vocabulary | 256000 |
| Sequence Length | 2048 |


### Tokenizer Details

We sampled 20M lines from the training data to train the tokenizer.<br>
The framework for training is [SentencePiece](https://github.com/google/sentencepiece).<br>
The tokenizer type is Byte-Pair Encoding (BPE).
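
A minimal SentencePiece training sketch consistent with that description is shown below. Only the framework (SentencePiece), the BPE model type, and the 256K vocabulary come from this card; the input path, output prefix, and character-coverage setting are assumptions for illustration.

```python
# Minimal sketch of BPE tokenizer training with SentencePiece, mirroring the
# description above. File names and auxiliary options are illustrative assumptions.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="sampled_training_lines.txt",  # hypothetical: the sampled training lines
    model_prefix="seabpe",               # hypothetical output prefix
    model_type="bpe",
    vocab_size=256000,
    character_coverage=0.9995,           # assumption for multilingual coverage
)

# The resulting seabpe.model / seabpe.vocab files define the tokenizer.
sp = spm.SentencePieceProcessor(model_file="seabpe.model")
print(sp.encode("Selamat pagi, Singapura!", out_type=str))
```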



## The Team

Lam Wen Zhi Clarence<br>
Leong Wei Qi<br>
Li Yier<br>
Liu Bing Jie Darius<br>
Lovenia Holy<br>
Montalan Jann Railey<br>
Ng Boon Cheong Raymond<br>
Ngui Jian Gang<br>
Nguyen Thanh Ngan<br>
Ong Tat-Wee David<br>
Rengarajan Hamsawardhini<br>
Susanto Yosephine<br>
Tai Ngee Chia<br>
Tan Choon Meng<br>
Teo Jin Howe<br>
Teo Eng Sipp Leslie<br>
Teo Wei Yi<br>
Tjhi William<br>
Yeo Yeow Tong<br>
Yong Xianbin<br>

## Acknowledgements

AI Singapore is a national programme supported by the National Research Foundation, Singapore and hosted by the National University of Singapore.
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore.

## Contact

For more information, please contact us using this [SEA-LION Inquiry Form](https://forms.gle/sLCUVb95wmGf43hi6).

[Link to SEA-LION's GitHub repository](https://github.com/aisingapore/sealion)


## Disclaimer

This is the repository for the base model.
The model has _not_ been aligned for safety.
Developers and users should perform their own safety fine-tuning and related security measures.
In no event shall the authors be held liable for any claim, damages, or other liability
arising from the use of the released weights and code.


## References

```bibtex
@misc{lowphansirikul2021wangchanberta,
    title={WangchanBERTa: Pretraining transformer-based Thai Language Models},
    author={Lalita Lowphansirikul and Charin Polpanumas and Nawat Jantrakulchai and Sarana Nutanong},
    year={2021},
    eprint={2101.09635},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```