aisingapore/sea-lion-7b

RaymondAISG committed on Jan 3

Commit 2cddd1c • 1 Parent(s): 11cc427

Add citation for Thai dataset

Files changed (1)

README.md +15 -1

@@ -52,7 +52,8 @@ SEA-LION was trained on 980B tokens of the following data:
 | mC4 - Filipino            |   5.3B |      0.54% |
 | mC4 - Burmese             |   4.9B |      0.49% |
 | mC4 - Vietnamese          |  63.4B |      6.46% |
-| mC4 - Thai                |  21.6B |      2.20% |
+| mC4 - Thai                |  11.6B |      1.18% |
+| WangChanBERTa - Thai      |    10B |      1.02% |
 | mC4 - Lao                 |   1.1B |      0.12% |
 | mC4 - Khmer               |   3.9B |      0.40% |
 | mC4 - Tamil               |  10.2B |      1.04% |
@@ -153,3 +154,16 @@ The model has _not_ been aligned for safety.
 Developers and users should perform their own safety fine-tuning and related security measures.
 In no event shall the authors be held liable for any claim, damages, or other liability
 arising from the use of the released weights and codes.
+## Citations
+```bibtex
+@misc{lowphansirikul2021wangchanberta,
+    title={WangchanBERTa: Pretraining transformer-based Thai Language Models},
+    author={Lalita Lowphansirikul and Charin Polpanumas and Nawat Jantrakulchai and Sarana Nutanong},
+    year={2021},
+    eprint={2101.09635},
+    archivePrefix={arXiv},
+    primaryClass={cs.CL}
+}
+```
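
The first hunk splits the single Thai row into two sources. As a rough sanity check (not part of the commit), the sketch below recomputes the percentages from the token counts in the diff, assuming the 980B total quoted in the hunk header is the denominator. The names `TOTAL_B`, `old_rows`, `new_rows`, and `share` are hypothetical and exist only for this illustration.

```python
# Token counts in billions, copied from the diff rows.
# Assumption: the 980B figure from the hunk header is the denominator.
TOTAL_B = 980.0

old_rows = {"mC4 - Thai": 21.6}
new_rows = {"mC4 - Thai": 11.6, "WangChanBERTa - Thai": 10.0}


def share(tokens_b: float, total_b: float = TOTAL_B) -> str:
    """Format a token count as a percentage of the total, to two decimals."""
    return f"{100 * tokens_b / total_b:.2f}%"


# The split should leave the combined Thai token count unchanged.
thai_total_preserved = abs(sum(new_rows.values()) - sum(old_rows.values())) < 1e-9

old_shares = {name: share(tokens) for name, tokens in old_rows.items()}
new_shares = {name: share(tokens) for name, tokens in new_rows.items()}

print("Thai total preserved:", thai_total_preserved)
print("old:", old_shares)
print("new:", new_shares)
```

Under that assumption, the recomputed shares match the values in the removed and added rows, and the two new rows together account for the same 21.6B tokens as the row they replace.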