RichardErkhov committed
Commit 01297b1
1 Parent(s): 9d8f9eb

uploaded readme

Files changed (1)
  1. README.md +226 -0
README.md ADDED

Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


sea-lion-7b - GGUF
- Model creator: https://huggingface.co/aisingapore/
- Original model: https://huggingface.co/aisingapore/sea-lion-7b/


| Name | Quant method | Size |
| ---- | ---- | ---- |
| [sea-lion-7b.Q2_K.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q2_K.gguf) | Q2_K | 3.07GB |
| [sea-lion-7b.IQ3_XS.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.IQ3_XS.gguf) | IQ3_XS | 3.35GB |
| [sea-lion-7b.IQ3_S.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.IQ3_S.gguf) | IQ3_S | 3.42GB |
| [sea-lion-7b.Q3_K_S.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q3_K_S.gguf) | Q3_K_S | 3.42GB |
| [sea-lion-7b.IQ3_M.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.IQ3_M.gguf) | IQ3_M | 3.72GB |
| [sea-lion-7b.Q3_K.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q3_K.gguf) | Q3_K | 3.97GB |
| [sea-lion-7b.Q3_K_M.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q3_K_M.gguf) | Q3_K_M | 3.97GB |
| [sea-lion-7b.Q3_K_L.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q3_K_L.gguf) | Q3_K_L | 4.27GB |
| [sea-lion-7b.IQ4_XS.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.IQ4_XS.gguf) | IQ4_XS | 4.07GB |
| [sea-lion-7b.Q4_0.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q4_0.gguf) | Q4_0 | 4.22GB |
| [sea-lion-7b.IQ4_NL.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.IQ4_NL.gguf) | IQ4_NL | 4.25GB |
| [sea-lion-7b.Q4_K_S.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q4_K_S.gguf) | Q4_K_S | 4.25GB |
| [sea-lion-7b.Q4_K.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q4_K.gguf) | Q4_K | 4.67GB |
| [sea-lion-7b.Q4_K_M.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q4_K_M.gguf) | Q4_K_M | 4.67GB |
| [sea-lion-7b.Q4_1.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q4_1.gguf) | Q4_1 | 4.6GB |
| [sea-lion-7b.Q5_0.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q5_0.gguf) | Q5_0 | 4.97GB |
| [sea-lion-7b.Q5_K_S.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q5_K_S.gguf) | Q5_K_S | 4.97GB |
| [sea-lion-7b.Q5_K.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q5_K.gguf) | Q5_K | 5.3GB |
| [sea-lion-7b.Q5_K_M.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q5_K_M.gguf) | Q5_K_M | 5.3GB |
| [sea-lion-7b.Q5_1.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q5_1.gguf) | Q5_1 | 5.35GB |
| [sea-lion-7b.Q6_K.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q6_K.gguf) | Q6_K | 5.77GB |
| [sea-lion-7b.Q8_0.gguf](https://huggingface.co/RichardErkhov/aisingapore_-_sea-lion-7b-gguf/blob/main/sea-lion-7b.Q8_0.gguf) | Q8_0 | 7.46GB |
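
The quantized files above can be used with any GGUF-compatible runtime. As a rough illustration (not part of the original upload), the sketch below assumes the `huggingface_hub` and `llama-cpp-python` packages are installed; it downloads one of the quants listed above and runs a short completion. Context size and sampling settings are placeholders, not recommendations.

```python
# Minimal sketch, assuming `huggingface_hub` and `llama-cpp-python` are installed.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Fetch one of the quantized files listed in the table above.
model_path = hf_hub_download(
    repo_id="RichardErkhov/aisingapore_-_sea-lion-7b-gguf",
    filename="sea-lion-7b.Q4_K_M.gguf",
)

# Load the GGUF file and run a short text completion (base model, no chat template).
llm = Llama(model_path=model_path, n_ctx=2048)
out = llm("Sea lion is", max_tokens=32)
print(out["choices"][0]["text"])
```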




Original model description:
---
license: mit
language:
- en
- zh
- id
- ms
- th
- vi
- fil
- ta
- my
- km
- lo
---
# SEA-LION

SEA-LION is a collection of Large Language Models (LLMs) which have been pretrained and instruct-tuned for the Southeast Asia (SEA) region.
The models range in size from 3 billion to 7 billion parameters.
This is the card for the SEA-LION 7B base model.

SEA-LION stands for <i>Southeast Asian Languages In One Network</i>.


## Model Details

### Model Description

The SEA-LION model is a significant leap forward in the field of Natural Language Processing,
specifically trained to understand the SEA regional context.

SEA-LION is built on the robust MPT architecture and has a vocabulary size of 256K.

For tokenization, the model employs our custom SEABPETokenizer, which is specially tailored for SEA languages, ensuring optimal model performance.

The training data for SEA-LION encompasses 980B tokens.

- **Developed by:** Products Pillar, AI Singapore
- **Funded by:** Singapore NRF
- **Model type:** Decoder
- **Languages:** English, Chinese, Indonesian, Malay, Thai, Vietnamese, Filipino, Tamil, Burmese, Khmer, Lao
- **License:** MIT License
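
Because the MPT-based architecture and the SEABPETokenizer ship as custom code, loading the original (non-quantized) checkpoint with Hugging Face `transformers` generally requires `trust_remote_code=True`. The snippet below is a minimal sketch under that assumption; the prompt and generation settings are illustrative only.

```python
# Minimal sketch: loading the original base model with transformers.
# Assumes `transformers` (and PyTorch) are installed and that executing the
# repository's custom modeling/tokenizer code is acceptable.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aisingapore/sea-lion-7b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("aisingapore/sea-lion-7b", trust_remote_code=True)

# Base model: plain text continuation, no instruction formatting.
inputs = tokenizer("Sea lion in the sea", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```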

### Performance Benchmarks

SEA-LION's average performance on general English tasks (as measured by Hugging Face's LLM Leaderboard) is shown below:

| Model | ARC | HellaSwag | MMLU | TruthfulQA | Average |
|-------------|:-----:|:---------:|:-----:|:----------:|:-------:|
| SEA-LION 7B | 39.93 | 68.51 | 26.87 | 35.09 | 42.60 |
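
Leaderboard scores of this kind are typically computed with EleutherAI's lm-evaluation-harness. The sketch below is a hypothetical way to produce comparable numbers locally; it assumes `lm-eval` (>= 0.4) is installed, and the task names, few-shot counts, and dtype chosen here may not match the leaderboard's exact configuration, so the resulting scores can differ.

```python
# Hypothetical sketch: evaluating the base model with lm-evaluation-harness.
# Task selection and settings are illustrative, not the leaderboard's exact setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=aisingapore/sea-lion-7b,trust_remote_code=True,dtype=bfloat16",
    tasks=["arc_challenge", "hellaswag", "truthfulqa_mc2"],
    batch_size=8,
)
print(results["results"])
```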

## Training Details

### Data

SEA-LION was trained on 980B tokens of the following data:

| Data Source | Unique Tokens | Multiplier | Total Tokens | Percentage |
|---------------------------|:-------------:|:----------:|:------------:|:----------:|
| RefinedWeb - English | 571.3B | 1 | 571.3B | 58.20% |
| mC4 - Chinese | 91.2B | 1 | 91.2B | 9.29% |
| mC4 - Indonesian | 3.68B | 4 | 14.7B | 1.50% |
| mC4 - Malay | 0.72B | 4 | 2.9B | 0.29% |
| mC4 - Filipino | 1.32B | 4 | 5.3B | 0.54% |
| mC4 - Burmese | 1.2B | 4 | 4.9B | 0.49% |
| mC4 - Vietnamese | 63.4B | 1 | 63.4B | 6.46% |
| mC4 - Thai | 5.8B | 2 | 11.6B | 1.18% |
| WangChanBERTa - Thai | 5B | 2 | 10B | 1.02% |
| mC4 - Lao | 0.27B | 4 | 1.1B | 0.12% |
| mC4 - Khmer | 0.97B | 4 | 3.9B | 0.40% |
| mC4 - Tamil | 2.55B | 4 | 10.2B | 1.04% |
| the Stack - Python | 20.9B | 2 | 41.8B | 4.26% |
| the Stack - Javascript | 55.6B | 1 | 55.6B | 5.66% |
| the Stack - Shell | 1.25B | 2 | 2.5B | 0.26% |
| the Stack - SQL | 6.4B | 2 | 12.8B | 1.31% |
| the Stack - Markdown | 26.6B | 1 | 26.6B | 2.71% |
| RedPajama - StackExchange | 21.2B | 1 | 21.2B | 2.16% |
| RedPajama - ArXiv | 30.6B | 1 | 30.6B | 3.12% |
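
In this table, Total Tokens is Unique Tokens multiplied by the Multiplier (how many times that source is repeated during training), and Percentage is that total's share of the roughly 980B-token budget, up to rounding. A small sketch of the arithmetic for a few rows:

```python
# Sketch of the table's arithmetic: total = unique tokens x multiplier,
# percentage = total / overall training budget (~980B tokens).
sources = {
    # name: (unique tokens in billions, multiplier)
    "mC4 - Indonesian": (3.68, 4),
    "mC4 - Thai": (5.8, 2),
    "the Stack - Shell": (1.25, 2),
}

budget_b = 980.0  # total training tokens, in billions

for name, (unique_b, multiplier) in sources.items():
    total_b = unique_b * multiplier
    share = 100.0 * total_b / budget_b
    print(f"{name}: {total_b:.1f}B tokens ({share:.2f}%)")
```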

### Infrastructure

SEA-LION was trained using [MosaicML Composer](https://github.com/mosaicml/composer)
on the following hardware:

| Training Details | SEA-LION 7B |
|----------------------|:------------:|
| AWS EC2 p4d.24xlarge | 32 instances |
| Nvidia A100 40GB GPU | 256 |
| Training Duration | 22 days |


### Configuration

| HyperParameter | SEA-LION 7B |
|-------------------|:------------------:|
| Precision | bfloat16 |
| Optimizer | decoupled_adamw |
| Scheduler | cosine_with_warmup |
| Learning Rate | 6.0e-5 |
| Global Batch Size | 2048 |
| Micro Batch Size | 4 |
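
For readers who want to mirror this recipe outside Composer, the sketch below approximates the listed settings in plain PyTorch: `decoupled_adamw` corresponds to AdamW-style decoupled weight decay, and `cosine_with_warmup` is a linear warmup followed by cosine decay. The warmup length and total step count are illustrative placeholders, not values from the original run.

```python
# Rough PyTorch approximation of the configuration above (a sketch, not the
# original Composer setup). Warmup and total step counts are placeholders.
import math
import torch

model = torch.nn.Linear(16, 16)  # stand-in for the actual model

optimizer = torch.optim.AdamW(model.parameters(), lr=6.0e-5)  # decoupled weight decay

warmup_steps, total_steps = 2000, 100_000  # placeholders

def warmup_cosine(step: int) -> float:
    """Linear warmup, then cosine decay to zero (returns a multiplier on the base LR)."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_cosine)

# Training would additionally use bfloat16 autocast and gradient accumulation to
# reach the global batch size of 2048 from a per-device micro batch size of 4.
```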


## Technical Specifications

### Model Architecture and Objective

SEA-LION is a decoder model using the MPT architecture.

| Parameter | SEA-LION 7B |
|-----------------|:-----------:|
| Layers | 32 |
| d_model | 4096 |
| head_dim | 32 |
| Vocabulary | 256000 |
| Sequence Length | 2048 |


### Tokenizer Details

We sampled 20M lines from the training data to train the tokenizer.<br>
The framework for training is [SentencePiece](https://github.com/google/sentencepiece).<br>
The tokenizer type is Byte-Pair Encoding (BPE).
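
A minimal SentencePiece training sketch consistent with that description is shown below. Only the framework (SentencePiece), the BPE model type, and the 256K vocabulary come from this card; the input path, output prefix, and character-coverage setting are assumptions for illustration.

```python
# Minimal sketch of BPE tokenizer training with SentencePiece, mirroring the
# description above. File names and auxiliary options are illustrative assumptions.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="sampled_training_lines.txt",  # hypothetical: the sampled training lines
    model_prefix="seabpe",               # hypothetical output prefix
    model_type="bpe",
    vocab_size=256000,
    character_coverage=0.9995,           # assumption for multilingual coverage
)

# The resulting seabpe.model / seabpe.vocab files define the tokenizer.
sp = spm.SentencePieceProcessor(model_file="seabpe.model")
print(sp.encode("Selamat pagi, Singapura!", out_type=str))
```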



## The Team

Lam Wen Zhi Clarence<br>
Leong Wei Qi<br>
Li Yier<br>
Liu Bing Jie Darius<br>
Lovenia Holy<br>
Montalan Jann Railey<br>
Ng Boon Cheong Raymond<br>
Ngui Jian Gang<br>
Nguyen Thanh Ngan<br>
Ong Tat-Wee David<br>
Rengarajan Hamsawardhini<br>
Susanto Yosephine<br>
Tai Ngee Chia<br>
Tan Choon Meng<br>
Teo Jin Howe<br>
Teo Eng Sipp Leslie<br>
Teo Wei Yi<br>
Tjhi William<br>
Yeo Yeow Tong<br>
Yong Xianbin<br>

## Acknowledgements

AI Singapore is a national programme supported by the National Research Foundation, Singapore and hosted by the National University of Singapore.
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of the National Research Foundation, Singapore.

## Contact

For more information, please contact us using this [SEA-LION Inquiry Form](https://forms.gle/sLCUVb95wmGf43hi6).

[Link to SEA-LION's GitHub repository](https://github.com/aisingapore/sealion)


## Disclaimer

This is the repository for the base model.
The model has _not_ been aligned for safety.
Developers and users should perform their own safety fine-tuning and related security measures.
In no event shall the authors be held liable for any claim, damages, or other liability
arising from the use of the released weights and code.


## References

```bibtex
@misc{lowphansirikul2021wangchanberta,
    title={WangchanBERTa: Pretraining transformer-based Thai Language Models},
    author={Lalita Lowphansirikul and Charin Polpanumas and Nawat Jantrakulchai and Sarana Nutanong},
    year={2021},
    eprint={2101.09635},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```