---
license: apache-2.0
datasets:
- kobkrit/rd-taxqa
- iapp_wiki_qa_squad
- Thaweewat/alpaca-cleaned-52k-th
- Thaweewat/instruction-wild-52k-th
- Thaweewat/databricks-dolly-15k-th
- Thaweewat/hc3-24k-th
- Thaweewat/gpteacher-20k-th
- Thaweewat/onet-m6-social
- Thaweewat/alpaca-finance-43k-th
language:
- th
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- openthaigpt
- llama
---

# 🇹🇭 OpenThaiGPT 1.0.0-beta
<a href="https://openthaigpt.aieat.or.th/"><img src="https://1173516064-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvvbWvIIe82Iv1yHaDBC5%2Fuploads%2Fb8eiMDaqiEQL6ahbAY0h%2Fimage.png?alt=media&token=6fce78fd-2cca-4c0a-9648-bd5518e644ce" width="200px"></a>

🇹🇭 OpenThaiGPT version 1.0.0-beta is a 7B-parameter Thai-language LLaMA 2 Chat model, fine-tuned to follow Thai-translated instructions. It also adds more than 24,554 of the most popular Thai words to the model's vocabulary, which makes Thai text generation much faster.

# ---- LoRA Adapter Format of OpenThaiGPT 1.0.0-beta ----

## Upgrade from OpenThaiGPT 1.0.0-alpha
- Added more than 24,554 of the most popular Thai words to the model's vocabulary and re-pretrained the embedding layers, which makes Thai text generation 10 times faster than in the previous version.
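
Why a larger Thai vocabulary speeds up generation: in standard autoregressive decoding the model emits one token per forward pass, so encoding the same Thai sentence in fewer tokens means fewer decoding steps. The toy sketch below illustrates this with a hypothetical greedy longest-match tokenizer that falls back to one token per UTF-8 byte for characters it does not know. It is not the actual LLaMA 2 or OpenThaiGPT tokenizer, and both vocabularies are made up for illustration.

```python
# Toy illustration only: a hypothetical greedy longest-match tokenizer with a
# UTF-8 byte fallback. It is NOT the real LLaMA 2 / OpenThaiGPT tokenizer.


def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text greedily into the longest matching vocabulary pieces.

    Characters not covered by any vocabulary piece fall back to one token
    per UTF-8 byte, written as "<0xNN>".
    """
    max_len = max((len(piece) for piece in vocab), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a piece matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            # No piece matched: emit one byte-fallback token per UTF-8 byte.
            tokens.extend(f"<0x{byte:02X}>" for byte in text[i].encode("utf-8"))
            i += 1
    return tokens


text = "สวัสดีครับ"  # "hello" (polite form), 10 Unicode code points
base_vocab: set[str] = set()  # hypothetical baseline: byte fallback only
extended_vocab = {"สวัสดี", "ครับ"}  # hypothetical added Thai words

base_tokens = tokenize(text, base_vocab)
extended_tokens = tokenize(text, extended_vocab)
print(len(base_tokens), len(extended_tokens))  # prints: 30 2
```

In this toy case the sentence shrinks from 30 tokens (three UTF-8 bytes per Thai character) to 2 whole-word tokens. Real gains depend on the tokenizer and the text; the new vocabulary entries also need trained embedding rows, which is why the embedding layers were re-pretrained.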

## Pretrained Model
- [https://huggingface.co/ChanonUtupon/openthaigpt-merge-lora-llama-2-7B-3470k](https://huggingface.co/ChanonUtupon/openthaigpt-merge-lora-llama-2-7B-3470k)

## Support
- Official website: https://openthaigpt.aieat.or.th
- Facebook group: https://web.facebook.com/groups/openthaigpt
- Discord server for discussion and support: [here](https://discord.gg/rUTp6dfVUF)
- E-mail: kobkrit@iapp.co.th

## License
**Source Code**: Apache License 2.0.<br>
**Weights**: Research and **commercial use**.<br>

## Code and Weights
**Web Demo**: https://demo-beta.openthaigpt.aieat.or.th/<br>
**Colab Demo**: https://colab.research.google.com/drive/1NkmAJHItpqu34Tur9wCFc97A6JzKR8xo?usp=sharing<br>
**Fine-tuning Code**: https://github.com/OpenThaiGPT/openthaigpt-finetune-010beta<br>
**Inference Code**: https://github.com/OpenThaiGPT/openthaigpt<br>
**Weights (LoRA Adapter)**: https://huggingface.co/openthaigpt/openthaigpt-1.0.0-beta-7b-chat<br>
**Weights (Hugging Face Checkpoint)**: https://huggingface.co/openthaigpt/openthaigpt-1.0.0-beta-7b-chat-ckpt-hf
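
The weights are published both as a LoRA adapter and as a full Hugging Face checkpoint. As background, the sketch below shows the standard LoRA arithmetic that relates these two forms: an adapter adds a scaled low-rank update, (alpha / r) * B A, on top of a frozen base weight matrix W, and merging folds that update into W so inference needs only one matrix. It assumes the textbook LoRA formulation; the matrix sizes, rank `r`, scaling `alpha`, and all values are hypothetical, and this README does not state which settings OpenThaiGPT uses or exactly how its checkpoint was produced.

```python
# Generic LoRA arithmetic with hypothetical sizes and values; this is not
# OpenThaiGPT's code. Shapes: W is d_out x d_in, A is r x d_in, B is d_out x r.

Matrix = list[list[float]]
Vector = list[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain-Python matrix-vector product m @ x."""
    return [sum(row[k] * x[k] for k in range(len(x))) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """Unmerged adapter: W x + (alpha / r) * B (A x)."""
    scale = alpha / r
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [h + scale * d for h, d in zip(base, delta)]


def merge_lora(w: Matrix, a: Matrix, b: Matrix, alpha: float, r: int) -> Matrix:
    """Merged weights: W + (alpha / r) * (B @ A), a single d_out x d_in matrix."""
    scale = alpha / r
    ba = matmul(b, a)
    return [[w[i][j] + scale * ba[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]


# Hypothetical example: d_out = 3, d_in = 4, rank r = 2, alpha = 4.
W = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
A = [[1, 1, 0, 0], [0, 0, 1, 1]]
B = [[1, 0], [0, 1], [1, 1]]
x = [1, 2, 3, 4]
alpha, r = 4, 2

print(lora_forward(W, A, B, x, alpha, r))        # prints: [7.0, 16.0, 23.0]
print(matvec(merge_lora(W, A, B, alpha, r), x))  # prints: [7.0, 16.0, 23.0]
```

Both paths give the same output (up to floating-point rounding in general), which is why a merged checkpoint can be served without loading the adapter separately.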

## Sponsors
Pantip.com, ThaiSC<br>
<img src="https://1173516064-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvvbWvIIe82Iv1yHaDBC5%2Fuploads%2FiWjRxBQgo0HUDcpZKf6A%2Fimage.png?alt=media&token=4fef4517-0b4d-46d6-a5e3-25c30c8137a6" width="100px">
<img src="https://1173516064-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvvbWvIIe82Iv1yHaDBC5%2Fuploads%2Ft96uNUI71mAFwkXUtxQt%2Fimage.png?alt=media&token=f8057c0c-5c5f-41ac-bb4b-ad02ee3d4dc2" width="100px">

### Powered by
OpenThaiGPT Volunteers, Artificial Intelligence Entrepreneur Association of Thailand (AIEAT), and Artificial Intelligence Association of Thailand (AIAT)

<img src="https://1173516064-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvvbWvIIe82Iv1yHaDBC5%2Fuploads%2F6yWPXxdoW76a4UBsM8lw%2Fimage.png?alt=media&token=1006ee8e-5327-4bc0-b9a9-a02e93b0c032" width="100px">
<img src="https://1173516064-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvvbWvIIe82Iv1yHaDBC5%2Fuploads%2FBwsmSovEIhW9AEOlHTFU%2Fimage.png?alt=media&token=5b550289-e9e2-44b3-bb8f-d3057d74f247" width="100px">

### Authors
* Kobkrit Viriyayudhakorn (kobkrit@aieat.or.th)
* Sumeth Yuenyong (sumeth.yue@mahidol.edu)
* Thaweewat Rugsujarit (thaweewr@scg.com)
* Jillaphat Jaroenkantasima (autsadang41@gmail.com)
* Norapat Buppodom (new@norapat.com)
* Koravich Sangkaew (kwankoravich@gmail.com)
* Peerawat Rojratchadakorn (peerawat.roj@gmail.com)
* Surapon Nonesung (nonesungsurapon@gmail.com)
* Chanon Utupon (chanon.utupon@gmail.com)
* Sadhis Wongprayoon (sadhis.tae@gmail.com)
* Nucharee Thongthungwong (nuchhub@hotmail.com)
* Chawakorn Phiantham (mondcha1507@gmail.com)
* Patteera Triamamornwooth (patt.patteera@gmail.com)
* Nattarika Juntarapaoraya (natt.juntara@gmail.com)
* Kriangkrai Saetan (kraitan.ss21@gmail.com)
* Pitikorn Khlaisamniang (pitikorn32@gmail.com)

<i>Disclaimer: Provided responses are not guaranteed.</i>