Update README.md
README.md
CHANGED
@@ -1,3 +1,95 @@
---
license: apache-2.0
language:
- th
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- openthaigpt
- llama
---

# 🇹🇭 OpenThaiGPT 7b 1.0.0
<img src="https://1173516064-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FvvbWvIIe82Iv1yHaDBC5%2Fuploads%2Fb8eiMDaqiEQL6ahbAY0h%2Fimage.png?alt=media&token=6fce78fd-2cca-4c0a-9648-bd5518e644ce" width="200px">

https://openthaigpt.aieat.or.th/

🇹🇭 OpenThaiGPT 7b Version 1.0.0-beta is a 7B-parameter Thai-language LLaMA v2 Chat model. It is fine-tuned to follow Thai instructions, and its vocabulary is extended with more than 10,000 of the most popular Thai words for faster generation.

## Features
- Multi-turn Conversation Support
- Retrieval Augmented Generation (RAG) Support
- State-of-the-art Thai-language LLM: achieves an average score of 38.40% across 17 Thai exams, the highest among the open-source 7B models compared in the benchmark below.

## Benchmark
| **Exams**                        | **OTG 7b (Aug 2023)** | **OTG 13b (Dec 2023)** | **OTG 7b (March 2024)** | **OTG 13b (March 2024)** | **OTG 70b (March 2024)** | **SeaLLM 7b v1** | **SeaLLM 7b v2** | **TyphoonGPT 7b** | **SeaLion 7b** | **WanchanGLM 7b** | **Sailor-7B-Chat** | **GPT3.5** | **GPT4** | **Gemini Pro** | **Gemini 1.5** | **Claude 3 Haiku** | **Claude 3 Sonnet** | **Claude 3 Opus** |
|----------------------------------|-----------------------|------------------------|-------------------------|--------------------------|--------------------------|------------------|------------------|--------------------|----------------|-------------------|--------------------|------------|----------|----------------|----------------|--------------------|---------------------|-------------------|
| **A-Level**                      | 17.50%                | 34.17%                 | 25.00%                  | 30.83%                   | 45.83%                   | 18.33%           | 34.17%           | N/A                | 21.67%         | 17.50%            | 40.00%             | 38.33%     | 65.83%   | 56.67%         | 55.83%         | 58.33%             | 59.17%              | 77.50%            |
| **TGAT**                         | 24.00%                | 22.00%                 | 22.00%                  | 36.00%                   | 36.00%                   | 14.00%           | 28.00%           | N/A                | 24.00%         | 16.00%            | 34.00%             | 28.00%     | 44.00%   | 22.00%         | 28.00%         | 36.00%             | 34.00%              | 46.00%            |
| **TPAT1**                        | 22.50%                | 47.50%                 | 42.50%                  | 27.50%                   | 62.50%                   | 22.50%           | 27.50%           | N/A                | 22.50%         | 17.50%            | 40.00%             | 45.00%     | 52.50%   | 52.50%         | 50.00%         | 52.50%             | 50.00%              | 62.50%            |
| **ic_all_test**                  | 8.00%                 | 28.00%                 | 76.00%                  | 84.00%                   | 68.00%                   | 16.00%           | 28.00%           | N/A                | 24.00%         | 16.00%            | 24.00%             | 40.00%     | 64.00%   | 52.00%         | 32.00%         | 44.00%             | 64.00%              | 72.00%            |
| **facebook_beleble_tha**         | 25.00%                | 45.00%                 | 34.50%                  | 39.50%                   | 70.00%                   | 13.50%           | 51.00%           | N/A                | 27.00%         | 24.50%            | 63.00%             | 50.00%     | 72.50%   | 65.00%         | 74.00%         | 63.50%             | 77.00%              | 90.00%            |
| **xcopa_th_200**                 | 45.00%                | 56.50%                 | 49.50%                  | 51.50%                   | 74.50%                   | 26.50%           | 47.00%           | N/A                | 51.50%         | 48.50%            | 68.50%             | 64.00%     | 82.00%   | 68.00%         | 74.00%         | 64.00%             | 80.00%              | 86.00%            |
| **xnli2.0_tha**                  | 33.50%                | 34.50%                 | 39.50%                  | 31.00%                   | 47.00%                   | 21.00%           | 43.00%           | N/A                | 37.50%         | 33.50%            | 16.00%             | 50.00%     | 69.00%   | 53.00%         | 54.50%         | 50.00%             | 68.00%              | 68.50%            |
| **onet_m3 (Average from 19-23)** | 17.85%                | 38.86%                 | 34.11%                  | 39.36%                   | 56.15%                   | 15.58%           | 23.92%           | N/A                | 21.79%         | 19.56%            | 21.37%             | 37.91%     | 49.97%   | 55.99%         | 57.41%         | 52.73%             | 40.60%              | 63.87%            |
| **onet_m6 (Average from 25-29)** | 21.14%                | 28.87%                 | 22.53%                  | 23.32%                   | 42.85%                   | 15.09%           | 19.48%           | N/A                | 16.96%         | 20.67%            | 28.64%             | 34.44%     | 46.29%   | 45.53%         | 50.23%         | 34.79%             | 38.49%              | 48.56%            |
|                                  |                       |                        |                         |                          |                          |                  |                  |                    |                |                   |                    |            |          |                |                |                    |                     |                   |
| **Average Score**                | 23.83%                | 37.27%                 | 38.40%                  | 40.33%                   | 55.87%                   | 18.06%           | 33.56%           | N/A                | 27.44%         | 23.75%            | 37.28%             | 43.07%     | 60.68%   | 52.30%         | 52.89%         | 50.65%             | 56.81%              | 68.32%            |

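The **Average Score** row appears to be the unweighted mean of the nine exam rows above it. As a quick check, the sketch below recomputes it for the OTG 7b (March 2024) column. The variable and function names are illustrative assumptions and are not taken from the original evaluation code.

```python
# Scores for the "OTG 7b (March 2024)" column, copied from the benchmark table.
otg_7b_march_2024 = {
    "A-Level": 25.00,
    "TGAT": 22.00,
    "TPAT1": 42.50,
    "ic_all_test": 76.00,
    "facebook_beleble_tha": 34.50,
    "xcopa_th_200": 49.50,
    "xnli2.0_tha": 39.50,
    "onet_m3": 34.11,
    "onet_m6": 22.53,
}


def average_score(scores):
    """Return the unweighted mean of the per-exam scores, rounded to 2 decimals."""
    return round(sum(scores.values()) / len(scores), 2)


print(f"{average_score(otg_7b_march_2024):.2f}%")  # prints 38.40%
```

The same calculation applied to the other columns should reproduce their averages, as long as every row is weighted equally.
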
## Licenses
**Source Code**: Apache Software License 2.0.<br>
**Weights**: Available for research and **commercial use**.<br>

## Sponsors
<img src="https://cdn-uploads.huggingface.co/production/uploads/5fcd9c426d942eaf4d1ebd30/42d-GioSs4evIdNuMAaPB.png" width="600px">

## Supports
- Official website: https://openthaigpt.aieat.or.th
- Facebook page: https://web.facebook.com/groups/openthaigpt
- A Discord server for discussion and support [here](https://discord.gg/rUTp6dfVUF)
- E-mail: kobkrit@aieat.or.th

## Description
The model uses the Llama 2 prompt format:
```
<s>[INST] <<SYS>>
system_prompt
<</SYS>>

question [/INST]
```
System prompt:
You are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด

The system prompt is bilingual and should be passed verbatim. Its Thai half restates the English instruction: "You are a question answering assistant. Answer the question as correctly and helpfully as possible."

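The template above can be filled in with a small helper. This is a minimal sketch: `build_prompt` and `DEFAULT_SYSTEM_PROMPT` are hypothetical names introduced here for illustration, and only the single-turn layout shown in this README is covered.

```python
# System prompt copied verbatim from this README (English followed by Thai).
DEFAULT_SYSTEM_PROMPT = (
    "You are a question answering assistant. "
    "Answer the question as truthful and helpful as possible "
    "คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด"
)


def build_prompt(question: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Wrap a single question in the Llama 2 chat template shown above."""
    return f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{question} [/INST]"


# Example: the question used in the "How to use" section.
print(build_prompt("อยากลดความอ้วนต้องทำอย่างไร"))
```

The returned string includes the leading `<s>` token because the README's own example does. Some tokenizers add that token automatically, in which case it may need to be dropped.
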
## How to use

1. Install vLLM (https://github.com/vllm-project/vllm).
2. Start the API server, replacing `/path/to/model` with the model location and `num_gpus` with the number of GPUs to use: `python -m vllm.entrypoints.api_server --model /path/to/model --tensor-parallel-size num_gpus`
3. Run inference (curl example):

```
curl --request POST \
    --url http://localhost:8000/generate \
    --header "Content-Type: application/json" \
    --data '{"prompt": "<s>[INST] <<SYS>>\nYou are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด\n<</SYS>>\n\nอยากลดความอ้วนต้องทำอย่างไร [/INST]", "use_beam_search": false, "temperature": 0.1, "max_tokens": 512, "top_p": 0.75, "top_k": 40, "frequency_penalty": 0.3, "stop": "</s>"}'
```

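The same request can be sent from Python using only the standard library. This is a sketch under a few assumptions: the server from step 2 is listening on `localhost:8000`, the sampling settings are copied from the curl example, and `build_payload` and `generate` are hypothetical helper names. The response is returned as parsed JSON without assuming its exact shape, which may differ between vLLM versions.

```python
import json
import urllib.request

# Same endpoint as the curl example; adjust the host and port if the server differs.
API_URL = "http://localhost:8000/generate"


def build_payload(prompt: str) -> dict:
    """Build the request body with the sampling settings from the curl example."""
    return {
        "prompt": prompt,
        "use_beam_search": False,
        "temperature": 0.1,
        "max_tokens": 512,
        "top_p": 0.75,
        "top_k": 40,
        "frequency_penalty": 0.3,
        "stop": "</s>",
    }


def generate(prompt: str, url: str = API_URL, timeout: float = 120.0) -> dict:
    """POST the prompt to the running server and return the parsed JSON reply."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (requires the server from step 2 to be running):
# print(generate("<s>[INST] <<SYS>>\nsystem_prompt\n<</SYS>>\n\nquestion [/INST]"))
```

Some sampling fields, such as `use_beam_search`, may not be accepted by every vLLM release. If the server rejects the request, removing the unsupported field is a reasonable first step.
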
### Authors
* Kobkrit Viriyayudhakorn (kobkrit@aieat.or.th)
* Sumeth Yuenyong (sumeth.yue@mahidol.edu)
* Thaweewat Rugsujarit (thaweewr@scg.com)
* Jillaphat Jaroenkantasima (autsadang41@gmail.com)
* Norapat Buppodom (new@norapat.com)
* Koravich Sangkaew (kwankoravich@gmail.com)
* Peerawat Rojratchadakorn (peerawat.roj@gmail.com)
* Surapon Nonesung (nonesungsurapon@gmail.com)
* Chanon Utupon (chanon.utupon@gmail.com)
* Sadhis Wongprayoon (sadhis.tae@gmail.com)
* Nucharee Thongthungwong (nuchhub@hotmail.com)
* Chawakorn Phiantham (mondcha1507@gmail.com)
* Patteera Triamamornwooth (patt.patteera@gmail.com)
* Nattarika Juntarapaoraya (natt.juntara@gmail.com)
* Kriangkrai Saetan (kraitan.ss21@gmail.com)
* Pitikorn Khlaisamniang (pitikorn32@gmail.com)

<i>Disclaimer: Provided responses are not guaranteed.</i>