metadata
license: apache-2.0
language:
- th
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- openthaigpt
- llama
🇹🇭 OpenThaiGPT 7b 1.0.0
🇹🇭 OpenThaiGPT 7b Version 1.0.0-beta is a 7B-parameter Thai-language chat model based on LLaMA v2 Chat. It is fine-tuned on Thai instructions, and more than 10,000 of the most common Thai words have been added to its vocabulary, so Thai text is encoded in fewer tokens and generated faster.
Features
- Multi-turn Conversation Support
- Retrieval Augmented Generation (RAG) Support
- State-of-the-art Thai-language LLM: achieves a 38.40% average score on 17 Thai exams, the highest among the open-source 7B LLMs compared.
Benchmark
Exams | OTG 7b (Aug 2023) | OTG 13b (Dec 2023) | OTG 7b (March 2024) | OTG 13b (March 2024) | OTG 70b (March 2024) | SeaLLM 7b v1 | SeaLLM 7b v2 | TyphoonGPT 7b | SeaLion 7b | WanchanGLM 7b | Sailor-7B-Chat | GPT3.5 | GPT4 | Gemini Pro | Gemini 1.5 | Claude 3 Haiku | Claude 3 Sonnet | Claude 3 Opus |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A-Level | 17.50% | 34.17% | 25.00% | 30.83% | 45.83% | 18.33% | 34.17% | N/A | 21.67% | 17.50% | 40.00% | 38.33% | 65.83% | 56.67% | 55.83% | 58.33% | 59.17% | 77.50% |
TGAT | 24.00% | 22.00% | 22.00% | 36.00% | 36.00% | 14.00% | 28.00% | N/A | 24.00% | 16.00% | 34.00% | 28.00% | 44.00% | 22.00% | 28.00% | 36.00% | 34.00% | 46.00% |
TPAT1 | 22.50% | 47.50% | 42.50% | 27.50% | 62.50% | 22.50% | 27.50% | N/A | 22.50% | 17.50% | 40.00% | 45.00% | 52.50% | 52.50% | 50.00% | 52.50% | 50.00% | 62.50% |
ic_all_test | 8.00% | 28.00% | 76.00% | 84.00% | 68.00% | 16.00% | 28.00% | N/A | 24.00% | 16.00% | 24.00% | 40.00% | 64.00% | 52.00% | 32.00% | 44.00% | 64.00% | 72.00% |
facebook_beleble_tha | 25.00% | 45.00% | 34.50% | 39.50% | 70.00% | 13.50% | 51.00% | N/A | 27.00% | 24.50% | 63.00% | 50.00% | 72.50% | 65.00% | 74.00% | 63.50% | 77.00% | 90.00% |
xcopa_th_200 | 45.00% | 56.50% | 49.50% | 51.50% | 74.50% | 26.50% | 47.00% | N/A | 51.50% | 48.50% | 68.50% | 64.00% | 82.00% | 68.00% | 74.00% | 64.00% | 80.00% | 86.00% |
xnli2.0_tha | 33.50% | 34.50% | 39.50% | 31.00% | 47.00% | 21.00% | 43.00% | N/A | 37.50% | 33.50% | 16.00% | 50.00% | 69.00% | 53.00% | 54.50% | 50.00% | 68.00% | 68.50% |
ONET M3 | 17.85% | 38.86% | 34.11% | 39.36% | 56.15% | 15.58% | 23.92% | N/A | 21.79% | 19.56% | 21.37% | 37.91% | 49.97% | 55.99% | 57.41% | 52.73% | 40.60% | 63.87% |
ONET M6 | 21.14% | 28.87% | 22.53% | 23.32% | 42.85% | 15.09% | 19.48% | N/A | 16.96% | 20.67% | 28.64% | 34.44% | 46.29% | 45.53% | 50.23% | 34.79% | 38.49% | 48.56% |
---------------------------------- | ----------------------- | ------------------------ | ------------------------- | -------------------------- | -------------------------- | ------------------ | ------------------ | -------------------- | ---------------- | ------------------- | -------------------- | ------------ | ---------- | ---------------- | ---------------- | -------------------- | --------------------- | ------------------- |
Average Score | 23.83% | 37.27% | 38.40% | 40.33% | 55.87% | 18.06% | 33.56% | N/A | 27.44% | 23.75% | 37.28% | 43.07% | 60.68% | 52.30% | 52.89% | 50.65% | 56.81% | 68.32% |
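The Average Score row matches the unweighted mean of the nine per-exam rows above (checked by hand for several columns; the card does not state the formula). A minimal Python sketch that reproduces the 38.40% figure for OTG 7b (March 2024), assuming that unweighted-mean reading:

```python
from typing import Dict

# Per-exam scores (%) for OTG 7b (March 2024), copied from the table above.
otg_7b_march_2024: Dict[str, float] = {
    "A-Level": 25.00,
    "TGAT": 22.00,
    "TPAT1": 42.50,
    "ic_all_test": 76.00,
    "facebook_beleble_tha": 34.50,
    "xcopa_th_200": 49.50,
    "xnli2.0_tha": 39.50,
    "ONET M3": 34.11,
    "ONET M6": 22.53,
}


def average_score(scores: Dict[str, float]) -> float:
    """Unweighted mean over exams, rounded to two decimals like the table."""
    return round(sum(scores.values()) / len(scores), 2)


print(average_score(otg_7b_march_2024))  # 38.4
```

Because every exam counts equally, a small exam such as TGAT moves the average as much as a large one such as ONET M3.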
Licenses
Source code: Apache License 2.0.
Weights: Licensed for both research and commercial use.
Sponsors
![](https://cdn-uploads.huggingface.co/production/uploads/5fcd9c426d942eaf4d1ebd30/42d-GioSs4evIdNuMAaPB.png)
Support
- Official website: https://openthaigpt.aieat.or.th
- Facebook page: https://web.facebook.com/groups/openthaigpt
- Discord server for discussion and support
- E-mail: kobkrit@aieat.or.th
Prompt Format
The prompt format follows the Llama 2 chat format with one small modification: "###" separates each user question from its retrieved context. When a turn has no context, the "###" is omitted.
<s>[INST] <<SYS>>
{system_prompt}
<</SYS>>
{human_turn1}###{context_turn1} [/INST]{assistant_turn1}</s><s>[INST] {human_turn2}###{context_turn2} [/INST] ...
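The template can be rendered by a small string builder. This is a minimal Python sketch, not an official OpenThaiGPT utility: `build_prompt`, `Turn` and `SYSTEM_PROMPT` are hypothetical names, the newline layout around `<<SYS>>` follows the curl request in How to use, and the `[INST]` that opens each later user turn is an assumption taken from the standard Llama 2 chat format.

```python
from typing import List, Optional, Tuple

# System prompt recommended by this model card (English + Thai, kept verbatim).
SYSTEM_PROMPT = (
    "You are a question answering assistant. Answer the question as truthful and "
    "helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด"
)

# One turn = (human message, optional retrieved context, optional assistant reply).
Turn = Tuple[str, Optional[str], Optional[str]]


def build_prompt(turns: List[Turn], system_prompt: str = SYSTEM_PROMPT) -> str:
    """Render turns into an OpenThaiGPT-style Llama 2 prompt (hypothetical helper).

    Leave the assistant reply of the last turn as None so the model completes it.
    """
    parts = []
    for i, (human, context, assistant) in enumerate(turns):
        # Context is appended after "###" only when the turn has some.
        user = f"{human}###{context}" if context else human
        if i == 0:
            # The first turn carries the <<SYS>> block (layout as in the curl example).
            parts.append(f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{user} [/INST]")
        else:
            # Assumption: later turns reopen with [INST], as in standard Llama 2 chat.
            parts.append(f"<s>[INST] {user} [/INST]")
        if assistant is not None:
            # Completed assistant replies are closed with </s>.
            parts.append(f"{assistant}</s>")
    return "".join(parts)
```

For example, `build_prompt([("สวัสดี", None, None)])` yields the single-turn prompt with the same newline layout as the curl request.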
System prompt (the Thai sentence restates the English instruction in Thai):
You are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด
Single-Turn Conversation Example
The user turn is "สวัสดี" ("Hello"), with no context.
<s>[INST] <<SYS>>
You are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด
<</SYS>>
สวัสดี [/INST]
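Single-Turn Conversation with Context (RAG) Example
The user asks how large Bangkok is. The text after "###" is the retrieved context: a passage stating that Bangkok, the capital and most populous city of Thailand, has a total area of 1,568.737 km² and more than 8 million registered residents.
<s>[INST] <<SYS>>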
You are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด
<</SYS>>
กรุงเทพมีพื้นที่เท่าไร่###กรุงเทพมหานคร เป็นเมืองหลวง นครและมหานครที่มีประชากรมากที่สุดของประเทศไทย กรุงเทพมหานครมีพื้นที่ทั้งหมด 1,568.737 ตร.กม. มีประชากรตามทะเบียนราษฎรกว่า 8 ล้านคน [/INST]
How to use
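- Install vLLM (https://github.com/vllm-project/vllm).
- Start the API server, replacing num_gpus with the number of GPUs to use: python -m vllm.entrypoints.api_server --model /path/to/model --tensor-parallel-size num_gpus
- Run inference (curl example; the Thai question asks "What should I do to lose weight?"):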
curl --request POST \
--url http://localhost:8000/generate \
--header "Content-Type: application/json" \
--data '{"prompt": "<s>[INST] <<SYS>>\nYou are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด\n<</SYS>>\n\nอยากลดความอ้วนต้องทำอย่างไร [/INST]", "use_beam_search": false, "temperature": 0.1, "max_tokens": 512, "top_p": 0.75, "top_k": 40, "frequency_penalty": 0.3, "stop": "</s>"}'
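The same request can be sent from Python using only the standard library. This is a minimal sketch: `build_request_body` and `generate` are hypothetical helper names, the sampling values simply mirror the curl example, and the response is returned as parsed JSON because its exact shape depends on the vLLM version serving the demo API. If your vLLM version rejects a field such as `use_beam_search`, drop it from the body.

```python
import json
import urllib.request

# System prompt recommended by this model card (English + Thai, kept verbatim).
SYSTEM_PROMPT = (
    "You are a question answering assistant. Answer the question as truthful and "
    "helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด"
)


def build_request_body(user_message: str) -> dict:
    """Build the JSON body used by the curl example (hypothetical helper)."""
    prompt = (
        "<s>[INST] <<SYS>>\n" + SYSTEM_PROMPT + "\n<</SYS>>\n\n"
        + user_message + " [/INST]"
    )
    # Sampling parameters copied from the curl example.
    return {
        "prompt": prompt,
        "use_beam_search": False,
        "temperature": 0.1,
        "max_tokens": 512,
        "top_p": 0.75,
        "top_k": 40,
        "frequency_penalty": 0.3,
        "stop": "</s>",
    }


def generate(user_message: str, url: str = "http://localhost:8000/generate") -> dict:
    """POST the request to a running vLLM demo server and return the parsed JSON."""
    data = json.dumps(build_request_body(user_message)).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server from the previous step running, `generate("อยากลดความอ้วนต้องทำอย่างไร")` sends the same question as the curl command.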
Authors
- Kobkrit Viriyayudhakorn (kobkrit@aieat.or.th)
- Sumeth Yuenyong (sumeth.yue@mahidol.edu)
- Thaweewat Rugsujarit (thaweewr@scg.com)
- Jillaphat Jaroenkantasima (autsadang41@gmail.com)
- Norapat Buppodom (new@norapat.com)
- Koravich Sangkaew (kwankoravich@gmail.com)
- Peerawat Rojratchadakorn (peerawat.roj@gmail.com)
- Surapon Nonesung (nonesungsurapon@gmail.com)
- Chanon Utupon (chanon.utupon@gmail.com)
- Sadhis Wongprayoon (sadhis.tae@gmail.com)
- Nucharee Thongthungwong (nuchhub@hotmail.com)
- Chawakorn Phiantham (mondcha1507@gmail.com)
- Patteera Triamamornwooth (patt.patteera@gmail.com)
- Nattarika Juntarapaoraya (natt.juntara@gmail.com)
- Kriangkrai Saetan (kraitan.ss21@gmail.com)
- Pitikorn Khlaisamniang (pitikorn32@gmail.com)
Disclaimer: The accuracy of the model's responses is not guaranteed.