license: apache-2.0
language:
  - th
  - en
library_name: transformers
pipeline_tag: text-generation
tags:
  - openthaigpt
  - llama

🇹🇭 OpenThaiGPT 7b 1.0.0

🇹🇭 OpenThaiGPT 7b Version 1.0.0-beta is a 7B-parameter Thai language chat model based on LLaMA v2 Chat. It is finetuned to follow Thai instructions, and its vocabulary is extended with more than 10,000 of the most popular Thai words to speed up generation.

Features

  • Multi-turn Conversation Support
  • Retrieval Augmented Generation (RAG) Support
  • State-of-the-art Thai language LLM: achieves the highest average score (38.40%) among the open-source 7B models in the benchmark below, across 17 Thai exams.

Benchmark

| Exams | OTG 7b (Aug 2023) | OTG 13b (Dec 2023) | OTG 7b (March 2024) | OTG 13b (March 2024) | OTG 70b (March 2024) | SeaLLM 7b v1 | SeaLLM 7b v2 | TyphoonGPT 7b | SeaLion 7b | WanchanGLM 7b | Sailor-7B-Chat | GPT3.5 | GPT4 | Gemini Pro | Gemini 1.5 | Claude 3 Haiku | Claude 3 Sonnet | Claude 3 Opus |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A-Level | 17.50% | 34.17% | 25.00% | 30.83% | 45.83% | 18.33% | 34.17% | N/A | 21.67% | 17.50% | 40.00% | 38.33% | 65.83% | 56.67% | 55.83% | 58.33% | 59.17% | 77.50% |
| TGAT | 24.00% | 22.00% | 22.00% | 36.00% | 36.00% | 14.00% | 28.00% | N/A | 24.00% | 16.00% | 34.00% | 28.00% | 44.00% | 22.00% | 28.00% | 36.00% | 34.00% | 46.00% |
| TPAT1 | 22.50% | 47.50% | 42.50% | 27.50% | 62.50% | 22.50% | 27.50% | N/A | 22.50% | 17.50% | 40.00% | 45.00% | 52.50% | 52.50% | 50.00% | 52.50% | 50.00% | 62.50% |
| ic_all_test | 8.00% | 28.00% | 76.00% | 84.00% | 68.00% | 16.00% | 28.00% | N/A | 24.00% | 16.00% | 24.00% | 40.00% | 64.00% | 52.00% | 32.00% | 44.00% | 64.00% | 72.00% |
| facebook_beleble_tha | 25.00% | 45.00% | 34.50% | 39.50% | 70.00% | 13.50% | 51.00% | N/A | 27.00% | 24.50% | 63.00% | 50.00% | 72.50% | 65.00% | 74.00% | 63.50% | 77.00% | 90.00% |
| xcopa_th_200 | 45.00% | 56.50% | 49.50% | 51.50% | 74.50% | 26.50% | 47.00% | N/A | 51.50% | 48.50% | 68.50% | 64.00% | 82.00% | 68.00% | 74.00% | 64.00% | 80.00% | 86.00% |
| xnli2.0_tha | 33.50% | 34.50% | 39.50% | 31.00% | 47.00% | 21.00% | 43.00% | N/A | 37.50% | 33.50% | 16.00% | 50.00% | 69.00% | 53.00% | 54.50% | 50.00% | 68.00% | 68.50% |
| onet_m3 (Average from 19-23) | 17.85% | 38.86% | 34.11% | 39.36% | 56.15% | 15.58% | 23.92% | N/A | 21.79% | 19.56% | 21.37% | 37.91% | 49.97% | 55.99% | 57.41% | 52.73% | 40.60% | 63.87% |
| onet_m6 (Average from 25-29) | 21.14% | 28.87% | 22.53% | 23.32% | 42.85% | 15.09% | 19.48% | N/A | 16.96% | 20.67% | 28.64% | 34.44% | 46.29% | 45.53% | 50.23% | 34.79% | 38.49% | 48.56% |
| Average Score | 23.83% | 37.27% | 38.40% | 40.33% | 55.87% | 18.06% | 33.56% | N/A | 27.44% | 23.75% | 37.28% | 43.07% | 60.68% | 52.30% | 52.89% | 50.65% | 56.81% | 68.32% |
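
The Average Score row appears to be the unweighted mean of the nine exam rows. This is an inference from the numbers, not something the card states. A minimal sketch that checks it for the OTG 7b (March 2024) column follows; the dictionary and function names are illustrative.

```python
from statistics import mean

# Per-exam scores (%) for OTG 7b (March 2024), copied from the benchmark rows.
otg_7b_march_2024 = {
    "A-Level": 25.00,
    "TGAT": 22.00,
    "TPAT1": 42.50,
    "ic_all_test": 76.00,
    "facebook_beleble_tha": 34.50,
    "xcopa_th_200": 49.50,
    "xnli2.0_tha": 39.50,
    "onet_m3": 34.11,
    "onet_m6": 22.53,
}


def average_score(scores: dict) -> float:
    """Unweighted mean of the per-exam scores, rounded to two decimals."""
    return round(mean(scores.values()), 2)


print(f"{average_score(otg_7b_march_2024):.2f}%")
```

Under this assumption, the computed mean agrees with the 38.40% reported for that column. The GPT4 column works out the same way (60.68%).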

Licenses

Source code: Apache Software License 2.0.
Weights: research and commercial use.

Description

The prompt format follows Llama 2:

<s>[INST] <<SYS>>
system_prompt
<</SYS>>

question [/INST]

System prompt: You are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด
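
As a rough illustration, the template and system prompt above can be assembled with a small helper. The names `DEFAULT_SYSTEM_PROMPT` and `build_prompt` are hypothetical and not part of any OpenThaiGPT package; the sketch covers the single-turn format only.

```python
# System prompt copied verbatim from the model card.
DEFAULT_SYSTEM_PROMPT = (
    "You are a question answering assistant. Answer the question as truthful "
    "and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด"
)


def build_prompt(question: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    """Fill the single-turn Llama 2 template with a system prompt and a question."""
    return f"<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{question} [/INST]"


# Example: the Thai question used in the card's curl request.
prompt = build_prompt("อยากลดความอ้วนต้องทำอย่างไร")
```

The resulting string is meant to match the `prompt` field of the curl request in the usage section, with real newlines where the JSON uses `\n`.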

How to use

  1. Install vLLM (https://github.com/vllm-project/vllm).
  2. Start the API server: python -m vllm.entrypoints.api_server --model /path/to/model --tensor-parallel-size num_gpus
  3. Run inference (curl example):
curl --request POST \
    --url http://localhost:8000/generate \
    --header "Content-Type: application/json" \
    --data '{"prompt": "<s>[INST] <<SYS>>\nYou are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด\n<</SYS>>\n\nอยากลดความอ้วนต้องทำอย่างไร [/INST]", "use_beam_search": false, "temperature": 0.1, "max_tokens": 512, "top_p": 0.75, "top_k": 40, "frequency_penalty": 0.3, "stop": "</s>"}'
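
The same request can be sent from Python using only the standard library. This is a minimal sketch: it assumes the server from step 2 is listening on `localhost:8000`, and the helper names `build_payload` and `generate` are illustrative. The sampling parameters are copied from the curl example; whether a given vLLM version accepts each of them is not checked here.

```python
import json
import urllib.request


def build_payload(prompt: str) -> dict:
    """Request body with the sampling parameters from the curl example."""
    return {
        "prompt": prompt,
        "use_beam_search": False,
        "temperature": 0.1,
        "max_tokens": 512,
        "top_p": 0.75,
        "top_k": 40,
        "frequency_penalty": 0.3,
        "stop": "</s>",
    }


def generate(prompt: str, url: str = "http://localhost:8000/generate") -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running server):
# result = generate("<s>[INST] <<SYS>>\n...\n<</SYS>>\n\n... [/INST]")
```

Building the body with `json.dumps` avoids hand-written JSON mistakes such as a missing comma between fields.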

Disclaimer: Provided responses are not guaranteed.