jojo-ai-mst
/

thai-opt350m-instruct

Text Generation

Chitwania Tharu

southeast-asian

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

thai-opt350m-instruct / README.md

jojo-ai-mst's picture

Update README.md

fe262c9 verified 4 months ago

|

raw history blame contribute delete

No virus

3.33 kB

	---
	library_name: transformers
	tags:
	- thai
	- opt
	- generative ai
	- SEA
	- southeast-asian
	- nlp
	license: mit
	language:
	- the
	widget:
	- text: 'User: อะไรคือวิธีที่ดีที่สุดในการทําความสะอาดพรม Assistant:'
	example_title: Example 1
	- text: >-
	User: ทําไมเครื่องตรวจจับควันถึงส่งเสียงบี๊บเมื่อแบตเตอรี่เหลือน้อย
	Assistant:
	example_title: Example 2
	- text: 'User: เมื่อไหร่จะเป็นเวลาที่ดีที่สุดของปีที่จะจองล่องเรือ? Assistant:'
	example_title: Example 3
	- text: >-
	User: ฉันต้องการใช้โฟโต้ชอป แต่ฉันไม่คุ้นเคยกับซอฟต์แวร์
	คุณช่วยแนะนําบทเรียนเบื้องต้นเพื่อช่วยฉันให้เร็วขึ้นหน่อยได้ไหม? Assistant:
	example_title: Example 4
	- text: 'User: ฉันควรทําอย่างไรถ้าฉันเจอหมีขณะตั้งแคมป์? Assistant:'
	example_title: Example 5
	---

	![thai-opt350m-instruct-logo](thai-opt350m.png)

	# thai-opt350m-instruct
	a generative language model for thai language based on opt350m

	Thai-OPT350M-Instruct is a fine-tuned pretrained transformer for thai language based on facebook/opt-350m.

	Dataset for thai-opt350m-instruct
	- yadapruk/thai-instructions-rallio

	## Base Model
	- Facebook Open Pretrained Transformer

	## Languages
	- mainly support Thai Language
	- a few English, Chinese, Arabic

	## Training
	- epochs - 12
	- training loss - 0.809200

	# Model Page

	[https://huggingface.co/jojo-ai-mst/thai-opt350m-instruct](https://huggingface.co/jojo-ai-mst/thai-opt350m-instruct)

	## Prompt Format

	```
	User: อะไรคือวิธีที่ดีที่สุดในการทําความสะอาดพรม Assistant:
	```

	# How to use

	```python
	# pip install transformers

	from transformers import AutoModelForCausalLM,AutoTokenizer

	model = AutoModelForCausalLM.from_pretrained("jojo-ai-mst/thai-opt350m-instruct")
	tokenizer = AutoTokenizer.from_pretrained("jojo-ai-mst/thai-opt350m-instruct")

	def generate_text(prompt, max_length=200, temperature=0.8, top_k=50):
	input_ids = tokenizer.encode(prompt, return_tensors="pt").cuda() # remove .cuda() if only cpu
	output = model.generate(
	input_ids,
	max_length=max_length,
	temperature=temperature,
	top_k=top_k,
	pad_token_id=tokenizer.eos_token_id,
	do_sample=True
	)
	for result in output:
	generated_text = tokenizer.decode(result, skip_special_tokens=True)
	print(generated_text)

	generate_text("User: อะไรคือวิธีที่ดีที่สุดในการทําความสะอาดพรม Assistant:")
	```

	# Date of Release

	22/03/2024

	# License

	MIT

	# Author

	[Min Si Thu](https://www.linkedin.com/in/min-si-thu/)

	# Notes

	This ai model is a movement of [MyanmarGPT-Movement](https://github.com/MyanmarGPT-Movement).