---
license: apache-2.0
datasets:
  - AIAT/Pangpuriye-dataset
  - AIAT/Pangpuriye-public_ThaiSum40k
  - AIAT/Pangpuriye-generated_by_LLama3-codeLlama
  - AIAT/Pangpuriye-public_alpaca-cleaned
  - AIAT/Pangpuriye-generated_by_typhoon
language:
  - th
  - en
pipeline_tag: text-generation
tags:
  - code_generation
  - sql
metrics:
  - accuracy
---

# 🤖 Super AI Engineer Development Program Season 4 - Pangpuriye Table-based Question Answering Model

This model was fine-tuned from the original OpenThaiGPT-1.0.1-7b and is released under the Apache 2.0 license.

## Example inference using Hugging Face Transformers

The following code is an example of how to run inference with our model.

```python
from transformers import AutoModelForCausalLM, LlamaTokenizer


def get_prediction(raw_prediction):
    # The model's answer follows the closing [/INST] tag of the prompt;
    # strip everything up to and including that tag.
    if "[/INST]" in raw_prediction:
        index = raw_prediction.index("[/INST]")
        return raw_prediction[index + len("[/INST]"):]
    return raw_prediction


tokenizer = LlamaTokenizer.from_pretrained("AIAT/Pangpuriye-openthaigpt-1.0.0-7b-chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("AIAT/Pangpuriye-openthaigpt-1.0.0-7b-chat", trust_remote_code=True)

schema = """your SQL schema"""
query = "หาจำนวนลูกค้าที่เป็นเพศชาย"  # "Find the number of male customers"

prompt = f"""
    [INST] <<SYS>>
    You are a question answering assistant. Answer the question as truthful and helpful as possible คุณคือผู้ช่วยตอบคำถาม จงตอบคำถามอย่างถูกต้องและมีประโยชน์ที่สุด
    <</SYS>>
    {schema}### (sql extract) {query} [/INST]
"""

tokens = tokenizer(prompt, return_tensors="pt")
output = model.generate(tokens["input_ids"], max_new_tokens=20, eos_token_id=tokenizer.eos_token_id)
print(get_prediction(tokenizer.decode(output[0], skip_special_tokens=True)))
```
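Since the model extracts a SQL statement from the schema and question, a natural next step is to execute the generated statement against your database. Below is a minimal, hypothetical sketch using Python's built-in `sqlite3`; the `customers` table, its rows, and the generated SQL are all invented for illustration, so substitute the schema and model output from your own run.

```python
import sqlite3

# Hypothetical schema standing in for the {schema} placeholder above.
schema = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, gender TEXT)"

# Build a small in-memory database with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(schema)
conn.executemany(
    "INSERT INTO customers (name, gender) VALUES (?, ?)",
    [("Somchai", "male"), ("Suda", "female"), ("Anan", "male")],
)

# Suppose get_prediction() returned this SQL for the Thai query above
# ("find the number of male customers").
generated_sql = "SELECT COUNT(*) FROM customers WHERE gender = 'male'"
print(conn.execute(generated_sql).fetchone()[0])  # -> 2
```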

## Acknowledgements

The model was built collaboratively by the members of Pangpuriye's house during the LLM hackathon in the Super AI Engineer Development Program Season 4.

We thank the organizers of this hackathon (OpenThaiGPT, AIAT, NECTEC, and ThaiSC) for this challenging task and for the opportunity to take part in developing a Thai large language model.

## Citation Information

If our work is beneficial to your future development, please cite our model as follows:

```bibtex
@misc{artificial_intelligence_association_of_thailand_2024,
    author       = { {Artificial Intelligence Association of Thailand} },
    title        = { Pangpuriye-openthaigpt-1.0.0-7b-chat (Revision 21f9a62) },
    year         = 2024,
    url          = { https://huggingface.co/AIAT/Pangpuriye-openthaigpt-1.0.0-7b-chat },
    doi          = { 10.57967/hf/2193 },
    publisher    = { Hugging Face }
}
```