Taiwan Words Translator 繁體中文台灣化翻譯器 by LLMs

https://github.com/SuJiaKuan/llm_tw_word

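The model converts Traditional Chinese text that uses Mainland China terms into text that uses only Taiwanese terms. For example:

Input: 這個軟件的質量真高啊
Output: 這個軟體的品質真高啊

Here 軟件 becomes 軟體 ("software") and 質量 becomes 品質 ("quality"). The sentence means "The quality of this software is really high."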

This Model

This model is fine-tuned from TinyLlama/TinyLlama-1.1B-Chat-v1.0 using instruction fine-tuning. The training data is collected from MBZUAI/Bactrian-X and automatically labeled with 繁化姬 (Fanhuaji), a Chinese text conversion service.
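The card does not describe the exact training format. The sketch below assumes that each Bactrian-X sentence (source) and its 繁化姬-labeled counterpart (target) were wrapped in the same chat layout that the usage example uses at inference time. Under that assumption, it shows how one instruction-tuning record could be built. `build_example` and the `messages` record layout are hypothetical and not part of the project.

```python
# Hypothetical sketch: the real training format is not documented on this card.
# Assumption: training pairs used the same chat layout as inference.
TICKS = "`" * 3  # three backticks, built programmatically to keep Markdown fences intact

# System prompt copied from the usage example (Chinese text kept verbatim).
SYSTEM_PROMPT = (
    "對於輸入內容的中文文字，請將中國用語轉成台灣的用語，其他非中文文字或非中國用語都維持不變。\n"
    "\n"
    "範例：\n"
    f"Input: {TICKS}這個視頻的質量真高啊{TICKS}\n"
    f"Output: {TICKS}這個影片的品質真高啊{TICKS}"
)


def build_example(source: str, target: str) -> dict:
    """Wrap one source sentence and its auto-labeled Taiwanese version as a chat record."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            # User turn: the sentence containing Mainland China terms
            {"role": "user", "content": f"Input: {TICKS}{source}{TICKS}"},
            # Assistant turn: the label produced by the converter
            {"role": "assistant", "content": f"Output: {TICKS}{target}{TICKS}"},
        ]
    }


record = build_example("這個軟件的質量真高啊", "這個軟體的品質真高啊")
```

A list of such records could then be rendered with the tokenizer's `apply_chat_template` before supervised fine-tuning. This step is likewise an assumption about the pipeline, not a documented detail.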

How to use

You can follow the example usage below, or see the project repository (https://github.com/SuJiaKuan/llm_tw_word) to learn how to integrate the model into a Python class.

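````python
import torch
from transformers import pipeline

# SYSTEM_PROMPT must stay in Chinese, exactly as written. It roughly says:
# "For Chinese text in the input, convert Mainland China terms into Taiwanese
#  terms; keep non-Chinese text and non-Mainland terms unchanged.
#  Example: 視頻 (video) -> 影片, 質量 (quality) -> 品質."
SYSTEM_PROMPT = """\
對於輸入內容的中文文字，請將中國用語轉成台灣的用語，其他非中文文字或非中國用語都維持不變。

範例：
Input: ```這個視頻的質量真高啊```
Output: ```這個影片的品質真高啊```\
"""

# Traditional Chinese input that still contains Mainland China terms
text_trad = "這個軟件的質量真高啊"

# Load the model. Named `generator` so it does not shadow transformers.pipeline.
# device_map="auto" requires the `accelerate` package.
generator = pipeline(
    "text-generation",
    model="feabries/TaiwanWordTranslator-v0.1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Wrap the input in the same "Input: ```...```" format as the prompt's example
prompt = "Input: ```{}```".format(text_trad)
messages = [{
    "role": "system",
    "content": SYSTEM_PROMPT,
}, {
    "role": "user",
    "content": prompt,
}]
# Render the conversation with the model's chat template (as text, not token IDs)
input_text = generator.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
# Greedy decoding for deterministic output
outputs = generator(
    input_text,
    do_sample=False,
    max_new_tokens=2048,
)
# generated_text contains the full prompt followed by the model's answer
print(outputs[0]["generated_text"])
# <|system|>
# 對於輸入內容的中文文字，請將中國用語轉成台灣的用語，其他非中文文字或非中國用語都維持不變。
# 
# 範例：
# Input: ```這個視頻的質量真高啊```
# Output: ```這個影片的品質真高啊```</s>
# <|user|>
# Input: ```這個軟件的質量真高啊```</s>
# <|assistant|>
# Output: ```這個軟體的品質真高啊```
````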
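The usage example prints the whole prompt along with the answer. When the translation is needed as a plain string, the same steps can be wrapped in a class. The sketch below is a hypothetical wrapper, not the class from the project repository. `TaiwanWordTranslator`, `extract_output`, and `from_pretrained` are illustrative names. It passes `return_full_text=False` to the text-generation pipeline so that the prompt is dropped. It then assumes the answer keeps the `Output: ```...```` shape shown above and extracts the text between the backticks.

```python
import re

# Three backticks, built programmatically so this snippet nests cleanly in Markdown.
TICKS = "`" * 3

# Same system prompt as in the usage example (Chinese text kept verbatim).
SYSTEM_PROMPT = (
    "對於輸入內容的中文文字，請將中國用語轉成台灣的用語，其他非中文文字或非中國用語都維持不變。\n"
    "\n"
    "範例：\n"
    f"Input: {TICKS}這個視頻的質量真高啊{TICKS}\n"
    f"Output: {TICKS}這個影片的品質真高啊{TICKS}"
)

# Captures the text between the backticks that follow "Output:".
_OUTPUT_RE = re.compile(r"Output:\s*`{3}(.*?)`{3}", re.DOTALL)


def extract_output(generated_text: str) -> str:
    """Return the last Output: payload, or the stripped text if none is found."""
    matches = _OUTPUT_RE.findall(generated_text)
    # Take the last match: if the prompt is echoed back, earlier matches
    # come from the example inside the system prompt.
    return matches[-1] if matches else generated_text.strip()


class TaiwanWordTranslator:
    """Hypothetical wrapper around a text-generation pipeline."""

    def __init__(self, generator):
        # `generator`: a transformers text-generation pipeline, or any callable
        # with the same call convention and a `.tokenizer` attribute.
        self.generator = generator

    @classmethod
    def from_pretrained(cls, model: str = "feabries/TaiwanWordTranslator-v0.1"):
        # Imported lazily so the parsing logic works without torch installed.
        import torch
        from transformers import pipeline

        return cls(pipeline(
            "text-generation",
            model=model,
            torch_dtype=torch.bfloat16,
            device_map="auto",
        ))

    def translate(self, text: str, max_new_tokens: int = 2048) -> str:
        messages = [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Input: {TICKS}{text}{TICKS}"},
        ]
        input_text = self.generator.tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True,
        )
        # return_full_text=False keeps only the newly generated text.
        outputs = self.generator(
            input_text,
            do_sample=False,
            max_new_tokens=max_new_tokens,
            return_full_text=False,
        )
        return extract_output(outputs[0]["generated_text"])
```

Typical use is `TaiwanWordTranslator.from_pretrained().translate("這個軟件的質量真高啊")`. The pipeline is injected through the constructor, so the prompt-building and parsing logic can be tested with a stub object without downloading the model.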
