ChindaMT-4B

ChindaMT-4B is an open-weight Thai-English machine translation model fine-tuned from Qwen/Qwen3.5-4B. It supports plain translation and instruction-following translation with auxiliary rules in the prompt.

  • Task: Thai-English machine translation with instruction-following
  • Base model: Qwen3.5-4B
  • Parameter count: 4B
  • License: Apache-2.0

Prompting

Plain translation. Both directions use the same template; only the language line and the source tag change.

English to Thai:

Translate English to Thai.

EN: The weather is nice today.

Thai to English:

Translate Thai to English.

TH: วันนี้อากาศดีมาก
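The shared template can also be assembled programmatically. The sketch below is a minimal illustration, assuming the prompt consists of exactly the language line, one blank line, and the tagged source line, as in the examples above. The helper name `build_plain_prompt` and the `LANG_NAMES` table are hypothetical and not part of the model's release.

```python
# Hypothetical helper; these names are illustrative and not shipped with ChindaMT-4B.
LANG_NAMES = {"EN": "English", "TH": "Thai"}


def build_plain_prompt(source_text: str, src: str, tgt: str) -> str:
    """Build a plain-translation prompt for EN->TH or TH->EN."""
    if src not in LANG_NAMES or tgt not in LANG_NAMES or src == tgt:
        raise ValueError("direction must be EN->TH or TH->EN")
    language_line = f"Translate {LANG_NAMES[src]} to {LANG_NAMES[tgt]}."
    # Language line, one blank line, then the source line tagged with its language code.
    return f"{language_line}\n\n{src}: {source_text}"


print(build_plain_prompt("The weather is nice today.", "EN", "TH"))
```

Swapping the two language codes yields the Thai-to-English prompt with the `TH:` source tag.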

With instruction following. Add a Rules: block between the language line and the source line. Rules are free-form text:

Translate English to Thai.
Rules:
- Return only the translated text
- Use a clear, professional tone in Thai
- Keep all numerals in Arabic digits

EN: <source text>
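A Rules: block can be rendered from a list of strings. The following is a sketch, assuming each rule goes on its own `- ` line directly after the language line and that one blank line precedes the source line, mirroring the example above. Since rules are free-form text, the bullet layout is a convention taken from that example rather than a stated requirement. `build_prompt_with_rules` and `LANG_NAMES` are hypothetical names.

```python
from typing import Sequence

# Hypothetical lookup table; not part of the model's release.
LANG_NAMES = {"EN": "English", "TH": "Thai"}


def build_prompt_with_rules(
    source_text: str, src: str, tgt: str, rules: Sequence[str] = ()
) -> str:
    """Build a translation prompt, adding a Rules: block when rules are given."""
    lines = [f"Translate {LANG_NAMES[src]} to {LANG_NAMES[tgt]}."]
    if rules:
        lines.append("Rules:")
        # One rule per line, each prefixed with "- " as in the example prompt.
        lines.extend(f"- {rule}" for rule in rules)
    lines.append("")  # blank line before the source line
    lines.append(f"{src}: {source_text}")
    return "\n".join(lines)


prompt = build_prompt_with_rules(
    "The weather is nice today.",
    "EN",
    "TH",
    rules=["Return only the translated text", "Keep all numerals in Arabic digits"],
)
print(prompt)
```

With an empty rule list the function falls back to the plain template, so one code path can serve both prompt styles.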

Inference

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load the tokenizer and the model weights in bfloat16.
# device_map="auto" requires the accelerate package to be installed.
model_id = "iapp/ChindaMT-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Send the prompt as a single user turn and render it with the chat template.
prompt = "Translate English to Thai.\n\nEN: The weather is nice today."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sampling is enabled, but the very low temperature keeps output close to greedy decoding.
out = model.generate(
    **inputs, max_new_tokens=1024, temperature=0.01, top_p=0.7, top_k=20,
    repetition_penalty=1.05, do_sample=True,
)

# Decode only the newly generated tokens, skipping the prompt tokens.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Evaluation datasets

The evaluation suites used during development will be released soon.

Limitations

  • Thai-English only.
  • Behavior on out-of-domain or paragraph-length inputs is not comprehensively characterized.

iApp AI Research
