Cali Indonesian-English Translation Model
A small 121M-parameter translation model, fine-tuned from Sandroeth/cali-0.1B to translate between Indonesian and English.
What It Can Do
For simple everyday sentences, casual conversation, and quick ID-to-EN or EN-to-ID translation, this model is reasonably reliable. Its small size also makes it a good fit for applications that need a lightweight, responsive model.
It has limits, though. Very short inputs of 1-2 words often produce irrelevant output. Technical or domain-specific terms, such as machine learning, medical, or legal vocabulary, are also often mistranslated. Texts longer than 200 words tend to start drifting partway through.
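The limits above can be turned into a simple pre-check before calling the model. The following is a minimal sketch, not part of the model's own tooling: the function name `check_input` and the whitespace-based word count are assumptions made for illustration, and the thresholds simply mirror the 1-2 word and 200 word figures stated above.

```python
def check_input(text, min_words=3, max_words=200):
    """Return warnings for inputs the model card describes as unreliable.

    Hypothetical helper: words are counted by splitting on whitespace,
    which is only a rough approximation of input length.
    """
    n_words = len(text.split())
    warnings = []
    if n_words < min_words:
        # The card notes that 1-2 word inputs often give irrelevant output.
        warnings.append("very short input: output may be irrelevant")
    if n_words > max_words:
        # The card notes that texts over 200 words tend to drift.
        warnings.append("long input: translation may drift")
    return warnings


print(check_input("Halo"))
print(check_input("Dia pergi ke pasar setiap pagi."))
```

A caller could log these warnings or fall back to a larger model when the list is not empty.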
Training Data
- Total samples: 179,999 bilingual pairs
- Training set: 161,999 samples
- Eval set: 18,000 samples
- Indonesian: 89,999 samples
- English: 90,000 samples
- Epochs: 3
- Learning rate: 5e-5
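As a quick consistency check on the figures above, the sketch below verifies that the split sizes add up and computes the share held out for evaluation. It assumes that the Indonesian and English counts partition the same 179,999 samples, which the list suggests but does not state outright.

```python
# Figures copied from the list above.
total = 179_999
train, evaluation = 161_999, 18_000
indonesian, english = 89_999, 90_000

# Train and eval sets together cover every sample.
assert train + evaluation == total
# Assumption: the per-language counts also partition the total.
assert indonesian + english == total

eval_fraction = evaluation / total
print(f"{eval_fraction:.1%}")  # prints 10.0%
```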
Prompt Format
The model was trained with a specific prompt template. Use this format for consistent results:
[id→en]
{teks input}
→
or, for the reverse direction:
[en→id]
{teks input}
→
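The template has three lines: a direction tag, the input text, and a trailing arrow that the model continues from. A minimal sketch of a prompt builder follows. The names `DIRECTION_TAGS` and `build_prompt` are hypothetical, and two choices here are assumptions rather than documented behavior: stripping surrounding whitespace from the input, and raising an error on an unknown language code.

```python
# Maps the source language code to the direction tag from the template.
DIRECTION_TAGS = {"id": "[id→en]", "en": "[en→id]"}


def build_prompt(text, source_lang):
    """Build the three-line prompt: direction tag, input text, arrow."""
    if source_lang not in DIRECTION_TAGS:
        # Assumption: reject unknown codes instead of guessing a direction.
        raise ValueError(f"unsupported source language: {source_lang!r}")
    return f"{DIRECTION_TAGS[source_lang]}\n{text.strip()}\n→"


print(build_prompt("Saya makan nasi.", "id"))
```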
Usage
Inference code:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch, re

model_id = "Sandroeth/cali-id-en-translate"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True, device_map="auto")
model.eval()

def translate(text, locale):
    # locale is the source language: "id" translates to English, anything else to Indonesian.
    direction = "[id→en]" if locale == "id" else "[en→id]"
    prompt = f"{direction}\n{text}\n→"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=64,
            do_sample=False,  # greedy decoding
            repetition_penalty=1.5,
            no_repeat_ngram_size=3,
            pad_token_id=tokenizer.eos_token_id,
            use_cache=False,
        )
    # Keep only the newly generated tokens, dropping the prompt.
    gen = out[0][inputs["input_ids"].shape[1]:]
    result = tokenizer.decode(gen, skip_special_tokens=True).strip()
    # Return only the first sentence of the generated text.
    return re.split(r'(?<=[.!?])\s+', result)[0]

print(translate("Dia pergi ke pasar setiap pagi.", "id"))
print(translate("The weather is very cold today.", "en"))
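Because the inference code returns only the first generated sentence and caps output at 64 new tokens, multi-sentence input needs to be handled outside the model call. One possible workaround, sketched below, is to split the input into sentences and translate them one at a time. This is an illustrative approach, not the author's documented method: `split_sentences` and `translate_long` are hypothetical names, the translation function is passed in as `translate_fn` so the sketch runs without loading the model, and translating sentences in isolation loses cross-sentence context.

```python
import re


def split_sentences(text):
    """Split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def translate_long(text, locale, translate_fn):
    """Translate each sentence separately and join the results with spaces.

    translate_fn is any callable taking (sentence, locale), for example
    the translate function from the inference code.
    """
    return " ".join(translate_fn(s, locale) for s in split_sentences(text))


# Stand-in translator for demonstration only: it upper-cases the input.
def fake_translate(sentence, locale):
    return sentence.upper()


print(translate_long("Saya makan nasi. Dia pergi ke pasar!", "id", fake_translate))
```

With the real model, the call would be `translate_long(text, "id", translate)`.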
Example Results
| Input | Locale | Output |
|---|---|---|
| Saya makan nasi. | id | I eat rice. |
| Dia pergi ke pasar setiap pagi. | id | He goes to the market every morning. |
| She is happy. | en | Dia bahagia. |
| The weather is very cold today. | en | Cuacanya sangat dingin hari ini. |
| Pemerintah sedang membangun infrastruktur baru. | id | The government is building new infrastructure. |
Citation
If you use or reference this model in your research or projects, please cite:
@article{cali2026,
title = {CALI 0.1B},
author = {Sandroeth},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/Sandroeth/cali-0.1B}
}
Author
Sandroeth
License
Apache License 2.0