LocalTranslate EN -> RO
This is an English -> Romanian translation model trained by Mihai Popa (with the help of Kimi K2.6 Thinking)! It's based on BART, and it's a lot smaller than most other translation models.
Why?
Because it's for my own future LocalTranslate project! Unlike Google Translate, this uses models (LTR files in the future) that you download and put on your device! And yes, it translates locally!
Notes
- Trained in just 22 minutes on a Colab T4 GPU. It has 24M parameters and is 92 MB in size!
Model Configurations
| Parameter | Value |
| --- | --- |
| Tokenizer | BPE |
| Vocabulary Size | 16384 tokens |
| Batch Size | 128 x 1 = 128 |
| Context Window | 128 tokens |
| max_position_embeddings | 128 |
| encoder_layers | 6 |
| decoder_layers | 6 |
| encoder_attention_heads | 6 |
| decoder_attention_heads | 6 |
| encoder_ffn_dim | 768 |
| decoder_ffn_dim | 768 |
| d_model | 384 |
| dropout | 0.1 |
| attention_dropout | 0.1 |
| activation_function | "gelu" |
| init_std | 0.02 |
| scale_embedding | True |
| normalize_before | True |
| add_final_layer_norm | True |
| pad_token_id | tokenizer.pad_token_id |
| bos_token_id | tokenizer.bos_token_id |
| eos_token_id | tokenizer.eos_token_id |
| decoder_start_token_id | tokenizer.eos_token_id |
| forced_eos_token_id | tokenizer.eos_token_id |
| tie_word_embeddings | True |
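As a rough sanity check on the "24M parameters and 92 MB" figures, here is a small sketch that estimates the parameter count from the configuration above. It assumes the standard Hugging Face BART layout (learned position embeddings with an offset of 2, one LayerNorm after the embeddings, biases on every linear layer, and an output head tied to the shared embedding). The function names are made up for this example, and the real checkpoint may differ slightly.

```python
# Back-of-the-envelope parameter count for the configuration above.
# Assumption: standard Hugging Face BART layer layout; names here are illustrative.

def linear(n_in, n_out):
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out

def layer_norm(d):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * d

def attention(d):
    """Query, key, value and output projections."""
    return 4 * linear(d, d)

def estimate_bart_params(vocab=16384, d_model=384, ffn=768,
                         enc_layers=6, dec_layers=6,
                         max_pos=128, pos_offset=2):
    shared_embedding = vocab * d_model            # tied with the output head
    positions = (max_pos + pos_offset) * d_model  # learned positions, per stack
    ffn_block = linear(d_model, ffn) + linear(ffn, d_model)

    # Encoder layer: self-attention + FFN, each followed by a LayerNorm.
    enc_layer = attention(d_model) + ffn_block + 2 * layer_norm(d_model)
    # Decoder layer: self-attention + cross-attention + FFN, three LayerNorms.
    dec_layer = 2 * attention(d_model) + ffn_block + 3 * layer_norm(d_model)

    encoder = positions + layer_norm(d_model) + enc_layers * enc_layer
    decoder = positions + layer_norm(d_model) + dec_layers * dec_layer
    return shared_embedding + encoder + decoder

total = estimate_bart_params()
size_mib = total * 4 / 2**20  # 4 bytes per parameter in float32
print(f"{total / 1e6:.2f}M parameters, about {size_mib:.0f} MiB in float32")
```

Under these assumptions the estimate lands at roughly 24M parameters and about 92 MiB of float32 weights, which lines up with the numbers in the Notes.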
Training Configurations
| Hyperparameter | Value |
| --- | --- |
| output_dir | "./localtranslate_en_ro" |
| do_train | True |
| do_eval | True |
| eval_strategy | "steps" |
| eval_steps | 500 |
| learning_rate | 3e-4 |
| weight_decay | 0.01 |
| max_steps | 6000 |
| warmup_steps | 1000 |
| logging_steps | 50 |
| save_steps | 500 |
| save_total_limit | 5000 |
| fp16 | True |
| gradient_checkpointing | True |
| label_smoothing_factor | 0.1 |
| predict_with_generate | False |
| report_to | "none" |
| dataloader_num_workers | 2 |
| remove_unused_columns | False |
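To make the learning-rate settings a bit more concrete, here is a minimal sketch of what the schedule could look like. The table does not list a scheduler, so this assumes the Hugging Face Trainer default (`lr_scheduler_type="linear"`): a linear warmup from 0 to the peak rate over `warmup_steps`, then a linear decay to 0 at `max_steps`. The function name `lr_at` is made up for this example.

```python
# Sketch of a linear warmup + linear decay schedule.
# Assumption: the default "linear" scheduler was used; the table does not say so.

PEAK_LR = 3e-4       # learning_rate
WARMUP_STEPS = 1000  # warmup_steps
MAX_STEPS = 6000     # max_steps

def lr_at(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=MAX_STEPS):
    """Learning rate at a given optimizer step."""
    if step < warmup:
        # Warmup: ramp linearly from 0 up to the peak rate.
        return peak * step / max(1, warmup)
    # Decay: ramp linearly from the peak down to 0 at the last step.
    return peak * max(0.0, (total - step) / max(1, total - warmup))

for step in (0, 500, 1000, 3500, 6000):
    print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

If a different scheduler was actually used (cosine, constant, etc.), only the decay branch would change.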
Limitations
- Not Perfect: As with any other model, it's not 100% perfect and can generate incorrect translations!
- One Direction Only: It translates English -> Romanian (NOT vice-versa)!
Evaluation Results
| Metric | Dev split (greedy) | Dev split (2 beams) | Dev split (3 beams) | Flores 200 (greedy) | Flores 200 (3 beams) |
| --- | --- | --- | --- | --- | --- |
| BLEU | 33.74 | 34.67 | 34.83 | 11.31 | 12.35 |
| chrF++ | 61.35 | 62.09 | 62.29 | 39.49 | 40.70 |
Usage
The code is by Gemini 3 Flash/Kimi K2.6 Thinking (with a few small modifications of my own):
from transformers import BartForConditionalGeneration, AutoTokenizer
import torch

model_id = "MihaiPopa-1/LocalTranslate_EN_RO"

# Load the tokenizer and the model (full precision, on the CPU)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = BartForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float32,
    device_map="cpu"
)

# Tokenize the English sentence to translate
inputs = tokenizer("At your current usage level, this runtime may last up to 1 hour.", return_tensors="pt")

# Generate the Romanian translation with a 2-beam search
outputs = model.generate(
    **inputs,
    max_length=64,
    num_beams=2,
    early_stopping=True,
    forced_eos_token_id=tokenizer.eos_token_id,
)

# Turn the generated token IDs back into text
translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(translation)
Data Used
| Dataset | Translation Pairs |
| --- | --- |
| Europarl 7 | ~399k raw -> ~340k cleaned |
| Tatoeba+ | 16k |
| Total | 343488 |
| Dev Split | 2000 |
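For a rough idea of how many passes over the data the training run made, here is a quick bit of arithmetic. It assumes the "Total" row is the number of pairs actually used for training and that every step consumed a full batch of 128; both are assumptions, not something stated here.

```python
# Rough estimate of training passes over the data.
# Assumptions: "Total" is the training set size and every batch is full.

MAX_STEPS = 6000       # max_steps from the training configuration
BATCH_SIZE = 128       # effective batch size (128 x 1)
TRAIN_PAIRS = 343_488  # "Total" row from the data table

examples_seen = MAX_STEPS * BATCH_SIZE
epochs = examples_seen / TRAIN_PAIRS

print(f"{examples_seen:,} examples seen, about {epochs:.1f} epochs")
```

So under these assumptions the model saw each training pair a little more than twice.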