Translation
Transformers
Safetensors
English
Lojban
marian
text2text-generation
lojban
machine-translation
conlang
Opus MT English <-> Lojban
This is, as far as we know, the first neural machine translation model that can translate from English to Lojban (and vice versa!). Fine-tuned from the Opus MT model Helsinki-NLP/opus-mt-en-ROMANCE, it is designed to run translation tasks on any device!
Features
- The First Lojban Translation Model: The first neural machine translation model (that we know of) that supports Lojban! We're going to support every single language on Earth!
- Tiny Size: As a compact Marian model, it is faster and needs far less memory than large general-purpose models!
Notes
- BLEU on the validation split (yes, 200 sentences per pair) is above 40 in both directions, but the translations are not perfect!
- As with the base model, put ">>jbo<<" at the start of your text to translate English to Lojban. To translate in the opposite direction, put ">>en<<" at the start instead!
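The prefix rule from the notes above can be wrapped in a small helper so the direction token is never forgotten. This is only a minimal sketch: `PREFIXES` and `add_direction_prefix` are hypothetical names made up for illustration, not part of the model or of Transformers.

```python
# Hypothetical helper (illustrative names, not part of the model's API).
# Maps a target language code to the token the model expects at the start.
PREFIXES = {"jbo": ">>jbo<<", "en": ">>en<<"}


def add_direction_prefix(text: str, target: str) -> str:
    """Return `text` with the target-language token prepended."""
    if target not in PREFIXES:
        raise ValueError(f"unsupported target language: {target!r}")
    # Strip stray whitespace so the prefix and the text are separated by one space.
    return f"{PREFIXES[target]} {text.strip()}"


print(add_direction_prefix('The password is "Mihai Popa"', "jbo"))
# prints: >>jbo<< The password is "Mihai Popa"
```

The returned string can be passed to the tokenizer as-is.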
Evaluation Results
| Direction | BLEU (validation split, 200 sentences per pair) |
|---|---|
| English -> Lojban | 40.33 |
| Lojban -> English | 45.18 |
Usage
The code was generated by Colab's auto-completion, with a few small modifications of my own:
# Translate with an Opus MT model!
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("MihaiPopa-1/opus-mt-en-jbo")
model = AutoModelForSeq2SeqLM.from_pretrained("MihaiPopa-1/opus-mt-en-jbo")

# The ">>jbo<<" prefix selects English -> Lojban.
text = ">>jbo<< The password is \"Mihai Popa\""
# For the opposite direction, use the ">>en<<" prefix instead:
# text = ">>en<< lo pamoi cu zo'e \"Mihai Popa\""

input_ids = tokenizer.encode(text, return_tensors="pt")
outputs = model.generate(
    input_ids,
    num_beams=5,  # beam search with 5 beams
)
decoded_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(decoded_text)
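To translate several sentences at once, the same calls can be applied to a padded batch. The sketch below is not from the original notebook: `translate_batch` is a hypothetical helper, and it assumes that `tokenizer` and `model` are the objects loaded in the snippet above.

```python
from typing import List


def translate_batch(texts: List[str], target: str, tokenizer, model,
                    num_beams: int = 5) -> List[str]:
    """Hypothetical helper: translate a list of sentences in one generate() call.

    `target` is "jbo" (English -> Lojban) or "en" (Lojban -> English).
    """
    # Prepend the direction token to every sentence.
    prefixed = [f">>{target}<< {t}" for t in texts]
    # Pad to the longest sentence so the batch forms a single tensor.
    batch = tokenizer(prefixed, return_tensors="pt", padding=True, truncation=True)
    # Passing the whole batch also forwards the attention mask.
    outputs = model.generate(**batch, num_beams=num_beams)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

For example, `translate_batch(["Hello!", "How are you?"], "jbo", tokenizer, model)` should return one translated string per input sentence.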
Data Used
I used Tatoeba's latest snapshot!
Model tree for MihaiPopa-1/opus-mt-en-jbo
Base model
Helsinki-NLP/opus-mt-en-ROMANCE