en-toki-mt

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-ROMANCE, trained on the English to toki pona sentence pairs from Tatoeba.

Model description

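toki pona is a minimalist constructed language created by Sonja Lang. It was first published online in 2001, and the official book, Toki Pona: The Language of Good (known as pu), followed in 2014. The language has a very small vocabulary (~130 words) and a very simple grammar.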

Intended uses & limitations

This model translates English into toki pona.
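
A minimal usage sketch, assuming the checkpoint is available on the Hugging Face Hub as `ckb/en-toki-mt` and loads with the standard Marian classes from `transformers` (the base model is an OPUS-MT Marian model). The helper name `translate` and the `max_new_tokens` value are illustrative and not taken from the author's code. The base model expects a `>>lang<<` target-language prefix on each input; this card does not say whether the fine-tuned model needs one, so the sketch passes plain English.

```python
MODEL_ID = "ckb/en-toki-mt"  # assumed Hub ID for this checkpoint


def translate(sentences, model_id=MODEL_ID, max_new_tokens=64):
    """Translate a list of English sentences to toki pona."""
    # Imported lazily so this module loads even without the heavy dependency.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    # Pad the batch so sentences of different lengths fit in one tensor.
    batch = tokenizer(list(sentences), return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch, max_new_tokens=max_new_tokens)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


# Example call (downloads the checkpoint on first use):
# print(translate(["I like to eat fruit.", "The water is good."]))
```

Loading the tokenizer and model inside the function keeps the sketch short; for repeated calls it would make sense to load them once and reuse them.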

Training and evaluation data

The training data consists of all En-Toki sentence pairs on Tatoeba (~20,000 pairs), without any filtering. Because this dataset mostly contains only the core words (pu), the model may produce inaccurate results on input that requires words outside that core set. The model achieved a BLEU score of 54 on the test set.
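
To make the BLEU figure easier to interpret, here is a simplified corpus-level BLEU in plain Python. It is only a sketch: this card does not state which BLEU implementation or tokenization produced the score of 54, so the whitespace tokenization, single reference per sentence, and lack of smoothing below are assumptions, and the numbers it gives are not directly comparable to the reported one. The function names are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_order=4):
    """Corpus-level BLEU (0-100), one reference per hypothesis, whitespace tokens."""
    matches = [0] * max_order
    totals = [0] * max_order
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_order + 1):
            hyp_ngrams = ngram_counts(hyp_tokens, n)
            ref_ngrams = ngram_counts(ref_tokens, n)
            # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum((hyp_ngrams & ref_ngrams).values())
            totals[n - 1] += sum(hyp_ngrams.values())

    # Without smoothing, any order with zero matches makes the whole score zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_order
    # The brevity penalty lowers the score when the output is shorter than the reference.
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)
```

An exact match scores 100, and an output that drops the last word of a five-word reference is penalized only through the brevity penalty. For reported results, an established implementation is preferable to this sketch.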

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 10
  • mixed_precision_training: Native AMP
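
The values above can be expressed as arguments to the `transformers` training API. This is a sketch under assumptions: the card does not show the training script, so the use of `Seq2SeqTrainingArguments`, the mapping of the listed batch sizes to per-device values (which holds only on a single device), and the output directory name are guesses for illustration.

```python
# Keyword arguments mirroring the hyperparameters listed above.
TRAINING_KWARGS = {
    "output_dir": "en-toki-mt",          # hypothetical output directory
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,   # assumes training on a single device
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 10,
    "fp16": True,                        # native AMP mixed precision; needs a CUDA GPU
}


def build_training_args(**overrides):
    """Build Seq2SeqTrainingArguments from the listed values, with optional overrides."""
    # Imported lazily so this module loads even without the heavy dependency.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

On a machine without a GPU, calling `build_training_args(fp16=False)` avoids the mixed-precision requirement.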

Framework versions

  • Transformers 4.20.1
  • PyTorch 1.11.0
  • Datasets 2.3.2
  • Tokenizers 0.12.1

Model size: 77.5M params (Safetensors, tensor type F32)