license: apache-2.0 base_model: Helsinki-NLP/opus-mt-ar-en

This model's role is to translate Daraija with Latin words or Arabizi into English. It was trained on 170,000 rows of translation examples.

This model is a fine-tuned version of Helsinki-NLP/opus-mt-ar-en on anDarija Open Dataset (DODa), an ambitious open-source project dedicated to the Moroccan dialect. With about 150,000 entries, DODa is arguably the largest open-source collaborative project for Darija <=> English translation built for Natural Language Processing purposes.

Training hyperparameters

The following hyperparameters were used during training:

  • GPU : H100 80GB SXM5
  • train_batch_size: 32
  • eval_batch_size: 32
  • num_epochs: 5
  • mixed_precision_training: True FP16 enabled
Downloads last month
41
Safetensors
Model size
76.4M params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for lachkarsalim/LatinDarija_English-v2

Finetuned
(20)
this model