
Terjman-Supreme (3.3B)

Our model is built upon the powerful Transformer architecture, leveraging state-of-the-art natural language processing techniques. It is a fine-tuned version of facebook/nllb-200-3.3B on the darija_english dataset, enhanced with curated corpora to ensure high-quality and accurate translations.

It achieves the following results on the evaluation set:

  • Loss: 2.3687
  • Bleu: 5.6718
  • Gen Len: 39.9504

The fine-tuning was conducted on an A100-40GB GPU and took 57 hours.

Try it out on our dedicated Terjman-Supreme Space 🤗

Usage

Using our model for translation is simple and straightforward. You can integrate it into your projects or workflows via the Hugging Face Transformers library. Here's a basic example of how to use the model in Python:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("atlasia/Terjman-Supreme")
model = AutoModelForSeq2SeqLM.from_pretrained("atlasia/Terjman-Supreme")

# Define the English text you want to translate into Moroccan Darija
input_text = "Your English text goes here."

# Tokenize the input text
input_tokens = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)

# Perform translation
output_tokens = model.generate(**input_tokens)

# Decode the output tokens
output_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)

print("Translation:", output_text)

Example

Let's see an example of translating English into Moroccan Darija:

Input: "Hi my friend, can you tell me a joke in moroccan darija? I'd be happy to hear that from you!"

Output: "أهلا صاحبي، واش تقدر تقول لي نكتة بالدارجة المغربية؟ غادي نكون فرحان باش نسمعها منك!"

Limitations

This version has some limitations, mainly due to the tokenizer. We're currently collecting more data with the aim of continuous improvement.

Feedback

We're continuously striving to improve our model's performance and usability, and we will keep improving it incrementally. If you have any feedback, suggestions, or issues, please don't hesitate to reach out to us.

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 3e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 4
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 25
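
For reference, here is a minimal sketch of how these settings map onto Transformers' Seq2SeqTrainingArguments. The output_dir is a placeholder, and the evaluation/precision settings are illustrative assumptions, not taken from this card:

from transformers import Seq2SeqTrainingArguments

# Illustrative mapping of the hyperparameters listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="terjman-supreme",       # placeholder path, not the authors' actual directory
    learning_rate=3e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,      # effective train batch size of 4
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    num_train_epochs=25,
    seed=42,
    predict_with_generate=True,         # needed to compute BLEU and Gen Len during evaluation
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 is the Trainer's default optimizer,
# so no extra optimizer configuration is required to match the listed settings.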

Training results

| Training Loss | Epoch   | Step   | Validation Loss | Bleu   | Gen Len |
|---------------|---------|--------|-----------------|--------|---------|
| 2.7591        | 1.0000  | 8968   | 2.5401          | 4.6975 | 39.4298 |
| 2.593         | 1.9999  | 17936  | 2.4204          | 5.4569 | 40.4573 |
| 2.6088        | 2.9999  | 26904  | 2.3929          | 5.5729 | 40.7879 |
| 2.5744        | 4.0     | 35873  | 2.3821          | 5.567  | 40.27   |
| 2.5658        | 5.0000  | 44841  | 2.3752          | 5.6589 | 40.3085 |
| 2.5786        | 5.9999  | 53809  | 2.3725          | 5.5713 | 39.9532 |
| 2.5197        | 6.9999  | 62777  | 2.3705          | 5.5688 | 39.9642 |
| 2.5599        | 8.0     | 71746  | 2.3703          | 5.644  | 39.9449 |
| 2.5763        | 9.0000  | 80714  | 2.3692          | 5.5916 | 40.3581 |
| 2.5734        | 9.9999  | 89682  | 2.3695          | 5.6339 | 40.0248 |
| 2.5461        | 10.9999 | 98650  | 2.3689          | 5.6151 | 40.0055 |
| 2.5712        | 12.0    | 107619 | 2.3690          | 5.5447 | 40.3554 |
| 2.5338        | 13.0000 | 116587 | 2.3685          | 5.6284 | 40.0138 |
| 2.5733        | 13.9999 | 125555 | 2.3693          | 5.7858 | 40.4105 |
| 2.5621        | 14.9999 | 134523 | 2.3684          | 5.6093 | 39.9614 |
| 2.5205        | 16.0    | 143492 | 2.3685          | 5.6545 | 40.3444 |
| 2.5815        | 17.0000 | 152460 | 2.3686          | 5.7027 | 40.3416 |
| 2.5666        | 17.9999 | 161428 | 2.3686          | 5.6684 | 39.9284 |
| 2.5721        | 18.9999 | 170396 | 2.3683          | 5.6005 | 40.0028 |
| 2.5325        | 20.0    | 179365 | 2.3685          | 5.688  | 40.3223 |
| 2.5385        | 21.0000 | 188333 | 2.3684          | 5.6137 | 39.989  |
| 2.5399        | 21.9999 | 197301 | 2.3681          | 5.6324 | 40.3499 |
| 2.5501        | 22.9999 | 206269 | 2.3679          | 5.6608 | 39.9725 |
| 2.5419        | 24.0    | 215238 | 2.3679          | 5.6112 | 40.3196 |
| 2.5583        | 24.9993 | 224200 | 2.3687          | 5.6718 | 39.9504 |

Framework versions

  • Transformers 4.40.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.19.1
  • Tokenizers 0.19.1