NLLB Fine-tuned for Darija to Modern Standard Arabic Translation

This model is a fine-tuned version of facebook/nllb-200-distilled-600M for translating Moroccan Darija (ary) into Modern Standard Arabic (ar). It was fine-tuned on a custom dataset using the Hugging Face transformers library. Developed by: Tachicart Ridouane and Bouzoubaa Karim (contact: tachicart@gmail.com).

Model Details

Base Model: facebook/nllb-200-distilled-600M
Fine-tuning Library: Hugging Face transformers
Languages Supported: Moroccan Darija (ary), Modern Standard Arabic (ar)
Training Dataset: Custom dataset of Moroccan Darija and Modern Standard Arabic pairs in JSON format.
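
The card does not document the JSON schema of the training pairs. The sketch below is for illustration only. It assumes a JSON array of objects with hypothetical keys "darija" and "msa", drops incomplete records, and holds out a validation split. The helper names, the keys and the 10% split are all assumptions, not the authors' pipeline.

```python
import json
import random
from typing import Dict, List, Tuple

# Hypothetical field names: the real schema of the authors' dataset is not documented.
SRC_KEY = "darija"  # Moroccan Darija source text
TGT_KEY = "msa"     # Modern Standard Arabic reference translation

Pair = Tuple[str, str]


def _text(record: Dict, key: str) -> str:
    """Return the stripped string stored under key, or "" if missing or not a string."""
    value = record.get(key)
    return value.strip() if isinstance(value, str) else ""


def pairs_from_records(records: List[Dict]) -> List[Pair]:
    """Keep only records that have both a non-empty source and a non-empty target."""
    pairs = []
    for record in records:
        src, tgt = _text(record, SRC_KEY), _text(record, TGT_KEY)
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


def load_pairs(path: str) -> List[Pair]:
    """Read a JSON array of {SRC_KEY: ..., TGT_KEY: ...} objects from disk."""
    with open(path, encoding="utf-8") as f:
        return pairs_from_records(json.load(f))


def train_val_split(
    pairs: List[Pair], val_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle deterministically and hold out val_fraction of the pairs for validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]
```

With 21,000 pairs and the default val_fraction of 0.1, this split would hold out 2,100 pairs for validation. A fixed seed keeps the split reproducible across runs.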

Performance

The model was evaluated on a validation set to check translation quality. No quantitative scores are reported in this card. The authors report that it captures colloquial Moroccan Arabic well. Further iterations and additional training data could improve its performance.
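
The card does not say which metric was used on the validation set. One way to score translations against references is a simplified, corpus-level character n-gram F-score in the style of chrF. This version uses character n-grams up to order 6, ignores whitespace, and weights recall more heavily with beta = 2. It is a hypothetical helper for illustration only. It is not the authors' evaluation procedure and not a drop-in replacement for sacrebleu's chrF, whose implementation differs in details.

```python
from collections import Counter
from typing import List


def _char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def char_f_score(
    hypotheses: List[str], references: List[str], max_n: int = 6, beta: float = 2.0
) -> float:
    """Simplified corpus-level chrF-style score in [0, 100]."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        matched = hyp_total = ref_total = 0
        for hyp, ref in zip(hypotheses, references):
            h, r = _char_ngrams(hyp, n), _char_ngrams(ref, n)
            matched += sum((h & r).values())  # clipped n-gram matches
            hyp_total += sum(h.values())
            ref_total += sum(r.values())
        # Skip orders for which either side is too short to have any n-grams.
        if hyp_total and ref_total:
            precisions.append(matched / hyp_total)
            recalls.append(matched / ref_total)
    if not precisions:
        return 0.0
    # Average precision and recall over n-gram orders, then combine them into F-beta.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p == 0 and r == 0:
        return 0.0
    b2 = beta ** 2
    return 100 * (1 + b2) * p * r / (b2 * p + r)
```

Character-level scores are often preferred for morphologically rich languages such as Arabic. There, a word-level metric would penalize a correct stem carrying a different clitic as a complete miss.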

Limitations

Dataset Size: The custom dataset contains 21,000 samples (Darija/MSA pairs), which may limit coverage of diverse expressions and rare terms.
Colloquial Variations: Moroccan Arabic has many dialectal variations, and not all of them may be equally represented in the training data.

How to Use

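Access to the model is gated. You must first agree, on the model page, to share your contact information, and then authenticate locally (for example with huggingface-cli login). Once access is granted, you can use the model with the transformers library as follows: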

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("tachicart/nllb-ft-darija")
model = AutoModelForSeq2SeqLM.from_pretrained("tachicart/nllb-ft-darija")

# Example translation (Darija for "How can I make a lot of money quickly?")
inputs = tokenizer("كيفاش نقدر نربح بزاف ديال الفلوس بالزربة", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
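
The example above relies on the tokenizer's and model's saved defaults for the language settings. NLLB-200 models are normally driven by explicit language codes: a source-language code on the tokenizer and a forced target-language token at the start of generation. The sketch below makes both explicit and translates a batch. It assumes that the fine-tuned model uses the standard NLLB-200 codes ary_Arab (Moroccan Arabic) and arb_Arab (Modern Standard Arabic), which the card does not confirm. The helper names load and translate and the max_new_tokens value of 128 are hypothetical choices.

```python
from typing import List, Tuple

MODEL_ID = "tachicart/nllb-ft-darija"
# Assumption: standard NLLB-200 language codes; the card does not state which ones were used.
SRC_LANG = "ary_Arab"  # Moroccan Arabic (source)
TGT_LANG = "arb_Arab"  # Modern Standard Arabic (target)


def load(model_id: str = MODEL_ID) -> Tuple[object, object]:
    """Load tokenizer and model; requires access to the gated repo and a prior login."""
    # Imported lazily so the helpers below can be used without importing transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang=SRC_LANG)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    return tokenizer, model


def translate(texts: List[str], tokenizer, model, max_new_tokens: int = 128) -> List[str]:
    """Translate a batch of Darija sentences into Modern Standard Arabic."""
    inputs = tokenizer(
        [t.strip() for t in texts],  # drop stray leading/trailing whitespace
        return_tensors="pt",
        padding=True,  # pad so sentences of different lengths form one batch
    )
    outputs = model.generate(
        **inputs,
        # Force the first generated token to be the target-language code.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(TGT_LANG),
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Usage would look like tokenizer, model = load() followed by translate(["كيفاش نقدر نربح بزاف ديال الفلوس بالزربة"], tokenizer, model). Loading once and reusing the objects avoids reloading the weights for every call.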
