
Model Description

Faseeh is a pre-trained machine translation model that specializes in translating Arabic dialects into English. It reflects NADSOFT's commitment to improving the quality of AI output for Arabic speakers. The model is particularly relevant to the Middle East and North Africa (MENA) region and the wider Arab world, where it aims to handle the distinct linguistic subtleties of dialectal Arabic and meet the specific needs of these communities.

Intended Uses & Limitations

Faseeh is still under development, and users should keep its limitations in mind. For instance, the model may struggle to accurately translate text from speakers with strong accents, such as Moroccan Arabic. Faseeh may also face difficulties in transcribing text from recordings with significant background noise.

Faseeh is not flawless and should not be relied upon to generate text for legal, medical, or other sensitive contexts.

Faseeh does not yet handle all dialects. It currently supports Modern Standard Arabic (MSA), Egyptian, Levantine, Algerian, and Moroccan Arabic, and work is underway to add other dialects in the near future. Users should take these limitations into account when using Faseeh for their translation needs.

Training and evaluation

Before fine-tuning: [image]
After fine-tuning: [image]

How To Use

# Load the tokenizer and model directly from the Hugging Face Hub
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_c = 'nadsoft/Faseeh-v0.1-beta'
tokenizer = AutoTokenizer.from_pretrained(model_c)
model = AutoModelForSeq2SeqLM.from_pretrained(model_c)

# Use the pipeline (reuses the tokenizer and model loaded above)
from transformers import pipeline

text = 'هنا النص اللي انت عاوز تترجمه'
translator = pipeline('translation', model=model, tokenizer=tokenizer, src_lang='ajp_Arab', tgt_lang='eng_Latn', max_length=400)

output = translator(text)
translated_text = output[0]['translation_text']
print(translated_text)
# Output: Here is the text that you want to translate

# Use the model directly (reuses the tokenizer and model loaded above)

# Translate from Arabic to English
text = "نقد الفيلم هوه انك تقيمي الفيلم كيف كان، الناس الي بيكتبوا رأيهم عن الافلام بينحكي عنهم نقاد الافلام."
inputs = tokenizer(text, return_tensors="pt")
# Beam search with 4 beams; generation stops at 128 tokens
outputs = model.generate(**inputs, max_length=128, num_beams=4, early_stopping=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: Movie criticism is the evaluation of a movie as it was, people who write their opinion about movies are talked about by movie critics.
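
The snippets above translate one string at a time. For several inputs, a batching helper along the following lines may be convenient. This is only a sketch: `chunked`, `translate_many`, and `make_model_batch_fn` are hypothetical names that are not part of Faseeh or of `transformers`, and the default batch size of 8 is an arbitrary assumption. The wrapper reuses the generation settings from the example above and assumes the tokenizer can pad a batch.

```python
from typing import Callable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_many(
    texts: List[str],
    translate_batch: Callable[[List[str]], List[str]],
    batch_size: int = 8,  # assumption: tune to the available memory
) -> List[str]:
    """Translate `texts` batch by batch, preserving the input order."""
    results: List[str] = []
    for batch in chunked(texts, batch_size):
        translated = translate_batch(batch)
        if len(translated) != len(batch):
            raise RuntimeError("translate_batch returned a different number of items")
        results.extend(translated)
    return results


def make_model_batch_fn(tokenizer, model, max_length: int = 128, num_beams: int = 4):
    """Wrap an already-loaded tokenizer/model pair as a `translate_batch` callable."""
    def translate_batch(batch: List[str]) -> List[str]:
        # padding=True pads every text in the batch to a common length
        inputs = tokenizer(batch, return_tensors="pt", padding=True)
        outputs = model.generate(
            **inputs, max_length=max_length, num_beams=num_beams, early_stopping=True
        )
        return tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return translate_batch
```

With the tokenizer and model loaded as shown earlier, a call would look like `translate_many(list_of_texts, make_model_batch_fn(tokenizer, model))`. Keeping the translation step behind a plain callable means the batching logic can be checked with a stub before the model is downloaded.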

Examples

[Five example images]

Model size: 615M params (Safetensors), tensor type F32