
AfroRoBERTa English-to-Yoruba Translation Model for Medical Domain

Model Description

This model is a fine-tuned version of the AfroRoBERTa language model, adapted for English-to-Yoruba translation in the medical domain. It is intended for translating medical texts, such as patient records, medical literature, and health information, from English into Yoruba.

Training Data

The training data for this model consists of 6,740 English sentences commonly used in Nigerian medical facilities, along with their corresponding Yoruba translations. The English sentences were generated to represent typical language used in the medical field, and the Yoruba translations were provided by linguistic experts.

Performance

The model was evaluated with BLEU (Bilingual Evaluation Understudy), a standard metric for assessing the quality of machine translation outputs. On a held-out test set, the model achieved a BLEU score of 0.79 (on a 0-to-1 scale), indicating close agreement with the reference translations.
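As an illustration of how such a score is computed, the following is a minimal, self-contained sketch of sentence-level BLEU with a brevity penalty. The actual evaluation likely used a standard implementation such as sacreBLEU; the whitespace tokenization and lack of smoothing here are simplifications for clarity.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference, hypothesis, max_n=4):
    """Minimal BLEU with brevity penalty; naive whitespace tokenization."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped n-gram matches: a hypothesis n-gram counts at most as
        # often as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        if total == 0 or overlap == 0:
            return 0.0  # any zero n-gram precision zeroes the geometric mean
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1.0 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)

# A perfect match scores 1.0; a fully disjoint hypothesis scores 0.0.
print(sentence_bleu("take two tablets after every meal",
                    "take two tablets after every meal"))  # 1.0
```

A score of 0.79 therefore means the model's outputs share a large fraction of their n-grams with the expert reference translations while keeping comparable sentence length.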

In comparative evaluations on English-to-Yoruba medical texts, this fine-tuned model outperformed general-purpose translation services such as Google Translate and the Yandex Translate API.

Use Cases

This model can be applied in various medical contexts where English-to-Yoruba translation is required, such as:

  • Translating medical records and patient information for Yoruba-speaking individuals.
  • Providing Yoruba translations of medical literature and health education materials.
  • Enabling communication between English-speaking medical professionals and Yoruba-speaking patients or caregivers.
  • Developing text-to-speech systems or chatbots for Yoruba-speaking medical professionals or patients.

Limitations and Ethical Considerations

While this model demonstrates strong performance in translating medical texts, it is important to note that it may still make errors or produce biased or offensive outputs. The model's performance may also vary depending on the specific domain or context of the input text.

It is crucial to exercise caution when using this model for critical medical applications, as mistranslations or misinterpretations could potentially lead to harmful consequences. Human review and oversight are recommended, especially in high-stakes situations.

Additionally, the training data for this model may reflect biases present in the original sources, which could be perpetuated in the model's outputs. Efforts should be made to identify and mitigate such biases through careful data curation, model evaluation, and responsible deployment practices.

Usage

The fine-tuned model is available on the Hugging Face platform under the name "Tinny-Robot/Florence-Mode". You can access the model and its associated resources at the following link: https://huggingface.co/Tinny-Robot/Florence-Mode

To use the model, you can follow the instructions provided in the repository's README file or consult the Hugging Face documentation for guidance on loading and running the model.
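As a sketch, the model could be loaded through the Hugging Face Transformers translation pipeline as shown below. Note that this is an assumption: the repository metadata lists `library_name: keras`, so the exact loading code may differ, and the repository README remains the authoritative reference.

```python
def translate_medical_text(sentences, model_id="Tinny-Robot/Florence-Mode"):
    """Translate English sentences to Yoruba using the hosted checkpoint.

    Requires the `transformers` package and network access to download
    the model on first use.
    """
    # Deferred import so this module loads even without transformers installed.
    from transformers import pipeline
    translator = pipeline("translation", model=model_id)
    return [out["translation_text"] for out in translator(sentences)]

# Example (downloads the checkpoint; run where network access is available):
# print(translate_medical_text(["Take two tablets after meals."]))
```

For high-stakes use, wrap calls like this with the human-review step described in the Limitations section rather than serving raw model output to patients.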

Acknowledgments

This project was completed as a final year project by [Your Name] at [Your Institution]. Special thanks to the linguistic experts who provided the Yoruba translations and contributed their expertise to the training data.


Model Card Metadata

license: openrail
language:
  - yo
  - en
metrics:
  - bleu
library_name: keras
pipeline_tag: translation
tags:
  - biology

Model size: 264M parameters (F32 tensors, Safetensors format)