ayoubkirouane/Med_English2Spanish

ayoubkirouane committed on Sep 17, 2023

Commit 38dd65f (1 parent: 2286b09)

Update README.md

Files changed (1)

README.md +13 -0

@@ -22,6 +22,19 @@ library_name: transformers
 **Med_English2Spanish**  is a specialized neural machine translation model designed for translating medical content from English to Spanish.
 It has been fine-tuned to cater specifically to the medical domain, ensuring accurate and contextually relevant translations for healthcare professionals and researchers.
+## About the Dataset
+The dataset behind **Med_English2Spanish** is a subset of the "WMT-16-PubMed" dataset, curated and adapted for this machine translation task. It also incorporates data collected from other reputable online sources and content from another medical dataset, giving a diverse collection of medical documents.
+* https://huggingface.co/datasets/ayoubkirouane/med_en2es
+* https://huggingface.co/datasets/qanastek/WMT-16-PubMed
+### Dataset Statistics
+* **Source:** Adapted from the **WMT-16-PubMed** dataset and other reputable medical sources.
+* **Total Examples:** 286,000
+* **Content:** The dataset comprises a wide range of medical texts, including research papers, clinical notes, and medical literature, covering various subfields within the healthcare domain.
+* **Data Cleaning:** The dataset was cleaned and preprocessed, including removal of personally identifiable information (PII), to protect privacy and comply with ethical standards.
 ## Ethical Considerations
 **Med_English2Spanish**  is intended for medical professionals and researchers. Care has been taken to minimize biases in translations and ensure privacy by stripping PII during preprocessing. However, users are encouraged to review translations for accuracy in sensitive medical contexts.
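
The README states that PII is stripped during preprocessing but does not document how. As a rough illustration only, the sketch below shows one common approach: regex-based masking applied to both sides of a translation pair. The patterns, the `scrub_pii` and `clean_pair` helpers, and the `en`/`es` keys are assumptions made for this example, not the author's actual pipeline.

```python
import re

# Hypothetical patterns for illustration; the real preprocessing rules
# used for the dataset are not described in the README.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def scrub_pii(text: str) -> str:
    """Replace each match of a PII pattern with a bracketed placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def clean_pair(pair: dict) -> dict:
    """Scrub and trim every language side of a translation pair."""
    return {lang: scrub_pii(sentence).strip() for lang, sentence in pair.items()}


if __name__ == "__main__":
    example = {
        "en": " Contact dr.smith@example.org or +1 (555) 123-4567. ",
        "es": "Contacte a dr.smith@example.org o al +1 (555) 123-4567.",
    }
    print(clean_pair(example))
```

A pattern this simple will also mask long digit sequences that are not phone numbers, such as some numeric ranges or identifiers in clinical text, so a real pipeline would need tighter rules and manual spot checks.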