ayoubkirouane committed on
Commit 38dd65f
1 parent: 2286b09

Update README.md

Files changed (1)
  1. README.md +13 -0
README.md CHANGED
@@ -22,6 +22,19 @@ library_name: transformers
  **Med_English2Spanish** is a specialized neural machine translation model designed for translating medical content from English to Spanish.
  It has been fine-tuned to cater specifically to the medical domain, ensuring accurate and contextually relevant translations for healthcare professionals and researchers.

+ ## About the Dataset
+ The dataset behind **Med_English2Spanish** is a subset of the WMT-16-PubMed dataset, curated and adapted for English-to-Spanish medical translation. It was supplemented with text collected from other medical sources on the internet and with content from another medical dataset.
+
+ * **Dataset:** https://huggingface.co/datasets/ayoubkirouane/med_en2es
+
+ * **WMT-16-PubMed:** https://huggingface.co/datasets/qanastek/WMT-16-PubMed
+
+ ### Dataset Statistics
+ * **Source:** Adapted from the **WMT-16-PubMed** dataset and other medical sources.
+ * **Total Examples:** 286,000
+ * **Content:** Medical texts, including research papers, clinical notes, and medical literature, covering various subfields of healthcare.
+ * **Data Cleaning:** The data was cleaned and preprocessed, including the removal of personally identifiable information (PII) to protect privacy.
+
  ## Ethical Considerations
  **Med_English2Spanish** is intended for medical professionals and researchers. Care has been taken to minimize biases in translations and ensure privacy by stripping PII during preprocessing. However, users are encouraged to review translations for accuracy in sensitive medical contexts.

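The README says PII was stripped during preprocessing but does not describe how. The following is only a minimal sketch of one possible approach, not the author's pipeline. It assumes regex-based redaction, and the names `scrub_pii`, `scrub_pair`, the placeholder tags, and the choice of emails and phone numbers as PII types are all hypothetical.

```python
import re

# Hypothetical patterns: the README does not state which PII types were
# removed or how. Emails and phone-like numbers are illustrative assumptions.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"(?<!\w)\+?\d[\d\s().-]{7,}\d(?!\w)")


def _mask_phone(match: re.Match) -> str:
    """Mask a numeric run only if it has at least 10 digits.

    The digit threshold is an assumption meant to leave short numeric
    ranges (for example year spans) and dosages untouched.
    """
    candidate = match.group(0)
    digit_count = sum(ch.isdigit() for ch in candidate)
    return "[PHONE]" if digit_count >= 10 else candidate


def scrub_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholder tags."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub(_mask_phone, text)
    return text


def scrub_pair(pair: dict) -> dict:
    """Scrub every side of one parallel example, e.g. {"en": ..., "es": ...}."""
    return {lang: scrub_pii(sentence) for lang, sentence in pair.items()}
```

A function like `scrub_pair` could be applied to each English-Spanish example before training. Regex redaction of this kind misses names, addresses, and record identifiers, so it would not be sufficient on its own for clinical notes.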