@@ -16,7 +16,10 @@ should probably proofread and complete it, then remove this comment. -->
 # turkish-medical-question-answering
-This model is a fine-tuned version of [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 1.2814
 - Exact Match: 52.7881
@@ -32,7 +35,64 @@ Test Metrics
 'eval_exact_match': 52.78810408921933,
 'eval_f1': 76.14367323441282}
 ## Model description
@@ -40,14 +100,73 @@ More information needed
 ## Intended uses & limitations
-More information needed
 ## Training and evaluation data
-More information needed
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
@@ -60,6 +179,8 @@ The following hyperparameters were used during training:
 - lr_scheduler_warmup_steps: 1000
 - num_epochs: 10
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss | Exact Match | F1      |
@@ -157,3 +278,19 @@ The following hyperparameters were used during training:
 - Pytorch 2.4.1+cu121
 - Datasets 3.1.0
 - Tokenizers 0.21.0

 # turkish-medical-question-answering
+This model is a fine-tuned version of [dbmdz/bert-base-turkish-cased](https://huggingface.co/dbmdz/bert-base-turkish-cased) for medical-domain question answering in Turkish, trained on the [incidelen/MedTurkQuAD](https://huggingface.co/datasets/incidelen/MedTurkQuAD) dataset.
+It uses a BERT-based architecture with additional dropout regularization to reduce overfitting and is trained to extract answer spans from medical text contexts.
 It achieves the following results on the evaluation set:
 - Loss: 1.2814
 - Exact Match: 52.7881
 'eval_exact_match': 52.78810408921933,
 'eval_f1': 76.14367323441282}
+## Usage
+```python
+# Use a pipeline as a high-level helper
+from transformers import pipeline
+pipe = pipeline("question-answering", model="kaixkhazaki/turkish-medical-question-answering")
+# Enter your text and question
+# Example 1
+# Define the context
+context = """
+Kalça kırığından şüphe duyulan hastalarda öncelikle standart grafiler çekilmelidir. Bunlar ön arka pelvis grafisi ve etkilenen kalçanın ön arka ve yan grafileridir.
+Özellikle deplase olmayan kırıklarda sağlam taraf ile patolojik tarafın mukayese edilmesi önemlidir. Kırık kalçanın filmi, alt ekstremite hafif traksiyonda iken nötral pozisyonda,
+patella ışın düzlemine dikey halde çekilir. Trokanter majörün en az 10 cm distaline kadar görülmesi faydalı olacaktır. Ayrıca sağlam tarafın görülmesi ile osteoporoz ve hastanın
+normal boyun-cisim açısının tayininde önemlidir. Lateral radyografi posteriorda kırığın stabilitesini ve deplasman miktarını belirlemek için gereklidir. Lateral grafi çekimi acil
+olmamakla birlikte kırığın daha doğru değerlendirilmesi açısından önemlidir. Eğer hasta grafi masasında iken çekilemiyor ise, traksiyon masasına alındığında görülebilir.
+Nadiren de olsa tanı için tomografi çekilmesi gerekli olabilir. Bunun yanında kalça kırığı şüphesi yüksek olan, ancak direk grafide kırık tanısı konulamayan hastalara MR çekilerek
+tanı rahatlıkla konulabilir. Yine röntgende görünmeyen ancak kırık şüphesi yüksek olan hastalara 48-72 saat içerisinde yapılan sintigrafilerde duyarlılık % 100'dür.
+"""
+# Define the question
+question = "Lateral radyografi hangi durumlar için gereklidir?"
+result = pipe(question=question, context=context)
+print(result)
+# Output:
+# {'score': 0.7423108220100403,
+#  'start': 595,
+#  'end': 662,
+#  'answer': 'posteriorda kırığın stabilitesini ve deplasman miktarını belirlemek'}
+# Example 2: reuse the context defined above
+# Define the question
+question = "Trokanter majörün kaç cm distaline kadar görülmesi faydalıdır?"
+result = pipe(question=question, context=context)
+print(result)
+# Output:
+# {'score': 0.8581815361976624,
+#  'start': 416,
+#  'end': 418,
+#  'answer': '10'}
+```
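The `start` and `end` values in the outputs above are character offsets into `context`, so the answer text can be recovered by slicing. A minimal sketch of that relationship, using a toy context and a hand-built prediction dictionary (both are illustrative assumptions, not model output):

```python
# Toy illustration of an extractive QA span: the answer is a slice of the context.
# The context and the prediction dict below are made up for this sketch; a real
# prediction would come from pipe(question=..., context=...).
context = "Trokanter majörün en az 10 cm distaline kadar görülmesi faydalı olacaktır."
answer = "10"

start = context.find(answer)   # character offset where the answer begins
end = start + len(answer)      # exclusive end offset

prediction = {"start": start, "end": end, "answer": answer}

# Slicing the context with the reported offsets returns the answer text.
recovered = context[prediction["start"]:prediction["end"]]
print(recovered)
```

Because the model is extractive, any answer it returns is a contiguous substring of the supplied context.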
 ## Model description
 ## Intended uses & limitations
+**Intended Uses**
+* Medical question answering in Turkish
+* Information extraction from Turkish medical texts
+* Supporting medical professionals and researchers in finding specific information in medical documents
+**Limitations**
+* The model is trained specifically for the medical domain and may not perform well on general-domain questions
+* Performance may vary on highly technical medical terminology not present in the training data
+* The model is limited to extractive QA: it can only return answers that appear verbatim in the supplied text
+## Bias, Risks, and Limitations
+* This model should not be used as a substitute for professional medical advice
+* The model may reflect biases present in the medical training data
+* Performance may vary across different medical specialties and terminology
+* The model is not suitable for answering complex medical questions requiring reasoning or synthesis of information
+## Training Details
+**Training Hyperparameters**
+* Base Model: dbmdz/bert-base-turkish-cased
+* Batch Size: 16
+* Learning Rate: 1e-5
+* Number of Epochs: 10
+* Weight Decay: 0.02
+* Warmup Steps: 1000
+* Learning Rate Scheduler: Cosine
+* Gradient Clipping: 1.0
+* Training Precision: BF16
+* Optimizer: AdamW
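The warmup and cosine-scheduler settings listed above imply a learning-rate curve that can be sketched in plain Python. This is an illustration under assumptions: linear warmup followed by a half-cosine decay to zero, and a hypothetical `TOTAL_STEPS`, since the card does not state the total number of optimizer steps. The helper `lr_at` is not part of any library.

```python
import math

BASE_LR = 1e-5        # learning rate from the card
WARMUP_STEPS = 1000   # warmup steps from the card
TOTAL_STEPS = 5000    # hypothetical; the card does not state this


def lr_at(step: int, base_lr: float = BASE_LR,
          warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - warmup) / max(1, total - warmup)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 500, 1000, 3000, 5000):
    print(step, lr_at(step))
```

With these assumed values the rate peaks at 1e-5 when warmup ends and falls to half of that midway through the decay phase.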
+**Model Architecture Modifications**
+* Hidden Dropout Probability: 0.2
+* Attention Probability Dropout: 0.2
 ## Training and evaluation data
+The model was trained on the Turkish medical question answering dataset [MedTurkQuAD](https://huggingface.co/datasets/incidelen/MedTurkQuAD).
+@INPROCEEDINGS{10711128,
+  author={İncidelen, Mert and Aydoğan, Murat},
+  booktitle={2024 8th International Artificial Intelligence and Data Processing Symposium (IDAP)},
+  title={Developing Question-Answering Models in Low-Resource Languages: A Case Study on Turkish Medical Texts Using Transformer-Based Approaches},
+  year={2024},
+  volume={},
+  number={},
+  pages={1-4},
+  keywords={Training;Adaptation models;Natural languages;Focusing;Encyclopedias;Transformers;Data models;Internet;Online services;Text processing;Natural Language Processing;Medical Domain;BERTurk;Question-Answering},
+  doi={10.1109/IDAP64064.2024.10711128}}
 ## Training procedure
+**Preprocessing**
+* Maximum Sequence Length: 384
+* Stride: 128
+* Question and context pairs are tokenized using BertTokenizerFast
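With a maximum sequence length of 384 and a stride of 128, a context longer than the window is split into overlapping chunks. The sketch below shows the idea on a plain list of token ids. It assumes the stride is the number of tokens shared by consecutive windows and, for simplicity, ignores the question and special tokens that also count toward the 384 limit. `sliding_windows` is a hypothetical helper, not a library function.

```python
from typing import List


def sliding_windows(tokens: List[int], max_len: int = 384,
                    stride: int = 128) -> List[List[int]]:
    """Split tokens into windows of at most max_len that overlap by stride tokens."""
    if not 0 <= stride < max_len:
        raise ValueError("stride must be non-negative and smaller than max_len")
    step = max_len - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the input
        start += step
    return windows


tokens = list(range(1000))  # stand-in for a tokenized context
chunks = sliding_windows(tokens)
print([(c[0], c[-1], len(c)) for c in chunks])
```

The overlap means an answer that straddles a window boundary is still fully contained in at least one chunk, as long as it is no longer than the stride.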
+**Evaluation Strategy**
+* Evaluation performed every 50 steps
+* Best model saved based on F1 score
+* Metrics: Exact Match and F1 Score
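Exact Match and F1 for extractive QA are commonly computed SQuAD-style: Exact Match checks whether the normalized prediction equals the normalized reference, and F1 measures token overlap between them. The sketch below is an assumption about the metric definitions, not the evaluation script used for this model. The normalization is simplified (lowercasing, punctuation stripping, whitespace splitting) and the reference answer is hypothetical.

```python
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference answer; the prediction has one extra leading token.
reference = "kırığın stabilitesini ve deplasman miktarını belirlemek"
prediction = "posteriorda kırığın stabilitesini ve deplasman miktarını belirlemek"
print(exact_match(prediction, reference), f1_score(prediction, reference))
```

A prediction that overlaps the reference but is not identical scores 0 on Exact Match while still earning partial F1 credit, which is consistent with the F1 reported in this card being well above the Exact Match.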
 ### Training hyperparameters
 The following hyperparameters were used during training:
 - lr_scheduler_warmup_steps: 1000
 - num_epochs: 10
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss | Exact Match | F1      |
 - Pytorch 2.4.1+cu121
 - Datasets 3.1.0
 - Tokenizers 0.21.0
+## Citation
+```bibtex
+@misc{turkish-medical-question-answering,
+  author = {Fatih Demirci},
+  title = {Turkish Medical Question Answering Model},
+  year = {2024},
+  publisher = {HuggingFace},
+  journal = {HuggingFace Model Hub},
+  howpublished = {\url{https://huggingface.co/kaixkhazaki/turkish-medical-question-answering}}
+}
+```