Create README.md

Browse files

Files changed (1) hide show

README.md +323 -0

README.md ADDED Viewed

	@@ -0,0 +1,323 @@

+---
+language:
+- en
+license: other
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- clinical-nlp
+- medical-coding
+- icd10
+- icd-10-cm
+- reasoning
+- reinforcement-learning
+- grpo
+- healthcare
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
+---
+# DeepICD-R1-7B
+## Model Summary
+**DeepICD-R1-7B** is a clinical reasoning language model for **ICD-10-CM diagnosis outcome prediction from admission notes**.
+It is derived from **Qwen2.5-7B-Instruct** and trained using the **DeepICD-R1 framework**, which combines structured reasoning traces with reinforcement learning and hierarchical reward signals.
+The model is designed to predict a **single ICD-10-CM diagnosis code** from clinical text while producing an interpretable reasoning trace explaining the decision.
+The training methodology follows the approach described in the paper:
+**DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation**
+This work frames clinical diagnosis prediction as a **reasoning task optimized through reinforcement learning**.
+---
+# Model Details
+- **Model name:** DeepICD-R1-7B
+- **Organization:** DATEXIS
+- **Base model:** Qwen2.5-7B-Instruct
+- **Parameters:** ~7B
+- **Task:** Single ICD-10-CM diagnosis prediction from admission notes
+- **Training paradigm:** Supervised reasoning + reinforcement learning
+- **Framework:** VERL RL trainer
+- **Domain:** Clinical NLP / healthcare reasoning
+The Qwen2.5-7B-Instruct architecture is a **7-billion-parameter instruction-tuned language model designed for instruction following and long-form generation tasks**. :contentReference[oaicite:1]{index=1}
+---
+# Intended Use
+This model is intended for **research purposes**, including:
+- clinical reasoning research
+- ICD-10-CM coding prediction
+- reinforcement learning for language models
+- reasoning trace generation
+- structured prediction from clinical text
+### Out-of-Scope Use
+This model **must not be used for**:
+- medical diagnosis
+- clinical decision support
+- patient triage
+- automated medical coding without expert supervision
+- billing or compliance workflows
+---
+# Training Methodology
+The **DeepICD-R1 framework** treats diagnosis prediction as a reasoning problem.
+Training combines:
+### 1. Supervised reasoning traces
+A dataset of reasoning chains explaining diagnosis predictions.
+### 2. Reinforcement learning optimization
+Training uses **Group Relative Policy Optimization (GRPO)** to improve reasoning and prediction accuracy.
+### 3. Hierarchical reward signals
+Rewards are aligned with the hierarchical structure of ICD codes.
+The reward function combines:
+- **format reward** — correct reasoning + diagnosis structure
+- **outcome reward** — correct diagnosis prediction
+- **hierarchical reward** — partial credit for correct ICD prefixes
+This design encourages models to produce both **accurate diagnoses and structured reasoning**.
+---
+# Training Data
+The training task uses **clinical admission notes paired with ICD-10-CM diagnosis codes**, derived from de-identified electronic health record datasets such as **MIMIC-IV**.
+Task formulation:
+**Input**
+Clinical admission note describing patient presentation.
+**Output**
+Structured reasoning trace and predicted ICD-10-CM code.
+---
+# Output Format
+The model is trained to produce structured outputs separating reasoning from the final diagnosis.
+### Example
+```text
+<think>
+The patient presents with ...
+Symptoms and clinical history suggest ...
+...
+</think>
+<diagnosis>
+M5116
+</diagnosis>
+```
+## Training Configuration
+The model was trained using the **VERL reinforcement learning trainer** with **Group Relative Policy Optimization (GRPO)**, following the DeepICD-R1 training framework.
+### Core Training Parameters
+| Parameter | Value |
+|-----------|------|
+| Algorithm | GRPO |
+| Training framework | VERL (`verl.trainer.main_ppo`) |
+| Base model | Qwen2.5-7B-Instruct |
+| Training batch size | 64 |
+| PPO mini batch size | 64 |
+| PPO micro batch size per GPU | 16 |
+| Learning rate | 1e-6 |
+| LR warmup steps | 80 |
+| Total epochs | 1 |
+| Max prompt length | 2048 tokens |
+| Max response length | 1024 tokens |
+### Rollout / Generation Settings
+| Parameter | Value |
+|-----------|------|
+| Rollout engine | vLLM |
+| Samples per prompt (`n`) | 8 |
+| Temperature | 0.9 |
+| Top-k | disabled |
+| dtype | bfloat16 |
+| Tensor parallel size | 1 |
+| GPU memory utilization | 0.4 |
+### Optimization Details
+| Parameter | Value |
+|-----------|------|
+| Entropy coefficient | 0.001 |
+| KL controller coefficient | 0.001 |
+| KL loss | disabled |
+| Gradient checkpointing | enabled |
+| Torch compile | enabled |
+| FSDP param offload | disabled |
+| FSDP optimizer offload | disabled |
+### Hardware
+| Component | Value |
+|-----------|------|
+| GPUs | 4 |
+| Nodes | 1 |
+| Precision | bfloat16 |
+### Reward Function
+Training uses a **custom batched reward function** combining several reward signals:
+- **Outcome reward** — correct ICD-10 prediction
+- **Format reward** — correct `<think>` and `<diagnosis>` structure
+- **Hierarchical reward** — partial credit for ICD prefix matches
+- **Reasoning reward** — encourages meaningful reasoning traces
+- **LLM-based reward** — optional external judge scoring
+These rewards align the model toward producing **both accurate diagnoses and structured reasoning traces**.
+The reasoning trace provides transparency into how the diagnosis was derived from the clinical note.
+---
+## Evaluation
+Evaluation follows the methodology described in the **DeepICD-R1 paper**.
+Performance is measured using **macro-averaged F1 scores** at multiple levels of the ICD hierarchy.
+| Level | Description |
+|------|-------------|
+| Chapter | Broad ICD category |
+| Category | First three digits |
+| Full code | Complete ICD-10 code |
+Hierarchical evaluation allows partial credit when the model predicts the correct high-level diagnostic category even if the full code is incorrect.
+---
+## Limitations
+Models following the **DeepICD-R1 framework** share several limitations.
+### Dataset limitations
+- Training data consists primarily of **English clinical notes**
+- Distribution reflects **hospital-specific patient populations**
+- ICD labels are **highly imbalanced**, affecting rare diagnoses
+### Model limitations
+- Reasoning traces may appear convincing while being incorrect
+- Predictions may fail for rare or long-tail diagnoses
+- Models may demonstrate **premature diagnostic closure**
+- Reinforcement learning rewards are only proxies for expert feedback
+---
+## Ethical Considerations
+This model is trained on **de-identified clinical data** and intended strictly for research.
+### Potential risks
+- propagation of dataset biases
+- overconfidence in generated reasoning
+- misuse in clinical decision making
+### Appropriate safeguards
+- expert oversight
+- dataset bias evaluation
+- fairness audits
+- controlled deployment environments
+---
+## Hardware and Training Setup
+Typical training configuration for models in this family includes:
+- **GPUs:** multi-GPU training (4–8 GPUs)
+- **Precision:** bfloat16
+- **Rollout engine:** vLLM
+- **Training framework:** VERL PPO / GRPO trainer
+- **Sampling:** multiple rollouts per prompt
+---
+## Usage
+### Transformers Example
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+model_id = "DATEXIS/DeepICD-R1-7B"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    device_map="auto",
+    torch_dtype="auto"
+)
+prompt = """
+You are a clinical reasoning model.
+Given the following admission note,
+produce reasoning in <think> tags
+and a final ICD-10 diagnosis in <diagnosis> tags.
+[ADMISSION NOTE]
+"""
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(
+    **inputs,
+    max_new_tokens=512
+)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+## Recommended Inference Practices
+- Use prompts consistent with the training format.
+- Validate predicted ICD-10 codes against official code formats.
+- Always review predictions with medical experts.
+- Avoid exposing reasoning traces in safety-critical settings without verification.
+---
+## Citation
+If you use this model, please cite:
+```bibtex
+@inproceedings{roehr2026deepicdr1,
+  title={DeepICD-R1: Medical Reasoning through Hierarchical Rewards and Unsupervised Distillation},
+  author={R{\"o}hr, Tom and Steffek, Thomas and Teucher, Roman and Bressem, Keno and others},
+  booktitle={Proceedings of LREC-COLING},
+  year={2026}
+}