MIMIC Sepsis IQL Offline RL Policy
This repository contains a research-only Implicit Q-Learning (IQL) policy checkpoint trained on a Sepsis-3 MIMIC-IV ICU cohort represented as 4-hour MDP steps. The model maps a 62-dimensional normalized clinical state vector to logits over 25 discrete treatment actions.
Model Details
- Algorithm: discrete-action Implicit Q-Learning (IQL)
- Selected configuration: iql_sofa_shaped_conservative_safe
- Reward variant: SOFA-shaped offline reward
- State dimension: 62
- Action space: 25 actions, interpreted as 5 vasopressor bins x 5 IV fluid bins
- Actor/value/critic hidden sizes: [256, 256]
- Training epochs: 200
- Final checkpoint step: 110000
- Source project: mimic-sepsis-drl (GitHub, commit d3f1fc2)
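The 25-action space is a product of two 5-bin dose grids, so a flat action ID can be split into a pair of bin indices. The sketch below illustrates one way to do that. The function names are hypothetical, and the row-major ordering (vasopressor bin as the major index) is an assumption; the decode_action function in modeling_iql.py is the authority on the actual layout.

```python
N_VASO_BINS = 5   # vasopressor dose bins, as stated in the model details
N_FLUID_BINS = 5  # IV fluid bins, as stated in the model details


def decode_action_sketch(action_id: int) -> tuple[int, int]:
    """Split a flat action ID into (vasopressor_bin, fluid_bin).

    ASSUMPTION: row-major layout with the vasopressor bin as the major
    index. The real checkpoint may use a different ordering.
    """
    if not 0 <= action_id < N_VASO_BINS * N_FLUID_BINS:
        raise ValueError(f"action_id must be in [0, 24], got {action_id}")
    return divmod(action_id, N_FLUID_BINS)


def encode_action_sketch(vaso_bin: int, fluid_bin: int) -> int:
    """Inverse of decode_action_sketch under the same assumed layout."""
    if not (0 <= vaso_bin < N_VASO_BINS and 0 <= fluid_bin < N_FLUID_BINS):
        raise ValueError("bin indices must be in [0, 4]")
    return vaso_bin * N_FLUID_BINS + fluid_bin


if __name__ == "__main__":
    for action_id in (0, 7, 24):
        print(action_id, decode_action_sketch(action_id))
```

Before relying on any mapping like this, compare it against decode_action for all 25 IDs, since a transposed layout would silently swap the two drug dimensions.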
Files
- model.safetensors: Safetensors checkpoint containing the actor policy network weights.
- config.json: architecture and training metadata for repository consumers.
- modeling_iql.py: minimal PyTorch inference wrapper for loading the actor and decoding action IDs. Loads from model.safetensors by default; falls back to .bin.
- requirements.txt: runtime dependency pins needed for local inference.
- LICENSE.md: license and access caveats.
Intended Use
⚠️ RESEARCH ONLY — NOT FOR CLINICAL USE
This model is not a clinical decision support system. Do not use it for diagnosis, treatment, triage, live patient care, or any automated medical decision. It is a research artifact for retrospective offline reinforcement learning studies only.
Use this artifact for retrospective offline RL research, reproducibility checks, model-card review, and benchmark comparisons under the same preprocessing and cohort definition used during training.
Out-of-scope uses include clinical decision support, direct deployment, patient-level recommendations, commercial medical products, and evaluations that mix this checkpoint with a different state representation without revalidation.
Loading The Policy
from modeling_iql import decode_action, load_policy
# Loads from model.safetensors by default (recommended)
policy = load_policy("model.safetensors")
state = [0.0] * 62 # Replace with a normalized 62-feature state from the project pipeline.
action_id = policy.select_action(state)
print(action_id, decode_action(action_id))
For batch inference, call the model directly with a torch.Tensor of shape (batch_size, 62) and take argmax(dim=-1) over the returned logits.
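The batch path described above can be sketched as follows. To keep the example self-contained, it uses a randomly initialized stand-in network rather than the real checkpoint: the layer sizes follow the model details, but the ReLU activations and the helper name select_actions are assumptions, and the printed action IDs are meaningless. In practice the object returned by load_policy would take the place of actor.

```python
import torch
from torch import nn

STATE_DIM = 62   # from the model card
N_ACTIONS = 25   # from the model card

# Hypothetical stand-in for the loaded actor (random weights, assumed ReLU).
actor = nn.Sequential(
    nn.Linear(STATE_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, N_ACTIONS),
)
actor.eval()


def select_actions(model: nn.Module, states: torch.Tensor) -> torch.Tensor:
    """Return one greedy action ID per row of a (batch_size, 62) tensor."""
    if states.ndim != 2 or states.shape[1] != STATE_DIM:
        raise ValueError(f"expected shape (batch_size, {STATE_DIM}), got {tuple(states.shape)}")
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(states)
    return logits.argmax(dim=-1)


# Placeholder batch; real inputs must be normalized states from the pipeline.
batch = torch.zeros(8, STATE_DIM)
action_ids = select_actions(actor, batch)
print(action_ids.tolist())
```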
Note: This model uses a custom PyTorch wrapper (modeling_iql.py) and is not compatible with Hugging Face transformers from_pretrained() auto-loading. Use the provided wrapper for inference.
Training Data
The policy was trained from a derived MIMIC-IV Sepsis-3 ICU cohort. The model repository does not include raw patient data. Reproducing the dataset requires authorized MIMIC-IV access through PhysioNet and the preprocessing pipeline in the source project.
The source project describes the expected raw data location, cohort extraction, onset assignment, episode grid construction, train/validation/test split, feature building, action binning, reward generation, and replay dataset creation.
Source repository: https://github.com/furkan-uzmez/mimic-sepsis-drl (commit d3f1fc2)
Evaluation
Final repeated-seed bootstrap summary from the local evaluation artifacts:
| Metric | Value |
|---|---|
| FQE mean | 2.8478 |
| WIS mean | 8.2032 |
| WIS 95% CI lower mean | 4.9631 |
| WIS 95% CI upper mean | 10.8173 |
| ESS mean | 29.4097 |
| Clinician agreement mean | 0.4137 |
| Support mass mean | 0.9905 |
| Low-support action rate mean | 0.0095 |
| Severe safety flags mean | 0.0000 |
Evaluation was performed offline on held-out replay data using FQE/WIS/support and safety diagnostics.
⚠️ These results are offline estimates only. No prospective validation has been performed. They are not evidence of clinical safety or efficacy.
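For readers unfamiliar with the WIS and ESS rows, the sketch below shows the textbook trajectory-wise weighted importance sampling estimator and the usual effective sample size formula. It is not the project's evaluation code: the model card does not state which WIS variant, discount factor, or weight clipping was used, so the function names and the default gamma here are assumptions for illustration.

```python
from math import prod


def trajectory_weight(target_probs, behavior_probs):
    """Cumulative importance ratio: product over steps of pi_e(a|s) / pi_b(a|s)."""
    return prod(p / b for p, b in zip(target_probs, behavior_probs))


def discounted_return(rewards, gamma=0.99):
    """Sum of gamma**t * r_t over one trajectory (gamma is an assumed value)."""
    return sum(r * gamma**t for t, r in enumerate(rewards))


def wis_estimate(weights, returns):
    """Weighted importance sampling: sum(w_i * G_i) / sum(w_i)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("all importance weights are zero")
    return sum(w * g for w, g in zip(weights, returns)) / total


def effective_sample_size(weights):
    """Kish effective sample size: (sum w)**2 / sum(w**2)."""
    squared = sum(w * w for w in weights)
    if squared == 0:
        return 0.0
    return sum(weights) ** 2 / squared


if __name__ == "__main__":
    weights = [3.0, 1.0]   # toy importance weights for two trajectories
    returns = [0.0, 4.0]   # toy discounted returns
    print(wis_estimate(weights, returns), effective_sample_size(weights))
```

Under this definition, a low ESS means a few heavily weighted trajectories dominate the WIS value, which is one reason the confidence interval and ESS should be read together with the point estimate.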
Limitations And Risks
- Offline RL estimates can be biased by dataset support, confounding, cohort definition, reward design, and model selection choices.
- The action space is discretized and does not capture all treatment context, contraindications, timing, or bedside judgment.
- The model only supports the exact 62-feature normalized state representation used by the source pipeline.
- MIMIC-IV access restrictions and clinical data governance requirements apply to reproduction and downstream use.
- License compatibility for the derived model should be reviewed before any public release or redistribution.
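Because the policy only accepts the exact 62-feature state, a cheap guard in front of inference can catch malformed inputs early. The helper below is a hypothetical sketch, not part of modeling_iql.py. It checks only length and finiteness; it cannot confirm that features are in the right order or were normalized with the training statistics, which still depends on using the source pipeline.

```python
import math

STATE_DIM = 62  # state dimension stated in the model card


def validate_state(state):
    """Return the state as a list of floats, or raise ValueError.

    Checks length and finiteness only. Feature order and normalization
    cannot be verified here and must come from the source pipeline.
    """
    values = [float(x) for x in state]
    if len(values) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} features, got {len(values)}")
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        raise ValueError(f"non-finite values at feature indices {bad}")
    return values


if __name__ == "__main__":
    print(len(validate_state([0.0] * STATE_DIM)))
```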
Citation
No formal paper citation is available for this project. If you use this model, please cite the project repository and this model card:
- Project repository: https://github.com/furkan-uzmez/mimic-sepsis-drl
- Model card: https://huggingface.co/ryan12345441/mimic-sepsis-iql
License
This model is released under a research-only license. See LICENSE.md for details.
Key restrictions:
- Research use only — no clinical, commercial, or diagnostic use
- MIMIC-IV data use terms apply (PhysioNet credentialed access required)
- No warranty; provided "as-is" for academic research
Maintainers
Prepared as a Hugging Face model repository from local training artifacts. Source: mimic-sepsis-drl (commit d3f1fc2).