MIMIC Sepsis IQL Offline RL Policy

This repository contains a research-only Implicit Q-Learning (IQL) policy checkpoint trained on a Sepsis-3 MIMIC-IV ICU cohort represented as 4-hour MDP steps. The model maps a 62-dimensional normalized clinical state vector to logits over 25 discrete treatment actions.

Model Details

Algorithm: discrete-action Implicit Q-Learning (IQL)
Selected configuration: iql_sofa_shaped_conservative_safe
Reward variant: SOFA-shaped offline reward
State dimension: 62
Action space: 25 actions, interpreted as 5 vasopressor bins x 5 IV fluid bins
Actor/value/critic hidden sizes: [256, 256]
Training epochs: 200
Final checkpoint step: 110000
Source project: mimic-sepsis-drl (GitHub, commit d3f1fc2)
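
The 5 x 5 action layout listed above can be illustrated with a small index helper. This is a sketch only: the row-major ordering (vasopressor bin as the high digit, IV fluid bin as the low digit) and the names split_action_id and join_action_bins are assumptions made for illustration. The authoritative mapping is whatever decode_action in modeling_iql.py implements.

```python
N_VASO_BINS = 5   # vasopressor bins, from the model card
N_FLUID_BINS = 5  # IV fluid bins, from the model card


def split_action_id(action_id: int) -> tuple[int, int]:
    """Split a flat action ID into (vasopressor_bin, fluid_bin).

    ASSUMPTION: row-major layout, action_id = vaso_bin * 5 + fluid_bin.
    Check this against decode_action before relying on it.
    """
    if not 0 <= action_id < N_VASO_BINS * N_FLUID_BINS:
        raise ValueError(f"action_id must be in [0, 25), got {action_id}")
    return divmod(action_id, N_FLUID_BINS)


def join_action_bins(vaso_bin: int, fluid_bin: int) -> int:
    """Inverse of split_action_id under the same assumed layout."""
    if not (0 <= vaso_bin < N_VASO_BINS and 0 <= fluid_bin < N_FLUID_BINS):
        raise ValueError("bin indices must be in [0, 5)")
    return vaso_bin * N_FLUID_BINS + fluid_bin
```

Under this assumed layout, action 0 is the lowest bin for both treatments and action 24 is the highest bin for both.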

Files

model.safetensors: Safetensors checkpoint containing the actor policy network weights.
config.json: architecture and training metadata for repository consumers.
modeling_iql.py: minimal PyTorch inference wrapper for loading the actor and decoding action IDs. Loads from model.safetensors by default and falls back to a .bin checkpoint.
requirements.txt: runtime dependency pins needed for local inference.
LICENSE.md: license and access caveats.
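
As a sanity check on the checkpoint listed above, the parameter count implied by the stated sizes can be worked out by hand. The sketch below assumes the actor is a plain fully connected network (62 inputs, two hidden layers of 256 units, 25 output logits) with bias terms and no normalization layers; the function name mlp_param_count is hypothetical. Under that assumption the total agrees with the 88.3k parameters reported for the Safetensors file on the model page.

```python
def mlp_param_count(layer_sizes):
    """Count weights plus biases of a fully connected network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# State dimension, hidden sizes, and action count from the model card.
# ASSUMPTION: plain Linear layers with biases and nothing else.
ACTOR_LAYER_SIZES = [62, 256, 256, 25]

actor_params = mlp_param_count(ACTOR_LAYER_SIZES)
print(actor_params)  # 88345
```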

Intended Use

⚠️ RESEARCH ONLY — NOT FOR CLINICAL USE

This model is not a clinical decision support system. Do not use it for diagnosis, treatment, triage, live patient care, or any automated medical decision. It is a research artifact for retrospective offline reinforcement learning studies only.

Use this artifact for retrospective offline RL research, reproducibility checks, model-card review, and benchmark comparisons under the same preprocessing and cohort definition used during training.

Out-of-scope uses include clinical decision support, direct deployment, patient-level recommendations, commercial medical products, and evaluations that mix this checkpoint with a different state representation without revalidation.

Loading The Policy

from modeling_iql import decode_action, load_policy

# Loads from model.safetensors by default (recommended)
policy = load_policy("model.safetensors")
state = [0.0] * 62  # Replace with a normalized 62-feature state from the project pipeline.
action_id = policy.select_action(state)
print(action_id, decode_action(action_id))

For batch inference, call the model directly with a torch.Tensor of shape (batch_size, 62) and take argmax(dim=-1) over the returned logits.
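
The batch path described above might look like the following sketch. To stay self-contained it uses a randomly initialized stand-in network rather than the real checkpoint; the stand-in architecture (ReLU activations between linear layers) is an assumption, and in practice policy would be the object returned by load_policy.

```python
import torch
from torch import nn

STATE_DIM, N_ACTIONS = 62, 25

# Stand-in for the object returned by load_policy(...).
# ASSUMPTION: architecture and activations are illustrative only.
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, N_ACTIONS),
)
policy.eval()

# Replace with normalized states from the project pipeline.
states = torch.zeros(8, STATE_DIM)

with torch.no_grad():                   # inference only, no gradients needed
    logits = policy(states)             # shape (8, 25)
action_ids = logits.argmax(dim=-1)      # shape (8,), one action ID per row
```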

Note: This model uses a custom PyTorch wrapper (modeling_iql.py) and is not compatible with Hugging Face transformers from_pretrained() auto-loading. Use the provided wrapper for inference.

Training Data

The policy was trained on a derived MIMIC-IV Sepsis-3 ICU cohort. The model repository does not include raw patient data. Reproducing the dataset requires authorized MIMIC-IV access through PhysioNet and the preprocessing pipeline in the source project.

The source project describes the expected raw data location, cohort extraction, onset assignment, episode grid construction, train/validation/test split, feature building, action binning, reward generation, and replay dataset creation.

Source repository: https://github.com/furkan-uzmez/mimic-sepsis-drl (commit d3f1fc2)

Evaluation

Final repeated-seed bootstrap summary from the local evaluation artifacts:

Metric	Value
FQE mean	2.8478
WIS mean	8.2032
WIS 95% CI lower mean	4.9631
WIS 95% CI upper mean	10.8173
ESS mean	29.4097
Clinician agreement mean	0.4137
Support mass mean	0.9905
Low-support action rate mean	0.0095
Severe safety flags mean	0.0000

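Evaluation was performed offline on held-out replay data using fitted Q evaluation (FQE), weighted importance sampling (WIS) with effective sample size (ESS), action-support diagnostics, and safety diagnostics.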

⚠️ These results are offline estimates only. No prospective validation has been performed. They are not evidence of clinical safety or efficacy.
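
For readers unfamiliar with the WIS and ESS rows in the table above, the sketch below shows the common per-trajectory definitions. It is not taken from the source project, whose evaluation code may clip weights, bootstrap, or aggregate differently; the function name wis_and_ess is hypothetical.

```python
def wis_and_ess(weights, returns):
    """Weighted importance sampling estimate and effective sample size.

    weights: per-trajectory cumulative importance ratios, i.e. the
             product over steps of pi_eval(a|s) / pi_behavior(a|s).
    returns: per-trajectory (discounted) returns under the logged data.
    """
    if len(weights) != len(returns):
        raise ValueError("weights and returns must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("importance weights must have a positive sum")
    # WIS: self-normalized weighted average of the returns.
    wis = sum(w * g for w, g in zip(weights, returns)) / total
    # ESS: (sum w)^2 / sum w^2, equal to N when all weights are equal.
    ess = total ** 2 / sum(w * w for w in weights)
    return wis, ess


print(wis_and_ess([1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0]))  # (2.5, 4.0)
```

An ESS far below the number of evaluation trajectories means that a few trajectories dominate the estimate, which is one reason the WIS interval can be wide.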

Limitations And Risks

Offline RL estimates can be biased by dataset support, confounding, cohort definition, reward design, and model selection choices.
The action space is discretized and does not capture all treatment context, contraindications, timing, or bedside judgment.
The model only supports the exact 62-feature normalized state representation used by the source pipeline.
MIMIC-IV access restrictions and clinical data governance requirements apply to reproduction and downstream use.
License compatibility for the derived model should be reviewed before any public release.
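
The state-representation constraint listed above can be guarded against with a small check before inference. This is a hypothetical helper, not part of modeling_iql.py, and it only catches shape and non-finite-value errors. It cannot tell whether the features are in the right order or were normalized with the training statistics.

```python
import math

STATE_DIM = 62  # from the model card


def check_state(state):
    """Return the state as a list of floats, or raise ValueError.

    Verifies only the length and that every value is finite.
    """
    values = [float(x) for x in state]
    if len(values) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} features, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("state contains NaN or infinite values")
    return values
```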

Citation

No formal paper citation is available for this project. If you use this model, please cite the project repository and this model card:

Project repository: https://github.com/furkan-uzmez/mimic-sepsis-drl
Model card: https://huggingface.co/ryan12345441/mimic-sepsis-iql

License

This model is released under a research-only license. See LICENSE.md for details.

Key restrictions:

Research use only — no clinical, commercial, or diagnostic use
MIMIC-IV data use terms apply (PhysioNet credentialed access required)
No warranty; provided "as-is" for academic research

Maintainers

Prepared as a Hugging Face model repository from local training artifacts. Source: mimic-sepsis-drl (commit d3f1fc2).


Safetensors checkpoint: 88.3k parameters, tensor type F32.
