MIMIC Sepsis IQL Offline RL Policy

This repository contains a research-only Implicit Q-Learning (IQL) policy checkpoint trained on a Sepsis-3 MIMIC-IV ICU cohort represented as 4-hour MDP steps. The model maps a 62-dimensional normalized clinical state vector to logits over 25 discrete treatment actions.

Model Details

  • Algorithm: discrete-action Implicit Q-Learning (IQL)
  • Selected configuration: iql_sofa_shaped_conservative_safe
  • Reward variant: SOFA-shaped offline reward
  • State dimension: 62
  • Action space: 25 actions, interpreted as 5 vasopressor bins x 5 IV fluid bins
  • Actor/value/critic hidden sizes: [256, 256]
  • Training epochs: 200
  • Final checkpoint step: 110000
  • Source project: mimic-sepsis-drl (GitHub, commit d3f1fc2)
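As a quick sanity check on the dimensions listed above, the sketch below counts the parameters of a plain fully connected actor with those sizes. It assumes one weight matrix and one bias vector per layer and no normalization layers; the helper name mlp_param_count is illustrative, and the actual layer layout is defined in modeling_iql.py and may differ.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total


# State dimension 62, hidden sizes [256, 256], 25 action logits.
ACTOR_LAYERS = [62, 256, 256, 25]
actor_params = mlp_param_count(ACTOR_LAYERS)
print(actor_params)  # 88345
```

Under these assumptions the actor is small (about 88k parameters), so CPU inference should be sufficient for most retrospective experiments.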

Files

  • model.safetensors: Safetensors checkpoint containing the actor policy network weights.
  • config.json: architecture and training metadata for repository consumers.
  • modeling_iql.py: minimal PyTorch inference wrapper for loading the actor and decoding action IDs. It loads model.safetensors by default and falls back to a .bin checkpoint.
  • requirements.txt: runtime dependency pins needed for local inference.
  • LICENSE.md: license and access caveats.

Intended Use

⚠️ RESEARCH ONLY — NOT FOR CLINICAL USE

This model is not a clinical decision support system. Do not use it for diagnosis, treatment, triage, live patient care, or any automated medical decision. It is a research artifact for retrospective offline reinforcement learning studies only.

Use this artifact for retrospective offline RL research, reproducibility checks, model-card review, and benchmark comparisons under the same preprocessing and cohort definition used during training.

Out-of-scope uses include clinical decision support, direct deployment, patient-level recommendations, commercial medical products, and evaluations that mix this checkpoint with a different state representation without revalidation.

Loading The Policy

from modeling_iql import decode_action, load_policy

# Loads from model.safetensors by default (recommended)
policy = load_policy("model.safetensors")
state = [0.0] * 62  # Replace with a normalized 62-feature state from the project pipeline.
action_id = policy.select_action(state)
print(action_id, decode_action(action_id))
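The decode_action helper ships with modeling_iql.py, and its exact bin ordering is not documented in this card. The sketch below shows one plausible decoding of the 5 x 5 action grid, assuming a row-major layout in which the vasopressor bin varies slowest. The names sketch_decode_action and sketch_encode_action are hypothetical; confirm the ordering against the wrapper before relying on it.

```python
N_VASO_BINS = 5
N_FLUID_BINS = 5


def sketch_decode_action(action_id):
    """Split a flat action ID into (vasopressor bin, IV fluid bin).

    ASSUMPTION: action_id = vaso_bin * 5 + fluid_bin (row-major).
    """
    if not 0 <= action_id < N_VASO_BINS * N_FLUID_BINS:
        raise ValueError(f"action_id must be in [0, 24], got {action_id}")
    vaso_bin, fluid_bin = divmod(action_id, N_FLUID_BINS)
    return {"vasopressor_bin": vaso_bin, "iv_fluid_bin": fluid_bin}


def sketch_encode_action(vaso_bin, fluid_bin):
    """Inverse of sketch_decode_action under the same assumed layout."""
    return vaso_bin * N_FLUID_BINS + fluid_bin


print(sketch_decode_action(7))  # {'vasopressor_bin': 1, 'iv_fluid_bin': 2}
```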

For batch inference, call the model directly with a torch.Tensor of shape (batch_size, 62) and take argmax(dim=-1) over the returned logits.
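A minimal sketch of that batch path follows. Because modeling_iql.py is not importable outside this repository, the example uses a stand-in network with the documented layer sizes; the ReLU activations are an assumption. In practice, replace the stand-in with the object returned by load_policy.

```python
import torch
from torch import nn

# Stand-in actor with the documented sizes (ReLU is an assumption).
# In practice, use the policy returned by load_policy("model.safetensors").
actor = nn.Sequential(
    nn.Linear(62, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 25),
)
actor.eval()

# Placeholder batch; real inputs must be normalized 62-feature states.
states = torch.zeros(8, 62, dtype=torch.float32)

with torch.no_grad():  # inference only, no gradients needed
    logits = actor(states)          # shape (8, 25)
action_ids = logits.argmax(dim=-1)  # shape (8,), values in 0..24

print(logits.shape, action_ids.shape)
```

Taking the argmax gives the greedy action per row; the logits themselves can be kept if a downstream analysis needs the full action distribution.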

Note: This model uses a custom PyTorch wrapper (modeling_iql.py) and is not compatible with Hugging Face transformers from_pretrained() auto-loading. Use the provided wrapper for inference.

Training Data

The policy was trained from a derived MIMIC-IV Sepsis-3 ICU cohort. The model repository does not include raw patient data. Reproducing the dataset requires authorized MIMIC-IV access through PhysioNet and the preprocessing pipeline in the source project.

The source project describes the expected raw data location, cohort extraction, onset assignment, episode grid construction, train/validation/test split, feature building, action binning, reward generation, and replay dataset creation.
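To illustrate what a 4-hour episode grid means in practice, the sketch below maps an event timestamp to a step index. It assumes the grid is anchored at an episode start time and that steps are half-open intervals of 4 hours; the function name step_index and the anchoring rule are hypothetical, and the source project's actual grid construction may differ.

```python
from datetime import datetime, timedelta

STEP_HOURS = 4  # step width stated in this card


def step_index(event_time, episode_start):
    """Return the 0-based 4-hour step containing event_time.

    ASSUMPTION: step k covers [start + 4k h, start + 4(k + 1) h).
    """
    elapsed = event_time - episode_start
    if elapsed < timedelta(0):
        raise ValueError("event_time precedes episode_start")
    return elapsed // timedelta(hours=STEP_HOURS)


# Synthetic timestamps, not patient data.
start = datetime(2000, 1, 1, 0, 0)
print(step_index(start + timedelta(hours=9), start))  # 2
```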

Source repository: https://github.com/furkan-uzmez/mimic-sepsis-drl (commit d3f1fc2)

Evaluation

Final repeated-seed bootstrap summary from the local evaluation artifacts:

Metric                            Value
FQE mean                          2.8478
WIS mean                          8.2032
WIS 95% CI lower mean             4.9631
WIS 95% CI upper mean             10.8173
ESS mean                          29.4097
Clinician agreement mean          0.4137
Support mass mean                 0.9905
Low-support action rate mean      0.0095
Severe safety flags mean          0.0000

Evaluation was performed offline on held-out replay data using fitted Q evaluation (FQE), weighted importance sampling (WIS) with its effective sample size (ESS), and support and safety diagnostics.
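For readers unfamiliar with the estimators, the sketch below shows the textbook trajectory-level forms of weighted importance sampling (WIS) and effective sample size (ESS). Each weight is assumed to be the cumulative importance ratio of one trajectory (the product of target-policy over behavior-policy action probabilities), and each return is that trajectory's discounted return. This is not necessarily the estimator used by the source project, which may apply per-step weighting or weight clipping; the function name wis_and_ess is illustrative.

```python
def wis_and_ess(trajectory_weights, trajectory_returns):
    """Textbook WIS estimate and effective sample size.

    WIS = sum(w_i * G_i) / sum(w_i)
    ESS = (sum(w_i))^2 / sum(w_i^2)
    """
    if len(trajectory_weights) != len(trajectory_returns):
        raise ValueError("weights and returns must have equal length")
    total_w = sum(trajectory_weights)
    if total_w <= 0:
        raise ValueError("importance weights must sum to a positive value")
    wis = sum(w * g for w, g in zip(trajectory_weights, trajectory_returns)) / total_w
    ess = total_w ** 2 / sum(w * w for w in trajectory_weights)
    return wis, ess


# Toy numbers for illustration only.
wis, ess = wis_and_ess([1.0, 1.0, 2.0], [0.0, 1.0, 1.0])
print(wis, ess)
```

A low ESS relative to the number of held-out trajectories means a few trajectories dominate the WIS estimate, which is one reason to read the confidence interval alongside the mean.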

⚠️ These results are offline estimates only. No prospective validation has been performed. They are not evidence of clinical safety or efficacy.

Limitations And Risks

  • Offline RL estimates can be biased by dataset support, confounding, cohort definition, reward design, and model selection choices.
  • The action space is discretized and does not capture all treatment context, contraindications, timing, or bedside judgment.
  • The model only supports the exact 62-feature normalized state representation used by the source pipeline.
  • MIMIC-IV access restrictions and clinical data governance requirements apply to reproduction and downstream use.
  • License compatibility for the derived model should be reviewed before any public release.
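Because the policy accepts only the 62-feature normalized state, a cheap guard before inference can catch the most obvious mismatches. The sketch below checks length and finiteness only; it cannot verify feature order or normalization, which must come from the source pipeline. The name check_state is hypothetical and not part of modeling_iql.py.

```python
import math

STATE_DIM = 62  # state dimension stated in this card


def check_state(state):
    """Return the state as a list of floats, or raise ValueError.

    Checks shape and finiteness only, not feature order or scaling.
    """
    values = [float(x) for x in state]
    if len(values) != STATE_DIM:
        raise ValueError(f"expected {STATE_DIM} features, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("state contains NaN or infinite values")
    return values


clean_state = check_state([0.0] * 62)
print(len(clean_state))  # 62
```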

Citation

No formal paper citation is available for this project. If you use this model, please cite the project repository and this model card.

License

This model is released under a research-only license. See LICENSE.md for details.

Key restrictions:

  • Research use only — no clinical, commercial, or diagnostic use
  • MIMIC-IV data use terms apply (PhysioNet credentialed access required)
  • No warranty; provided "as-is" for academic research

Maintainers

Prepared as a Hugging Face model repository from local training artifacts. Source: mimic-sepsis-drl (commit d3f1fc2).
