MIMIC Sepsis CQL

This repository contains the final selected Conservative Q-Learning (CQL) checkpoint from an academic offline reinforcement learning study on a MIMIC-IV v3.1 Sepsis-3 ICU cohort. The model card cites the canonical MIMIC-IV, PhysioNet, Sepsis-3, healthcare RL, CQL, and off-policy evaluation references listed below.

Clinical safety warning: This model is a retrospective research artifact. It is not a clinical decision support system and must not be used for patient care.

Repository Contents

config.json                             Root Hub query/config file for model metadata and download statistics
model/cql_epoch0200_step0007000.pt      Final selected CQL checkpoint
configs/cql.yaml                        Training configuration
configs/runtime.mps.yaml                Apple Silicon/MPS runtime configuration
evaluation/stage1_evaluation.json       Stage 1 validation screening results
evaluation/stage2_evaluation.json       Stage 2 validation evaluation results
evaluation/stage2_multiseed_validation.json  Multi-seed validation summary
evaluation/final_test_evaluation.json   Final held-out test evaluation

Intended Use

The checkpoint is provided for reproducibility and academic inspection of the reported offline RL experiment. It is intended for:

reproducing reported CQL evaluation artifacts,
inspecting model weights and training configuration,
comparing validation-only selection and final held-out OPE reporting workflows.

It is not intended for clinical deployment, direct treatment recommendation, or real-time medical decision support.

Dataset

The study uses MIMIC-IV v3.1 data accessed through PhysioNet. Raw MIMIC-IV records and derived replay buffers are not included here because access is credentialed and subject to PhysioNet data use requirements.

Model Details

Field	Value
Algorithm	Conservative Q-Learning
State dimension	62
Action space	25 discrete actions
Decision interval	4 hours
Reward variant	sparse
Learning rate	1e-4
CQL alpha	0.05
Seed	1024
Epoch	200
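
For orientation, the CQL alpha value in the table weights a conservatism penalty that is added to the usual temporal-difference loss. The following is a minimal sketch of the discrete-action CQL(H) penalty described by Kumar et al., written in plain Python. It is not the project's training code: the function names, the list-based inputs, and the way the TD loss is passed in are illustrative assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def cql_penalty(q_values, actions):
    """Mean over the batch of logsumexp_a Q(s, a) - Q(s, a_data).

    q_values: one list of Q-values per state (25 entries for this model).
    actions:  index of the action actually taken in the logged data.
    """
    gaps = [logsumexp(q) - q[a] for q, a in zip(q_values, actions)]
    return sum(gaps) / len(gaps)


def cql_loss(td_loss, q_values, actions, alpha=0.05):
    """TD loss plus the alpha-weighted conservatism penalty (assumed form)."""
    return td_loss + alpha * cql_penalty(q_values, actions)
```

The penalty is small when the logged action already has the highest Q-value and grows when the network assigns high values to actions not taken in the data, which is what discourages overestimation on unsupported actions.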

Evaluation

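The final checkpoint was selected using validation results only. The held-out test split was evaluated once, after final selection. In the table below, FQE denotes fitted Q evaluation, WIS denotes weighted importance sampling, and ESS denotes the effective sample size of the importance weights.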

Metric	Value
FQE mean	15.689874
FQE 95% CI	[15.616595, 15.755585]
WIS mean	10.018438
WIS 95% CI	[4.121083, 12.658275]
ESS	10.408948
Test episodes	2585
Bootstrap resamples	1000
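
To make the WIS, ESS, and bootstrap rows easier to interpret, here is a minimal sketch of how such quantities are commonly computed from per-episode returns and importance weights. It assumes per-trajectory weighted importance sampling, the Kish effective sample size, and a percentile bootstrap over episodes. The study's exact estimators may differ, and all function names here are hypothetical.

```python
import random


def trajectory_weight(target_probs, behavior_probs):
    """Cumulative importance ratio for one episode: product of pi_e / pi_b."""
    weight = 1.0
    for p_e, p_b in zip(target_probs, behavior_probs):
        weight *= p_e / p_b
    return weight


def wis_estimate(returns, weights):
    """Weighted importance sampling: self-normalized weighted mean of returns."""
    return sum(w * g for w, g in zip(weights, returns)) / sum(weights)


def effective_sample_size(weights):
    """Kish ESS: (sum w)^2 / sum(w^2). Equals n when all weights are equal."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


def bootstrap_ci(returns, weights, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the WIS estimate, resampling whole episodes.

    Assumes strictly positive weights so every resample has a nonzero total.
    """
    rng = random.Random(seed)
    n = len(returns)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(
            wis_estimate([returns[i] for i in idx], [weights[i] for i in idx])
        )
    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * n_boot)]
    upper = estimates[min(n_boot - 1, int((1.0 - tail) * n_boot))]
    return lower, upper
```

Under this definition, an ESS near 10 from 2585 test episodes would indicate that a small number of trajectories carry most of the importance weight, which is consistent with the WIS interval being much wider than the FQE interval.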

Loading the Checkpoint

The checkpoint is a PyTorch .pt artifact generated by the project training code. Use the GitHub repository code and configuration files to reconstruct the model architecture before loading weights.

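import torch

# Load onto CPU so the checkpoint opens on machines without MPS or CUDA.
checkpoint = torch.load("model/cql_epoch0200_step0007000.pt", map_location="cpu")

# Show the top-level keys to see how the training code organized the checkpoint.
print(checkpoint.keys() if isinstance(checkpoint, dict) else type(checkpoint))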
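
Before reconstructing the architecture, it can help to list the tensor shapes stored in the checkpoint and compare them with the 62-dimensional state and 25-action output described in this card. The sketch below walks a nested checkpoint dictionary and records shapes. The layout of the checkpoint is not documented here, so the helper names and the assumption that the file is a nested dict of tensors are hypothetical.

```python
def summarize_checkpoint(obj, prefix=""):
    """Return {dotted_key: shape tuple or type name} for a nested checkpoint."""
    summary = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            summary.update(summarize_checkpoint(value, f"{prefix}{key}."))
    elif hasattr(obj, "shape"):
        # Tensors and arrays expose .shape; record it as a plain tuple.
        summary[prefix.rstrip(".")] = tuple(obj.shape)
    else:
        # Scalars, strings, and other metadata: record only the type name.
        summary[prefix.rstrip(".")] = type(obj).__name__
    return summary


def load_and_summarize(path):
    """Load a checkpoint on CPU and summarize its contents."""
    import torch  # imported lazily so summarize_checkpoint works without torch

    # weights_only=True restricts unpickling to tensors and simple containers.
    # If the file holds other objects this call may fail; only relax the
    # setting for files you trust.
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)
    return summarize_checkpoint(checkpoint)
```

For example, calling `load_and_summarize("model/cql_epoch0200_step0007000.pt")` and printing each key with its shape should show whether the first and last layers match the state and action dimensions listed under Model Details.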

GitHub project: https://github.com/EnesDemir143/mimic-sepsis-drl

Download Statistics

Hugging Face model download counts are based on server-side requests to library/query files such as config.json, config.yaml, hyperparams.yaml, params.json, or library-specific weight patterns. This repository includes a root config.json so future Hub GET/HEAD requests have a standard query file for download counting. Counts are not retroactive and may update with delay.

Limitations

Retrospective observational EHR data cannot establish prospective clinical benefit.
Offline policy evaluation is sensitive to support mismatch and modeling assumptions.
MIMIC-IV access restrictions prevent bundling the replay dataset with the model.
This artifact has not been prospectively or clinically validated.

Author

Enes Demir — 230202066
Kocaeli University, Department of Computer Engineering

Citation

Please cite MIMIC-IV, PhysioNet, Conservative Q-Learning, and the project GitHub repository when using this artifact.

Dataset Citation

For MIMIC-IV v3.1, cite the PhysioNet dataset record as:

@article{PhysioNet-mimiciv-3.1,
  author = {Johnson, Alistair and Bulgarelli, Lucas and Pollard, Tom and Gow, Brian and Moody, Benjamin and Horng, Steven and Celi, Leo Anthony and Mark, Roger},
  title = {{MIMIC-IV}},
  journal = {{PhysioNet}},
  year = {2024},
  month = oct,
  note = {Version 3.1},
  doi = {10.13026/kpb9-mt58},
  url = {https://doi.org/10.13026/kpb9-mt58}
}

References

Johnson et al., MIMIC-IV, a freely accessible electronic health record dataset, Scientific Data, 2023. DOI: 10.1038/s41597-022-01899-x.
Johnson et al., MIMIC-IV (version 3.1), PhysioNet, 2024. DOI: 10.13026/kpb9-mt58.
Goldberger et al., PhysioBank, PhysioToolkit, and PhysioNet, Circulation, 2000. DOI: 10.1161/01.CIR.101.23.e215.
Singer et al., The Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3), JAMA, 2016. DOI: 10.1001/jama.2016.0287.
Gottesman et al., Guidelines for reinforcement learning in healthcare, Nature Medicine, 2019. DOI: 10.1038/s41591-018-0310-5.
Kumar et al., Conservative Q-Learning for Offline Reinforcement Learning, NeurIPS, 2020. DOI: 10.48550/arXiv.2006.04779.
Levine et al., Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems, arXiv, 2020. DOI: 10.48550/arXiv.2005.01643.
Thomas and Brunskill, Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning, ICML, 2016. DOI: 10.48550/arXiv.1604.00923.
