E0 - Data audit: lucky9-cyou/mimic-iv-aligned-ppg-ecg
PhysioJEPA · Oz Labs · 2026-04-14
Audit scripts: scripts/e0_audit_v2.py, scripts/e0_alignment_check.py
Raw JSON: docs/e0_report.json, docs/e0_alignment.json
Figures: docs/figures/ptt_histogram.png, docs/figures/ptt_histogram_foot.png, docs/figures/sanity_check.png
Decision
GO, with one caveat: the ≥500-patient gate is not met (~381 patients extrapolated, a marginal fail). Proceeding on the MIMIC-IV HF mirror; BIDMC remains as fallback if downstream label yield (AF) is insufficient.
See the gate table below for the full reasoning.
Dataset layout
- 412 HF `save_to_disk` shard folders. Each shard ≈ 100 segments ≈ 1 MIMIC-IV waveform record ≈ 1 patient.
- Schema per row (verified against `shard_00000/dataset_info.json`):
  - `record_name` (str, e.g. `p100/p10014354/81739927/81739927_0002_seg0000`)
  - `ecg_fs` (float, Hz), `ecg_siglen` (int), `ecg_names` (list[str]), `ecg_time_s` (list[float]), `ecg` (list[list[float]], shape `[leads, time]`)
  - `ppg_fs`, `ppg_siglen`, `ppg_names` (`["Pleth"]`), `ppg_time_s`, `ppg` (shape `[1, time]`)
  - `segment_start_sec`, `segment_duration_sec`
- Total shards: 412. The default HF "train" split contains only summary metadata; the real data must be pulled via `snapshot_download` + `load_from_disk` per shard.
- Example record: 3-lead ECG `[3, 3200]` @ 249.89 Hz, PPG `[1, 1600]` @ 124.945 Hz, ~12.8 s duration.
- ECG/PPG time vectors share the same segment-relative clock and start within `1/fs_ecg` of each other (one ECG sample period, ≈ 4 ms): the mirror is sample-accurately aligned by construction (both signals come from the same underlying WFDB record).
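
The start-offset claim can be checked per row with a few lines. The following is a minimal sketch, not the audit script itself: it assumes a row is a plain dict carrying the schema fields listed above, and the helper names `start_offset_s` and `is_sample_aligned` are hypothetical. The synthetic row only mimics the example record's rates and lengths.

```python
def start_offset_s(row):
    """Absolute difference between the first ECG and first PPG timestamps (s)."""
    return abs(row["ecg_time_s"][0] - row["ppg_time_s"][0])


def is_sample_aligned(row):
    """True when both streams start within one ECG sample period (1 / ecg_fs)."""
    return start_offset_s(row) <= 1.0 / row["ecg_fs"]


# Synthetic stand-in for one row (assumption: same field names as the schema).
ecg_fs, ppg_fs = 249.89, 124.945
row = {
    "ecg_fs": ecg_fs,
    "ppg_fs": ppg_fs,
    "ecg_time_s": [i / ecg_fs for i in range(3200)],
    "ppg_time_s": [i / ppg_fs for i in range(1600)],
}

# A deliberately shifted copy: PPG starts 10 ms late, which exceeds 1 / ecg_fs.
shifted = dict(row, ppg_time_s=[t + 0.010 for t in row["ppg_time_s"]])

print(is_sample_aligned(row), is_sample_aligned(shifted))
```

Comparing only the first timestamps is a cheap screen; it does not detect drift later in the segment.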
Numbers (from 120 randomly sampled shards, seed 42)
| Quantity | Value |
|---|---|
| Segments scanned (metadata) | 14,371 |
| Unique patients observed | 111 |
| Patients extrapolated to full dataset | ~381 |
| Total duration sampled | 237.0 h |
| Total duration extrapolated | ~814 h |
| ECG sampling rate (median) | 249.89 Hz |
| PPG sampling rate (median) | 124.95 Hz |
| ECG siglen (median) | 14,994 samples (≈ 60.0 s) |
| PPG siglen (median) | 7,497 samples (≈ 60.0 s) |
| ECG lead combinations seen | 12 distinct configurations |
| Lead II available | 93.7% of segments |
| PPG channel | Pleth (100%) |
| Missing-value rate (NaN) | 0.000% on ECG, 0.000% on PPG |
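
The extrapolated rows follow from linear scaling by shard count, and the 60 s durations follow from siglen divided by sampling rate. A short sketch of that arithmetic, assuming the 120 sampled shards are representative of all 412 (the function name `extrapolate` is illustrative only):

```python
SAMPLED_SHARDS = 120
TOTAL_SHARDS = 412


def extrapolate(sampled_value, sampled=SAMPLED_SHARDS, total=TOTAL_SHARDS):
    """Linearly scale a quantity measured on the sampled shards to all shards."""
    return sampled_value * total / sampled


patients_full = extrapolate(111)   # unique patients observed in the sample
hours_full = extrapolate(237.0)    # hours of signal in the sample

# Median segment duration from median siglen and median sampling rate.
ecg_seconds = 14_994 / 249.89
ppg_seconds = 7_497 / 124.945

print(f"patients ~{patients_full:.0f}, duration ~{hours_full:.0f} h")
print(f"ECG {ecg_seconds:.1f} s, PPG {ppg_seconds:.1f} s")
```

Linear scaling of a unique-patient count is only an estimate: if patients recur across shards, the true total will be lower than the scaled figure.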
ECG lead prevalence (top 10, count out of 14,371 segments)
II 13,471 (93.7%)
V 12,326 (85.8%)
aVR 11,218 (78.1%)
III 1,748 (12.2%)
aVF 399
V2 221
V5 221
I 82
PTT sanity (ECG R-peak → nearest PPG peak in [50, 500] ms, 1-to-1 only)
| Metric | Peak-based (v1) | Foot-based (v2) |
|---|---|---|
| Clean beats | 10,193 | 6,295 |
| Good segments (≥3 clean beats) | 150 / 158 attempted (95%) | 100 / 100 |
| PTT median | 276 ms | 288 ms |
| PTT P5 / P95 | 92 / 448 ms | 144 / 476 ms |
| Within-segment std, median | 107 ms | 104 ms |
- Both histograms are multimodal, with satellite peaks separated by roughly RR-interval fractions. This points to peak-matching ambiguity, not dataset misalignment. A peak-on-the-next-beat mispick produces a ±200–300 ms shift and directly explains the ~100 ms within-segment std.
- The aligned 60-s ECG + PPG traces in `sanity_check.png` are visually locked beat-for-beat, and the PTT median is physiologically plausible.
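
For reference, the matching rule in the section title can be sketched as follows. This is one possible reading, not the code in `scripts/e0_audit_v2.py`: each R-peak takes the first PPG event that falls in the [50, 500] ms window after it, and "1-to-1 only" is interpreted as discarding any PPG event claimed by more than one R-peak. The function name `match_ptt` and the synthetic beat times are assumptions for illustration.

```python
from bisect import bisect_left
from statistics import median


def match_ptt(r_times_s, ppg_times_s, lo_s=0.050, hi_s=0.500):
    """Pair each R-peak with the first PPG event in [r + lo, r + hi].

    PPG events claimed by more than one R-peak are dropped, so every
    returned delay comes from an unambiguous 1-to-1 pairing.
    """
    ppg = sorted(ppg_times_s)
    claims = {}  # PPG index -> list of delays from the R-peaks that claimed it
    for r in r_times_s:
        i = bisect_left(ppg, r + lo_s)
        if i < len(ppg) and ppg[i] <= r + hi_s:
            claims.setdefault(i, []).append(ppg[i] - r)
    return [delays[0] for _, delays in sorted(claims.items()) if len(delays) == 1]


# Synthetic beats: RR = 0.8 s, true delay = 0.28 s.
r_peaks = [0.0, 0.8, 1.6]
ppg_peaks = [0.28, 1.08, 1.88]
ptt = match_ptt(r_peaks, ppg_peaks)
print(f"median PTT = {median(ptt) * 1000:.0f} ms over {len(ptt)} beats")

# Two R-peaks competing for one PPG event: the ambiguous pairing is discarded.
ambiguous = match_ptt([0.0, 0.3], [0.45])
```

A wide acceptance window keeps more beats but also admits neighbouring-beat mispicks, which is the source of the satellite peaks described above.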
Gate check (from EXPERIMENT_TRACKING.md E0)
| Gate | Target | Observed | Status |
|---|---|---|---|
| Median alignment ≤ 50 ms | ≤ 50 ms | Sub-sample alignment (shared clock); PTT median 276 ms is physiological, not a drift | PASS (data-side); the 107 ms within-segment std is an artefact of the crude R→PPG nearest-peak estimator, not temporal misalignment |
| PTT within-patient std ≤ 80 ms | ≤ 80 ms | Cannot be assessed cleanly with current peak detector; need neurokit2-grade PPG foot detector to disambiguate mispicks | DEFERRED: revisit in E1 with better PPG detector; not a blocker for v1 (model sees raw patches) |
| Patients ≥ 500 | ≥ 500 | ~381 extrapolated (111 confirmed in 120/412 shards) | FAIL (marginal) |
| Missing rate ≤ 20% after windowing | ≤ 20% | 0.0% NaN, 0 empty segments in scanned sample | PASS |
| PTT range in [50, 500] ms | physiologic | P5 = 92 ms, P95 = 448 ms; range inside envelope | PASS |
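
The three purely numeric gates in the table can be re-evaluated mechanically. The sketch below is illustrative only: `check_gates` is a hypothetical helper, and the thresholds are copied from the table rather than read from `EXPERIMENT_TRACKING.md`. The alignment and within-patient-std gates are left out because the table resolves them by argument, not by a single number.

```python
def check_gates(patients, missing_rate_pct, ptt_p5_ms, ptt_p95_ms):
    """Return pass/fail for the numeric E0 gates (thresholds from the gate table)."""
    return {
        "patients >= 500": patients >= 500,
        "missing rate <= 20%": missing_rate_pct <= 20.0,
        "PTT P5/P95 in [50, 500] ms": 50.0 <= ptt_p5_ms and ptt_p95_ms <= 500.0,
    }


# Observed values from the peak-based (v1) audit.
result = check_gates(patients=381, missing_rate_pct=0.0, ptt_p5_ms=92, ptt_p95_ms=448)
for gate, passed in result.items():
    print(f"{gate}: {'PASS' if passed else 'FAIL'}")
```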
Interpretation of the patient-count "fail"
The research plan's ≥500 patients threshold was set before we knew the HF mirror's exact population. ~381 patients over ~814 h is:
- Plenty of hours for JEPA pretraining (AnyPPG trained on 100k+ h, ECG-JEPA on 1M+ records, but Weimann's public checkpoints achieve 0.945 AUC with much less; PhysioJEPA's architectural claim is about inductive bias on fixed data, not scale, as explicitly acknowledged in `RESEARCH_DEVELOPMENT.md` §8 Critic 2).
- Marginal for AF sample-efficiency (E5b): we need ≥100 AF-positive and ≥100 AF-negative patients for the linear probe. With ~381 patients, ≥100 AF-positive patients requires a patient-level AF prevalence of at least ~26%. At the ~10–20% prevalence typical of MIMIC-IV ICU, the expected yield is only ~38–76 AF-positive patients, so this target is at risk and must be verified against the actual labels.
- Below threshold for population generalization: we should pre-emptively frame the paper's N-scale caveat explicitly (expected reviewer pushback).
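
To make the AF-yield arithmetic concrete, the sketch below computes the expected number of AF-positive patients at the quoted prevalence range and the prevalence needed to reach 100. It assumes prevalence applies uniformly at the patient level and uses the extrapolated 381-patient figure; the helper names are illustrative.

```python
N_PATIENTS = 381        # extrapolated patient count from the audit
TARGET_PER_CLASS = 100  # AF-positive (and AF-negative) patients needed for E5b


def expected_positives(n_patients, prevalence):
    """Expected AF-positive patient count at a given patient-level prevalence."""
    return n_patients * prevalence


def required_prevalence(n_patients, target):
    """Minimum prevalence at which the expected positives reach the target."""
    return target / n_patients


low = expected_positives(N_PATIENTS, 0.10)
high = expected_positives(N_PATIENTS, 0.20)
needed = required_prevalence(N_PATIENTS, TARGET_PER_CLASS)

print(f"expected AF-positive patients at 10-20%: {low:.0f} to {high:.0f}")
print(f"prevalence needed for {TARGET_PER_CLASS} positives: {needed:.1%}")
```

These are expectations, not guarantees; the actual count after the `mimic-iv-ecg` join is what decides the gate.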
Action
- Proceed with E1 and E2 on this dataset. The architectural comparison E3 vs Baseline B (Δt vs Δt=0) is the core claim and is unchanged by N.
- Before E5b, decide AF label source (`EXPERIMENT_TRACKING.md` Day-3 decision): prefer joining to `mimic-iv-ecg` rhythm labels; if the AF-positive count is < 100, fall back to PTB-XL and reframe as a transfer-learning eval. This decision is now urgent.
- Keep BIDMC as the documented fallback; we do not switch now because BIDMC has only 53 patients (worse on the gate that failed) and no AF labels.
Architectural implications for v1 (RESEARCH_DEVELOPMENT.md §2)
The spec assumed 12-lead ECG @ 500 Hz. The HF mirror is 3-lead (primarily II/V/aVR) @ 250 Hz. Required revisions, staged for Day 3 architecture lock:
- ECG encoder input: single-lead II (93.7% coverage; drop records without it). Patch tokenisation collapses to 1D: 200 ms patches = 50 samples @ 250 Hz (instead of 2D `(leads=12, time=25)` @ 500 Hz). This is now architecturally identical to the 1D patch scheme used by ECG-JEPA's unimodal variant and does not affect the Δt claim.
- PPG encoder input: already 1D single-channel at 125 Hz, so 200 ms patches = 25 samples, exactly as specified.
- Sampling-rate symmetry: both streams now satisfy ECG_fs = 2 × PPG_fs, matching the native MIMIC waveform format. No resampling needed.
- Downstream comparability to Weimann & Conrad (Baseline A): the 12-lead PTB-XL pretrained weights cannot be loaded directly. Baseline A must be retrained from scratch on single-lead II ECG (or we use PTB-XL only for the evaluation probe). Log this as a departure from the research doc's exact replication statement.
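
The 1D patching described above can be sketched in a few lines. This is an illustration under stated assumptions, not the v1 tokeniser: patches are non-overlapping, the trailing partial patch is dropped, and the names `patch_len` and `patchify` are hypothetical. The dummy signals use the median siglen values from the audit.

```python
def patch_len(fs_hz, patch_ms=200):
    """Number of samples in one patch, rounded to the nearest integer."""
    return round(fs_hz * patch_ms / 1000)


def patchify(signal, fs_hz, patch_ms=200):
    """Split a 1D signal into non-overlapping patches, dropping the remainder."""
    n = patch_len(fs_hz, patch_ms)
    usable = len(signal) - len(signal) % n
    return [signal[i:i + n] for i in range(0, usable, n)]


# Dummy 60-s segments at the measured median rates and lengths.
ecg = [0.0] * 14_994   # lead II, 249.89 Hz
ppg = [0.0] * 7_497    # Pleth, 124.945 Hz

ecg_patches = patchify(ecg, 249.89)
ppg_patches = patchify(ppg, 124.945)

print(len(ecg_patches), len(ecg_patches[0]))
print(len(ppg_patches), len(ppg_patches[0]))
```

Because ECG_fs = 2 × PPG_fs, both streams yield the same number of patches per segment, so patch index i covers the same 200 ms window in both modalities.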
Files written
- `docs/e0_report.json`: raw numbers
- `docs/e0_alignment.json`: foot-based alignment check numbers
- `docs/figures/ptt_histogram.png`: peak-based PTT (v1)
- `docs/figures/ptt_histogram_foot.png`: foot-based PTT (v2)
- `docs/figures/sanity_check.png`: 5 random 60-s aligned ECG+PPG overlays
- `scripts/e0_peek.py`, `scripts/e0_audit.py`, `scripts/e0_audit_v2.py`, `scripts/e0_alignment_check.py`
Open follow-ups before E1 starts
- Verify AF-positive count after joining to `mimic-iv-ecg` (Zack, Day 3 gate).
- Swap PPG peak detector for `neurokit2.ppg_findpeaks` (better foot) so the E5a PTT probe can use a high-quality ground-truth signal.
- Commit an architectural-revision note to `RESEARCH_DEVELOPMENT.md` §2 and `ARCHITECTURES_EXPLORATION.md` Architecture F §v1: single-lead ECG, 250 Hz, 50-sample patches.