HuBERT-ECG as a self-supervised foundation model for broad and scalable cardiac applications

Original code: https://github.com/Edoar-do/HuBERT-ECG

License: CC BY-NC 4.0

Abstract

The electrocardiogram (ECG) is a widely accessible tool for cardiovascular assessment, and the growing availability of ECG datasets has enabled the emergence of ECG foundation models. However, such foundation models often lack extensive evaluation across clinically heterogeneous downstream tasks extending beyond conventional rhythm and conduction analysis. We present HuBERT-ECG, a self-supervised foundation ECG model pre-trained on 9.1 million 12-lead ECGs from four countries and diverse patient populations, and evaluated through fine-tuning on 21 independent datasets spanning more than 1.6k diagnostic and prognostic targets across adult and paediatric cohorts, including single-lead settings. These tasks cover conditions for which the ECG is the primary diagnostic modality, provides supportive but non-definitive diagnostic information, or enables acute-care prediction and prognostic modelling. Available in three model sizes to characterise scaling behaviour and support diverse computational constraints, HuBERT-ECG achieves AUROC ranging from 84% to 99% on ECG-primary diagnostic tasks, 76% to 97% on supportive diagnostic tasks, 74% to 91% on prognostic prediction tasks, and 88% to 92% on single-lead ECG benchmarks. Moreover, large-scale multitask fine-tuning across 2.4 million subjects and 164 tasks simultaneously shows that AUROC further increases for clinically relevant tasks without extra task-specific supervision. We release pretrained models and code as baselines to build on.

Models

This repository contains HuBERT-ECG LARGE fine-tuned on Cardio-Learning, providing a more disease-oriented baseline for further fine-tuning.

Cardio-Learning is the name we gave to the label-harmonized union of several 12-lead ECG datasets, including PTB, PTB-XL, CPSC, CPSC-Extra, Georgia, Chapman, Ningbo, SPH, CODE, SaMi-Trop, and Hefei. This dataset, counting 2.4 million ECGs from millions of patients in 4 countries, encompasses 164 different, potentially overlapping heart-related conditions (a multi-label setting) for which the ECG is either the primary or a supportive diagnostic tool, or is used to estimate the risk of future adverse cardiovascular events.
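
To make the multi-label setting concrete, here is a minimal sketch of how dataset-specific annotations could be mapped onto one shared label vocabulary and encoded as a multi-hot target. The label names, the alias table, and the helper `to_multi_hot` are hypothetical and chosen only for illustration; they are not the actual Cardio-Learning vocabulary (which has 164 entries) or the harmonization procedure used by the authors.

```python
from typing import Dict, Iterable, List

# Hypothetical harmonized vocabulary; the real one has 164 conditions.
VOCAB: List[str] = ["AF", "LBBB", "RBBB", "MI"]

# Hypothetical mapping from dataset-specific codes to harmonized names.
ALIASES: Dict[str, str] = {
    "AFIB": "AF",
    "CLBBB": "LBBB",
    "CRBBB": "RBBB",
}


def to_multi_hot(raw_labels: Iterable[str]) -> List[int]:
    """Encode one recording's annotations as a 0/1 vector over VOCAB.

    Several entries can be 1 at once, since conditions may overlap.
    Annotations that map to nothing in VOCAB are silently ignored.
    """
    index = {name: i for i, name in enumerate(VOCAB)}
    target = [0] * len(VOCAB)
    for raw in raw_labels:
        name = ALIASES.get(raw, raw)  # fall back to the raw code itself
        if name in index:
            target[index[name]] = 1
    return target


# One ECG annotated with two overlapping conditions.
example = to_multi_hot(["AFIB", "CRBBB"])
```

Under these assumptions, a recording annotated with both atrial fibrillation and right bundle branch block gets two active positions in the same target vector, which is what distinguishes a multi-label task from a single-class one.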

Usage

Input signals must be pre-processed before being passed to the model; see the "Methods - Data and Preprocessing" section for details.

To download and use the model you can use the HuggingFace AutoModel API:

from transformers import AutoModel

model = AutoModel.from_pretrained("Edoardo-BS/HuBERT-ECG-SFT-CardioLearning-large")

or the traditional .pt model checkpoints:

from hubert_ecg import HuBERTECGForClassification

# Load CardioLearning finetuned model
model = HuBERTECGForClassification.from_pretrained_legacy(
    "path/to/hubert_ecg_large_cardiolearning.pt"
)

# Ready to use for inference or further finetuning
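
Because Cardio-Learning is a multi-label dataset, predictions are naturally read per condition rather than as a single winning class. The sketch below assumes the classification head returns one raw logit per condition (an assumption, not something confirmed by this model card) and shows how such logits could be turned into independent probabilities and a set of predicted label indices. The helper names and the 0.5 threshold are illustrative; it is written in plain Python so it runs without downloading the model, and with PyTorch tensors `torch.sigmoid` would play the same role.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_conditions(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of conditions whose probability reaches the threshold.

    Each logit is squashed independently (no softmax), so zero, one,
    or several conditions can be predicted for the same ECG.
    """
    probs = [sigmoid(v) for v in logits]
    return [i for i, p in enumerate(probs) if p >= threshold]


# Hypothetical logits for three conditions of one recording.
predicted = predict_conditions([2.0, -1.0, 0.0])
```

In practice a single global threshold is rarely optimal for every condition; per-condition thresholds tuned on a validation set are a common alternative.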

📚 Citation

If you use our models or find our work useful, please consider citing us:

https://doi.org/10.1101/2024.11.14.24317328