Tumo505/SSL-ECG-Classification-model-card

 ---
+language: en
+license: mit
+datasets:
+- ptb-xl
+metrics:
+- auroc
+- accuracy
+tags:
+- ecg
+- medical
+- time-series
+- classification
+- self-supervised-learning
+- ssl
+- cardiac
+- healthcare
+model-index:
+- name: SSL-ECG-SimCLR
+  results:
+  - task:
+      name: Time Series Classification
+      type: tabular-classification
+    dataset:
+      name: PTB-XL
+      type: ptb-xl
+      split: test
+      args:
+        fold: 10
+    metrics:
+    - name: AUROC
+      type: auroc
+      value: 0.8717
+    - name: Accuracy
+      type: accuracy
+      value: 0.8234
+inference: true
+widget:
+- src: https://huggingface.co/datasets/Tumo505/ecg-samples/resolve/main/example_normal.csv
+  example_title: "Normal ECG (NORM)"
+- src: https://huggingface.co/datasets/Tumo505/ecg-samples/resolve/main/example_mi.csv
+  example_title: "Myocardial Infarction (MI)"
+- src: https://huggingface.co/datasets/Tumo505/ecg-samples/resolve/main/example_sttc.csv
+  example_title: "ST/T Changes (STTC)"
 ---
+# SSL-ECG-SimCLR: Self-Supervised Learning for ECG Classification
+🫀 **Self-supervised learning (SSL)** model, pre-trained with SimCLR, for classifying cardiovascular disease from 12-lead ECGs.
+## Model Overview
+| Property | Value |
+|----------|-------|
+| **Framework** | SimCLR |
+| **Test AUROC** | 0.8717 |
+| **Test Accuracy** | 0.8234 |
+| **Dataset** | PTB-XL (21.8K ECGs) |
+| **Fine-tuning** | 10% labeled data (1,747 samples) |
+| **Input** | 12-lead ECG @ 100 Hz (5,000 samples) |
+| **Output** | 5-class classification |
+## Classes Predicted
+- **NORM**: Normal ECG
+- **MI**: Myocardial Infarction
+- **STTC**: ST/T Changes
+- **HYP**: Hypertrophy (e.g., left ventricular hypertrophy, LVH)
+- **CD**: Conduction Disturbances
+## Quick Start
+### Python (Transformers)
+```python
+import torch
+from transformers import AutoModel
+# Load model (trust_remote_code=True lets Transformers import the custom classes in modeling_ecg.py)
+model = AutoModel.from_pretrained("Tumo505/ssl-ecg-simclr-finetuned", trust_remote_code=True)
+model.eval()
+# Prepare a 12-lead ECG batch: (batch_size, 12 leads, 5000 samples).
+# Random values are a placeholder; replace them with a real recording.
+ecg = torch.randn(1, 12, 5000)
+# Predict
+with torch.no_grad():
+    output = model(input_values=ecg)
+    logits = output["logits"]
+    probs = torch.softmax(logits, dim=-1)
+classes = ["NORM", "MI", "STTC", "HYP", "CD"]  # same order as "class_indices" in config.json
+pred_idx = probs[0].argmax().item()
+prediction = classes[pred_idx]
+confidence = probs[0, pred_idx].item()
+print(f"Prediction: {prediction} ({confidence:.1%})")
+```
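The widget examples in the metadata are CSV files. The sketch below shows one way to turn such a file into the `(batch, 12, 5000)` array used above. It assumes (the card does not document the layout) one header row followed by one row per time step with 12 comma-separated lead columns; `load_ecg_csv` is a hypothetical helper, and no amplitude normalization is applied because none is specified.

```python
import numpy as np

def load_ecg_csv(path, num_leads=12, signal_length=5000):
    """Hypothetical loader: header row, then one row per time step, one column per lead."""
    data = np.loadtxt(path, delimiter=",", skiprows=1, dtype=np.float32, ndmin=2)  # (time, leads)
    if data.shape[1] != num_leads:
        raise ValueError(f"expected {num_leads} lead columns, got shape {data.shape}")
    signal = data.T  # (leads, time)
    # Crop or zero-pad the time axis to the fixed input length
    if signal.shape[1] >= signal_length:
        signal = signal[:, :signal_length]
    else:
        signal = np.pad(signal, ((0, 0), (0, signal_length - signal.shape[1])))
    return signal[np.newaxis]  # (1, leads, signal_length)
```

The result can be passed to the model with `torch.from_numpy(load_ecg_csv("example_normal.csv"))`.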
+### Try Online
+Use the **"Use this model"** button above to try the model in a Gradio Space.
+### API Endpoint (Deploy)
+Use the **"Deploy"** button to create a live inference endpoint, then call it with your endpoint URL and Hugging Face token:
+```bash
+curl -X POST https://your-api-url.hf.space/api/predict \
+  -H "Authorization: Bearer YOUR_HF_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "inputs": [[[... 12-lead ECG array ...]]]
+  }'
+```
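For scripted access, the same request can be assembled in Python with only the standard library. The URL, bearer-token header and `{"inputs": [...]}` schema are taken from the curl placeholder above and are assumptions about whatever endpoint you deploy, not a documented API; `build_payload` and `post_ecg` are hypothetical helpers.

```python
import json
import urllib.request

def build_payload(ecg):
    """Wrap a nested list shaped (batch, 12 leads, samples) as {"inputs": ...}."""
    if not ecg or any(len(rec) != 12 for rec in ecg):
        raise ValueError("expected a non-empty batch of 12-lead recordings")
    return json.dumps({"inputs": ecg}).encode("utf-8")

def post_ecg(url, token, ecg, timeout=30):
    """POST the payload to the (assumed) endpoint and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=build_payload(ecg),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call would look like `post_ecg("https://your-api-url.hf.space/api/predict", "YOUR_HF_TOKEN", ecg.tolist())`, with both placeholders replaced by your own values.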
+## Model Architecture
+```
+Input (B × 12 × 5000)
+    ↓
+1D CNN Encoder
+  - Conv1d(12 → 32) + BatchNorm + ReLU + MaxPool
+  - Conv1d(32 → 64) + BatchNorm + ReLU + MaxPool
+  - Conv1d(64 → 128) + BatchNorm + ReLU
+  - AdaptiveAvgPool1d(1) + Flatten
+    ↓
+Projection Head (128-dim embedding)
+    ↓
+Classification Head (5 classes)
+    ↓
+Output (B × 5) logits
+```
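To see how each stage of the diagram changes the tensor shape, the sketch below rebuilds the same layer stack as `_build_encoder` in `modeling_ecg.py` (kernel sizes 7/5/3 with length-preserving padding) plus the 5-way classifier, and records every intermediate shape. It is a standalone re-creation for illustration and does not load the released weights.

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(
    nn.Conv1d(12, 32, kernel_size=7, padding=3), nn.BatchNorm1d(32), nn.ReLU(), nn.MaxPool1d(2),
    nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.BatchNorm1d(64), nn.ReLU(), nn.MaxPool1d(2),
    nn.Conv1d(64, 128, kernel_size=3, padding=1), nn.BatchNorm1d(128), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(128, 128),  # projection to the 128-dim embedding
)
classifier = nn.Linear(128, 5)

def trace_shapes(x):
    """Run x through the encoder layer by layer, recording each output shape."""
    shapes = [("input", tuple(x.shape))]
    for layer in encoder:
        x = layer(x)
        shapes.append((layer.__class__.__name__, tuple(x.shape)))
    shapes.append(("classifier", tuple(classifier(x).shape)))
    return shapes

encoder.eval()  # use BatchNorm running statistics, as at inference time
with torch.no_grad():
    for name, shape in trace_shapes(torch.randn(2, 12, 5000)):
        print(f"{name:>18}: {shape}")
```

Because the convolutions preserve length and `AdaptiveAvgPool1d(1)` collapses the time axis, the layers themselves accept other signal lengths; the 5,000-sample input size comes from the training data rather than from the architecture.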
+## Performance Metrics
+### Test Set Results (PTB-XL Fold 10: 3,044 samples)
+```
+Class     | Precision | Recall | F1-Score | Support
+----------|-----------|--------|----------|----------
+NORM      | 0.897     | 0.882  | 0.889    | 1,275
+MI        | 0.856     | 0.834  | 0.845    |   904
+STTC      | 0.871     | 0.859  | 0.865    |   776
+HYP       | 0.812     | 0.798  | 0.805    |   356
+CD        | 0.843     | 0.866  | 0.854    |   733
+----------|-----------|--------|----------|----------
+Macro Avg | 0.856     | 0.848  | 0.852    | 4,044
+```
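The `Macro Avg` row is the unweighted mean of the five per-class scores, with support summed over classes, which matches the scikit-learn `classification_report` convention. Recomputing it from the table:

```python
# Per-class (precision, recall, f1, support) copied from the table above
report = {
    "NORM": (0.897, 0.882, 0.889, 1275),
    "MI":   (0.856, 0.834, 0.845, 904),
    "STTC": (0.871, 0.859, 0.865, 776),
    "HYP":  (0.812, 0.798, 0.805, 356),
    "CD":   (0.843, 0.866, 0.854, 733),
}

def macro_average(rows):
    """Unweighted mean of each metric over classes; support is summed."""
    n = len(rows)
    precision = sum(v[0] for v in rows.values()) / n
    recall = sum(v[1] for v in rows.values()) / n
    f1 = sum(v[2] for v in rows.values()) / n
    support = sum(v[3] for v in rows.values())
    return round(precision, 3), round(recall, 3), round(f1, 3), support

print(macro_average(report))  # (0.856, 0.848, 0.852, 4044)
```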
+### Comparison to Baselines
+| Model | Training | AUROC | Accuracy | Role |
+|-------|----------|-------|----------|------|
+| **SimCLR (this model)** | **SSL + supervised fine-tuning** | **0.8717** | **0.8234** | **Recommended** |
+| BYOL | SSL (momentum encoder) | 0.8565 | 0.8134 | Alternative |
+| Supervised CNN | Supervised only | 0.8606 | 0.8193 | Baseline |
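The card does not state how AUROC is aggregated over the five classes. Assuming a one-vs-rest macro average, it can be computed from per-class scores with the rank-based (Mann-Whitney U) formulation sketched below in NumPy: the probability that a random positive scores higher than a random negative, with ties given half credit. `binary_auroc` and `macro_auroc` are illustrative helpers; `y_true` is a multi-hot `(N, C)` matrix (pass a one-hot matrix for single-label targets).

```python
import numpy as np

def binary_auroc(y_true, scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())

def macro_auroc(y_true, probs):
    """One-vs-rest AUROC for each class column, then the unweighted mean."""
    y_true = np.asarray(y_true)
    probs = np.asarray(probs)
    per_class = [binary_auroc(y_true[:, k], probs[:, k]) for k in range(probs.shape[1])]
    return float(np.mean(per_class)), per_class
```

The pairwise comparison is O(positives × negatives) in memory, which is fine for a test fold of a few thousand records.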
+## Training Details
+### Pre-training (Unsupervised SSL)
+- **Framework:** SimCLR
+- **Epochs:** 20
+- **Batch Size:** 128
+- **Optimizer:** Adam (lr=1e-3)
+- **Loss:** Contrastive (NT-Xent with τ=0.07)
+- **Data:** All PTB-XL training folds (no labels used)
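NT-Xent (normalized temperature-scaled cross-entropy) treats the two augmented views of each recording as a positive pair and every other embedding in the batch as a negative. A minimal PyTorch sketch, assuming projections `z1` and `z2` of shape `(N, D)` from the two views; the function name is illustrative and the default τ is the 0.07 listed above.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """SimCLR-style NT-Xent loss for two batches of projections, each (N, D)."""
    n = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit length
    sim = z @ z.T / tau                                    # cosine similarity / temperature
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))        # never contrast a sample with itself
    # Row i's positive is its other view: i + N in the first half, i - N in the second
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

A small τ such as 0.07 sharpens the softmax over similarities, so hard negatives (other ECGs that look similar) dominate the gradient.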
+### Fine-tuning (Supervised)
+- **Labeled Data:** 1,747 samples (10% of folds 1-8)
+- **Epochs:** Up to 20, with early stopping (patience = 5)
+- **Batch Size:** 32
+- **Optimizer:** Adam (lr=5e-4)
+- **Loss:** Focal Loss with class weights
+- **Augmentations:** Training-time augmentations (same as pre-training)
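Focal loss scales the cross-entropy of each example by `(1 - p_y)^γ`, where `p_y` is the predicted probability of the true class, so well-classified examples contribute less. The card gives neither γ nor the weighting scheme, so this sketch assumes the common γ = 2 and a per-class weight vector indexed by the true label (`focal_loss` and `class_weights` are illustrative names). With γ = 0 and no weights it reduces to ordinary cross-entropy.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, class_weights=None, gamma=2.0):
    """Multi-class focal loss: mean over the batch of -w[y] * (1 - p_y)^gamma * log(p_y)."""
    log_probs = F.log_softmax(logits, dim=-1)                      # (B, C)
    log_pt = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # log p of the true class
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt
    if class_weights is not None:
        # Assumed weighting, e.g. inverse class frequency to up-weight HYP
        loss = loss * class_weights[targets]
    return loss.mean()
```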
+### Domain-Adaptive Augmentations
+Applied during SSL pre-training:
+1. **Frequency warping** (±5% heart rate variation)
+2. **Medical mixup** (ECG-aware blending of two signals)
+3. **Bandpass filtering** (physiologically grounded)
+4. **Segment CutMix** (temporal masking)
+5. **Motion artifacts** (baseline wander simulation)
+6. **Per-channel noise** (independent Gaussian)
+7. **Temporal dropout** (with interpolation)
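Apart from the ±5% warp, the augmentation parameters are not given. The NumPy sketch below illustrates four of the seven (frequency warping, baseline wander, per-channel noise, temporal dropout with interpolation) on a `(leads, samples)` array. All amplitudes, frequencies and drop rates other than the ±5% are assumed placeholders, the 100 Hz default follows the sampling rate stated in this card, and the function names are hypothetical.

```python
import numpy as np

def frequency_warp(x, rng, max_scale=0.05):
    """Stretch or compress time by up to ±5%, then crop or edge-pad to the original length."""
    n = x.shape[1]
    new_n = int(round(n * (1.0 + rng.uniform(-max_scale, max_scale))))
    grid = np.linspace(0, n - 1, new_n)
    warped = np.stack([np.interp(grid, np.arange(n), lead) for lead in x])
    if new_n >= n:
        return warped[:, :n]
    return np.pad(warped, ((0, 0), (0, n - new_n)), mode="edge")

def baseline_wander(x, rng, fs=100, max_amp=0.1, max_freq=0.5):
    """Add a slow per-lead sinusoid (assumed below 0.5 Hz) to mimic baseline drift."""
    t = np.arange(x.shape[1]) / fs
    amp = rng.uniform(0, max_amp, size=(x.shape[0], 1))
    freq = rng.uniform(0.05, max_freq, size=(x.shape[0], 1))
    phase = rng.uniform(0, 2 * np.pi, size=(x.shape[0], 1))
    return x + amp * np.sin(2 * np.pi * freq * t + phase)

def per_channel_noise(x, rng, max_std=0.05):
    """Independent Gaussian noise with a different standard deviation for each lead."""
    std = rng.uniform(0, max_std, size=(x.shape[0], 1))
    return x + rng.normal(size=x.shape) * std

def temporal_dropout(x, rng, drop_frac=0.05):
    """Drop random time steps and refill them by linear interpolation from kept neighbours."""
    n = x.shape[1]
    keep = rng.random(n) >= drop_frac
    keep[0] = keep[-1] = True  # keep endpoints so interpolation never extrapolates
    kept = np.flatnonzero(keep)
    return np.stack([np.interp(np.arange(n), kept, lead[kept]) for lead in x])
```

The functions compose, e.g. `temporal_dropout(per_channel_noise(baseline_wander(frequency_warp(ecg, rng), rng), rng), rng)` with `rng = np.random.default_rng(0)`, to produce one randomized view for contrastive pre-training.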
+## Dataset
+### PTB-XL v1.0.3
+**Source:** https://www.physionet.org/content/ptb-xl/1.0.3/
+- **Total ECGs:** 21,799
+- **Unique Patients:** 18,869
+- **Sampling Rate:** 500 Hz → downsampled to 100 Hz
+- **Leads:** 12-lead standard
+- **Duration:** ~10 seconds per recording
+**Class Distribution:**
+| Class | Count | Percentage |
+|-------|-------|-----------|
+| NORM | 9,514 | 43.6% |
+| MI | 5,469 | 25.1% |
+| STTC | 5,235 | 24.0% |
+| CD | 4,898 | 22.5% |
+| HYP | 2,649 | 12.2% |
+*Note: a record can carry several class labels, so the percentages sum to more than 100%.*
+**Splits Used:**
+- **Training**: Folds 1-8 (17,536 samples)
+- **Validation**: Fold 9 (1,791 samples)
+- **Test**: Fold 10 (3,044 samples)
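PTB-XL's metadata file `ptbxl_database.csv` includes a `strat_fold` column (values 1 to 10). Given that column as an array, the split above and a 10% labeled subset can be drawn as sketched below. Uniform random subsampling and the fixed seed are assumptions, since the card does not say how the 1,747 labeled records were chosen; `split_by_fold` and `labeled_subset` are hypothetical helpers.

```python
import numpy as np

def split_by_fold(strat_fold):
    """Return index arrays for train (folds 1-8), validation (fold 9) and test (fold 10)."""
    strat_fold = np.asarray(strat_fold)
    train = np.flatnonzero(strat_fold <= 8)
    val = np.flatnonzero(strat_fold == 9)
    test = np.flatnonzero(strat_fold == 10)
    return train, val, test

def labeled_subset(train_idx, frac=0.10, seed=0):
    """Uniformly sample a fraction of the training indices whose labels are kept."""
    rng = np.random.default_rng(seed)
    k = int(round(frac * len(train_idx)))
    return np.sort(rng.choice(train_idx, size=k, replace=False))

# Example wiring (requires pandas and a local copy of PTB-XL):
# import pandas as pd
# meta = pd.read_csv("ptbxl_database.csv", index_col="ecg_id")
# train, val, test = split_by_fold(meta["strat_fold"].to_numpy())
# labeled = labeled_subset(train)
```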
+## Limitations & Biases
+### Limitations
+**Not validated for clinical use.** For research purposes only.
+- Trained exclusively on PTB-XL; generalization to other datasets is unknown
+- Requires 12-lead input; does not work with 6-lead or converted signals
+- Results reflect the 10% labeled-data regime and may not show the model's full capacity
+- Predicts only the 5 trained classes
+### Potential Biases
+- **Geographic bias:** Primarily European patient population (PTB-XL)
+- **Hospital bias:** Data from hospital patients (not general population)
+- **Class imbalance:** NORM over-represented, HYP under-represented
+- **Demographic:** Skew toward older patients; male/female ratio not controlled
+## Environmental Impact
+- **Training:** ~12 GPU hours on RTX 5070 Ti
+- **CO2 Emissions:** ~0.5 kg (estimated)
+- **Inference:** ~50 ms per 10-second ECG on GPU
+## License
+Apache 2.0 - See LICENSE file in repository
+## Acknowledgments
+- PTB-XL Dataset: PhysioNet, Wagner et al. (2020)
+- SimCLR Framework: Chen et al. (2020)
+- Implementation: Built with PyTorch & Hugging Face
+## Model Card Contact
+- **Author:** Tumo Kgabeng
+- **GitHub:** https://github.com/Tumo505/SSL-for-ECG-classification
+## Changelog
+### v1.0 (2026-04-18)
+- Initial release
+- SimCLR pre-training + supervised fine-tuning
+- 10% labeled data regime
+- Test AUROC: 0.8717
+---
+**Questions?** Open an issue on [GitHub](https://github.com/Tumo505/SSL-for-ECG-classification)

config.json ADDED

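	@@ -0,0 +1,39 @@

+{
+  "architectures": [
+    "ECGClassifier"
+  ],
+  "auto_map": {
+    "AutoConfig": "modeling_ecg.ECGClassifierConfig",
+    "AutoModel": "modeling_ecg.ECGClassifier"
+  },
+  "model_type": "ecg-classifier",
+  "num_classes": 5,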
+  "signal_length": 5000,
+  "num_leads": 12,
+  "sampling_rate": 100,
+  "input_type": "time_series",
+  "output_type": "classification",
+  "task": [
+    "time-series-classification"
+  ],
+  "classes": [
+    "NORM",
+    "MI",
+    "STTC",
+    "HYP",
+    "CD"
+  ],
+  "class_indices": {
+    "NORM": 0,
+    "MI": 1,
+    "STTC": 2,
+    "HYP": 3,
+    "CD": 4
+  },
+  "num_layers": 4,
+  "output_size": 128,
+  "dropout": 0.2,
+  "frame_size": 5000,
+  "frame_shift": 5000,
+  "transformers_version": "4.36.0"
+}

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:48e5c6a7812d09e8ee4e4808cfff3fe5efee34afafd7f9ac41cd4a2dba6c5d69
+size 2170604

modeling_ecg.py ADDED

	@@ -0,0 +1,95 @@

+"""
+Transformers-compatible wrapper for ECG models
+Enables: from transformers import AutoModel; model = AutoModel.from_pretrained("repo-id", trust_remote_code=True)
+"""
+import torch
+import torch.nn as nn
+from typing import Dict
+from transformers import PreTrainedModel, PretrainedConfig
+class ECGClassifierConfig(PretrainedConfig):
+    """Configuration for ECG classifier"""
+    model_type = "ecg-classifier"
+    def __init__(
+        self,
+        num_classes: int = 5,
+        num_leads: int = 12,
+        signal_length: int = 5000,
+        num_layers: int = 4,
+        output_size: int = 128,
+        dropout: float = 0.2,
+        **kwargs
+    ):
+        super().__init__(**kwargs)
+        self.num_classes = num_classes
+        self.num_leads = num_leads
+        self.signal_length = signal_length
+        self.num_layers = num_layers
+        self.output_size = output_size
+        self.dropout = dropout
+class ECGClassifier(PreTrainedModel):
+    """Transformers-compatible ECG classifier"""
+    config_class = ECGClassifierConfig
+    def __init__(self, config):
+        super().__init__(config)
+        self.config = config
+        # Build architecture
+        self.encoder = self._build_encoder()
+        self.classifier = nn.Linear(config.output_size, config.num_classes)
+        self.post_init()
+    def _build_encoder(self) -> nn.Sequential:
+        """Build 1D CNN encoder"""
+        return nn.Sequential(
+            nn.Conv1d(self.config.num_leads, 32, kernel_size=7, padding=3),
+            nn.BatchNorm1d(32),
+            nn.ReLU(),
+            nn.MaxPool1d(2),
+            nn.Conv1d(32, 64, kernel_size=5, padding=2),
+            nn.BatchNorm1d(64),
+            nn.ReLU(),
+            nn.MaxPool1d(2),
+            nn.Conv1d(64, 128, kernel_size=3, padding=1),
+            nn.BatchNorm1d(128),
+            nn.ReLU(),
+            nn.AdaptiveAvgPool1d(1),
+            nn.Flatten(),
+            nn.Linear(128, self.config.output_size),
+        )
+    def forward(
+        self,
+        input_values: torch.Tensor,
+        **kwargs
+    ) -> Dict[str, torch.Tensor]:
+        """
+        Forward pass
+        Args:
+            input_values: ECG tensor (batch_size, num_leads, signal_length)
+        Returns:
+            Dictionary with logits and embeddings
+        """
+        # Encode
+        embeddings = self.encoder(input_values)
+        # Classify
+        logits = self.classifier(embeddings)
+        return {
+            "logits": logits,
+            "embeddings": embeddings,
+        }

requirements.txt ADDED

	@@ -0,0 +1,8 @@

+torch>=2.0.0
+transformers>=4.36.0
+safetensors>=0.4.0
+gradio>=4.0
+numpy>=1.24.0
+scipy>=1.10.0
+plotly>=5.17.0
+huggingface-hub>=0.19.0