Snarcy committed on
Commit
3e42381
·
verified ·
1 Parent(s): d6052e5

Update README.md

  1. README.md +155 -9
README.md CHANGED
@@ -1,9 +1,155 @@
1
- ---
2
- tags:
3
- - image-classification
4
- - timm
5
- - transformers
6
- library_name: timm
7
- license: apache-2.0
8
- ---
9
- # Model card for tbd_b
1
+ ---
2
+ library_name: timm
3
+ license: cc-by-4.0
4
+ pipeline_tag: image-feature-extraction
5
+ tags:
6
+ - radiology
7
+ - medical-imaging
8
+ - xray
9
+ - ct
10
+ - mri
11
+ - ultrasound
12
+ - foundation-model
13
+ - vision-transformer
14
+ - self-supervised
15
+ - dino
16
+ - dinov2
17
+
18
+ model-index:
19
+ - name: OmniRad-base
20
+   results:
21
+   - task:
22
+       type: image-feature-extraction
23
+     dataset:
24
+       name: RadImageNet
25
+       type: radimagenet
26
+     metrics:
27
+     - name: Representation learning
28
+       type: other
29
+       value: "Self-supervised pretrained encoder"
30
+ ---
31
+
32
+ # OmniRad: A General-Purpose Radiological Foundation Model
33
+ <!--
34
+ [📄 Paper](https://arxiv.org/abs/XXXX.XXXXX) |
35
+ -->
36
+ [💻 Code](https://github.com/unica-visual-intelligence-lab/OmniRad)
37
+
38
+ **OmniRad** is a **self-supervised radiological foundation model** designed to learn **stable, transferable, and task-agnostic visual representations** for medical imaging. It is pretrained on large-scale, heterogeneous radiological data and intended for reuse across **classification**, **segmentation**, and **exploratory vision–language** tasks without task-specific pretraining.
39
+
40
+ This repository provides the **OmniRad-base** variant, a Vision Transformer (ViT-B) encoder that balances computational efficiency and representational power.
41
+
42
+ ---
43
+
44
+ ## Key Features
45
+
46
+ - **Radiology-focused foundation model** pretrained on >1M radiological images
47
+ - **Self-supervised learning** based on a customized DINOv2 framework
48
+ - **Task-agnostic encoder** reusable across classification, segmentation, and multimodal pipelines
49
+ - **Strong transferability** across modalities (CT, MRI, X-ray, ultrasound)
50
+ - **Radiomics-oriented design**, emphasizing representation stability and reuse
51
+
52
+ ---
53
+
54
+
55
+ ## Example Usage: Feature Extraction
56
+
57
+ ```python
58
+ from PIL import Image
59
+ from torchvision import transforms
60
+ import timm
61
+ import torch
62
+
63
+ # Load OmniRad-base from Hugging Face Hub
64
+ model = timm.create_model(
65
+     "hf_hub:Snarcy/OmniRad-base",
66
+     pretrained=True,
67
+     num_classes=0,  # drop the classifier head so the model returns embeddings
68
+ )
69
+
70
+ model.eval()
71
+ device = "cuda" if torch.cuda.is_available() else "cpu"
72
+ model.to(device)
73
+
74
+ # Preprocessing
75
+ transform = transforms.Compose([
76
+     transforms.Resize((224, 224)),
77
+     transforms.ToTensor(),
78
+     transforms.Normalize(
79
+         mean=[0.485, 0.456, 0.406],
80
+         std=[0.229, 0.224, 0.225],
81
+     ),
82
+ ])
83
+
84
+ # Load image
85
+ image = Image.open("path/to/radiology_image.png").convert("RGB")
86
+ x = transform(image).unsqueeze(0).to(device)
87
+
88
+ # Extract features
89
+ with torch.no_grad():
90
+     embedding = model(x)  # shape: [1, 768]
91
+
92
+
93
+ ```
94
+ ---
95
+
96
+ ## Available Downstream Code
97
+
98
+ The **official OmniRad repository** provides **end-to-end implementations** for all evaluated downstream tasks:
99
+
100
+ 👉 **https://github.com/unica-visual-intelligence-lab/OmniRad**
101
+
102
+ Including:
103
+ - **Image-level classification** (MedMNIST v2 benchmarks)
104
+ - **Dense medical image segmentation** (MedSegBench, frozen encoder + lightweight decoders)
105
+ - **Radiological image captioning** (BART-based vision–language framework)
106
+ - Full training, evaluation, and ablation scripts
107
+ - Reproducible experimental configurations matching the paper
108
+
109
+ ---
110
+ ## Model Details
111
+
112
+ - **Architecture:** Vision Transformer (ViT-B)
113
+ - **Patch size:** 14
114
+ - **Embedding dimension:** 768
115
+ - **Pretraining framework:** Modified DINOv2 (global crops only)
116
+ - **Pretraining dataset:** RadImageNet (~1.2M radiological images)
117
+ - **Input resolution:** 224 × 224
118
+ - **Backbone type:** Encoder-only (no task-specific heads)
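
As a rough illustration of what the patch size and input resolution above imply, the sketch below works out how many patch tokens a 224 × 224 input produces. The constant names are illustrative, and the count covers patch tokens only; any class or register tokens the encoder may add are not included.

```python
# Values taken from the Model Details list above.
IMAGE_SIZE = 224   # input resolution (height and width)
PATCH_SIZE = 14    # side length of each square patch
EMBED_DIM = 768    # embedding dimension per token

# The image must tile evenly into patches.
assert IMAGE_SIZE % PATCH_SIZE == 0

patches_per_side = IMAGE_SIZE // PATCH_SIZE      # patches along one axis
num_patches = patches_per_side ** 2              # patch tokens per image
patch_feature_values = num_patches * EMBED_DIM   # scalars in the patch-token grid

print(f"{patches_per_side} x {patches_per_side} = {num_patches} patch tokens")
print(f"{patch_feature_values} values per image at dimension {EMBED_DIM}")
```

Resizing inputs to a multiple of 14, as the usage example does with 224, keeps this grid regular.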
119
+
120
+ ### Pretraining Notes
121
+
122
+ - Local crops are removed to improve training stability and downstream transferability
123
+ - No feature collapse observed during training
124
+ - Same hyperparameter configuration used across small and base variants
125
+ - Designed to support frozen-backbone adaptation and lightweight fine-tuning
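
A minimal sketch of the frozen-backbone setup mentioned above, assuming a linear classification head trained on top of fixed embeddings. The stand-in encoder, `NUM_CLASSES`, the learning rate, and the random batch are placeholders introduced for illustration so the snippet runs offline; they are not taken from the OmniRad training code.

```python
import torch
from torch import nn

EMBED_DIM = 768    # embedding dimension listed in Model Details
NUM_CLASSES = 3    # hypothetical number of downstream classes

# Stand-in for the OmniRad encoder (assumption, keeps the sketch offline).
# In practice this would be the timm model created with num_classes=0.
encoder = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # [B, 3, H, W] -> [B, 3, 1, 1]
    nn.Flatten(),              # -> [B, 3]
    nn.Linear(3, EMBED_DIM),   # -> [B, 768]
)

# Freeze the backbone: no gradient updates, inference-mode layers.
for param in encoder.parameters():
    param.requires_grad = False
encoder.eval()

# The linear head is the only trainable component.
head = nn.Linear(EMBED_DIM, NUM_CLASSES)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Random tensors standing in for a preprocessed batch and its labels.
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, NUM_CLASSES, (4,))

# Embeddings are computed without tracking gradients.
with torch.no_grad():
    features = encoder(images)

# One optimization step on the head only.
logits = head(features)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the encoder never receives gradients, its embeddings could also be computed once and cached before training the head.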
126
+
127
+ ---
128
+
129
+
130
+ ## Intended Use
131
+
132
+ OmniRad is intended as a **general-purpose radiological image encoder** for:
133
+
134
+ - Image-level classification (e.g., disease or organ recognition)
135
+ - Dense prediction (e.g., medical image segmentation via adapters or decoders)
136
+ - Radiomics feature extraction
137
+ - Representation transfer across datasets, modalities, and institutions
138
+ - Exploratory vision–language research (e.g., radiological image captioning)
139
+
140
+ **Not intended for direct clinical deployment without task-specific validation.**
141
+
142
+ ---
143
+
144
+
145
+
146
+ ## License
147
+
148
+ This project and the released model weights are licensed under the Creative Commons
149
+ Attribution 4.0 International (CC BY 4.0) license.
150
+
151
+ <div align="center">
152
+
153
+ **Made with ❤️ by [UNICA Visual Intelligence Lab](https://github.com/unica-visual-intelligence-lab)**
154
+
155
+ </div>