youngzhou12 committed c538add (verified) · Parent: 02cce3d

Update README.md

Files changed (1): README.md (+123, −0)
# ConVIRT Checkpoint Model Card

## Model Details
- **Model Type**: ConVIRT (Contrastive Learning of Medical Visual Representations from Paired Images and Text)
- **Architecture**: Dual-encoder architecture with a ResNet-50 image encoder and a BERT text encoder
- **Version**: 1.0.0
- **Last Updated**: November 2024
- **License**: MIT License
- **Primary Tasks**:
  - Medical image-text representation learning
  - Zero-shot medical image classification
  - Medical image-text retrieval

## Intended Use
- **Primary Use Cases**:
  - Learning transferable medical visual representations
  - Cross-modal medical image and text retrieval
  - Medical image classification with limited labeled data
  - Feature extraction for downstream medical imaging tasks
- **Out-of-Scope Uses**:
  - Clinical decision-making without human oversight
  - Direct patient diagnosis
  - Processing of non-medical images

## Training Data
- **Dataset**: [Dataset details should be filled in]
  - Number of image-text pairs: X
  - Data source(s): e.g., MIMIC-CXR, Indiana Dataset
  - Types of medical images: e.g., chest X-rays, CT scans
  - Text data type: associated radiology reports
- **Data Preprocessing** (see the sketch below):
  - Image resizing to 224×224 pixels
  - Text cleaning and preprocessing
  - Augmentations used: random crops, color jittering, horizontal flips
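
A minimal image-preprocessing sketch using torchvision, matching the augmentations listed above; the crop scale, jitter strengths, and normalization statistics are assumptions, not values from the released training recipe:

```python
from torchvision import transforms

# Assumed training-time pipeline: random crop to 224x224, color jitter,
# horizontal flip. ImageNet normalization statistics are an assumption.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```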

## Performance and Limitations
### Performance Metrics
- **Image-Text Retrieval** (recall@K, computed as in the sketch below):
  - R@1: X%
  - R@5: X%
  - R@10: X%
- **Transfer Learning Performance**:
  - Classification accuracy on downstream tasks: X%
  - Few-shot learning performance: X%
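
For reference, a small sketch of how recall@K is typically computed for image-text retrieval over a set of paired embeddings; this is illustrative, not the benchmark's official evaluation code:

```python
import torch
import torch.nn.functional as F

def recall_at_k(image_emb: torch.Tensor, text_emb: torch.Tensor, k: int) -> float:
    """Fraction of images whose paired text (row i <-> column i) ranks in the top-k."""
    sim = F.normalize(image_emb, dim=-1) @ F.normalize(text_emb, dim=-1).T
    topk = sim.topk(k, dim=-1).indices                # (N, k) candidate text indices
    targets = torch.arange(sim.size(0)).unsqueeze(1)  # ground-truth index per image
    return (topk == targets).any(dim=-1).float().mean().item()
```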

### Limitations
- Limited to 2D medical imaging modalities
- Performance may vary across different medical specialties
- May exhibit biases present in the training data
- Requires high-quality text descriptions for optimal performance

## Ethical Considerations
- **Privacy**: Model trained on de-identified medical data
- **Bias**:
  - Potential demographic biases from training data
  - Geographic and institutional biases
- **Safety**:
  - Not intended for standalone clinical use
  - Should be used as a supportive tool only

## Technical Specifications
### Requirements
- Python ≥ 3.7
- PyTorch ≥ 1.7
- CUDA-compatible GPU (≥ 11 GB VRAM)
- Transformers library ≥ 4.0

### Model Architecture Details
The dual-encoder and its contrastive objective are sketched after this list.
- **Image Encoder**:
  - ResNet-50 backbone
  - Output dimension: 512
- **Text Encoder**:
  - BERT-base-uncased
  - Output dimension: 512
- **Training Parameters**:
  - Batch size: 256
  - Learning rate: 1e-4
  - Temperature parameter: 0.1
  - Training epochs: X
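
A minimal sketch of a ConVIRT-style dual encoder with a symmetric InfoNCE contrastive loss, using the dimensions and temperature listed above. The linear projection heads and [CLS] pooling are simplifying assumptions (the ConVIRT paper uses nonlinear projection heads), so treat this as illustrative rather than the released implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50
from transformers import AutoModel

class DualEncoder(nn.Module):
    """Illustrative ConVIRT-style dual encoder; not the released implementation."""
    def __init__(self, embed_dim: int = 512, temperature: float = 0.1):
        super().__init__()
        backbone = resnet50()
        backbone.fc = nn.Identity()                   # expose 2048-d pooled features
        self.image_encoder = backbone
        self.image_proj = nn.Linear(2048, embed_dim)  # simplification: paper uses a nonlinear head
        self.text_encoder = AutoModel.from_pretrained("bert-base-uncased")
        self.text_proj = nn.Linear(768, embed_dim)
        self.temperature = temperature

    def forward(self, images, input_ids, attention_mask):
        img = F.normalize(self.image_proj(self.image_encoder(images)), dim=-1)
        hidden = self.text_encoder(input_ids=input_ids,
                                   attention_mask=attention_mask).last_hidden_state
        txt = F.normalize(self.text_proj(hidden[:, 0]), dim=-1)  # [CLS] pooling (assumed)
        logits = img @ txt.T / self.temperature      # (N, N) pairwise similarities
        targets = torch.arange(len(images), device=images.device)
        # Symmetric InfoNCE: matched image-text pairs sit on the diagonal.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.T, targets)) / 2
```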

### Input Requirements
- **Images** (see the loading sketch below):
  - Resolution: 224×224 pixels
  - Format: RGB
  - Supported types: DICOM, PNG, JPEG
- **Text**:
  - Maximum length: 512 tokens
  - Language: English
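
A sketch of input preparation under the constraints above: converting a DICOM image to a 224×224 RGB image (windowing/VOI LUT handling omitted) and tokenizing report text to at most 512 tokens. The helper name and the min-max scaling scheme are illustrative assumptions:

```python
import numpy as np
import pydicom
from PIL import Image
from transformers import AutoTokenizer

def load_dicom_as_rgb(path: str, size=(224, 224)) -> Image.Image:
    """Read a DICOM file and min-max scale it to an 8-bit RGB image (illustrative)."""
    arr = pydicom.dcmread(path).pixel_array.astype(np.float32)
    arr = (arr - arr.min()) / (arr.max() - arr.min() + 1e-8) * 255.0
    return Image.fromarray(arr.astype(np.uint8)).convert("RGB").resize(size)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokens = tokenizer("No acute cardiopulmonary abnormality.",
                   truncation=True, max_length=512, return_tensors="pt")
```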

## Citation
```bibtex
@article{zhang2020contrastive,
  title={Contrastive Learning of Medical Visual Representations from Paired Images and Text},
  author={Zhang, Yuhao and Jiang, Hang and Miura, Yasuhide and Manning, Christopher D and Langlotz, Curtis P},
  journal={arXiv preprint arXiv:2010.00747},
  year={2020}
}
```

## Maintainers
[Your organization/team information]

## Updates and Versions
- v1.0.0 (Current):
  - Initial release
  - Base model trained on [dataset]
  - Performance benchmarks established

## Getting Started
```python
from PIL import Image
from convirt import ConVIRT

# Load the model from a local checkpoint
model = ConVIRT.from_pretrained('path/to/checkpoint')

# Inputs: a preprocessed image and its paired report text
image = Image.open('example_xray.png').convert('RGB')
text = "No acute cardiopulmonary abnormality."

# Extract one embedding per modality
image_features = model.encode_image(image)
text_features = model.encode_text(text)

# Compute image-text similarity from the embeddings
similarity = model.compute_similarity(image_features, text_features)
```
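
Building on the illustrative API above, a zero-shot classification sketch: score the image against one text prompt per candidate label and pick the highest-scoring prompt. The prompt wordings and the semantics of `compute_similarity` are assumptions:

```python
# Hypothetical label prompts; in practice these would be tuned per task.
prompts = ["Chest X-ray showing pneumonia.", "Normal chest X-ray."]

image_features = model.encode_image(image)
scores = [model.compute_similarity(image_features, model.encode_text(p))
          for p in prompts]
prediction = prompts[max(range(len(prompts)), key=lambda i: scores[i])]
```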