youngzhou12
/

ConVIRT

PyTorch

Medical Vision-Language Pre-Training

BenchX

youngzhou12 committed on Nov 7, 2024

Commit

c538add

verified ·

1 Parent(s): 02cce3d

Update README.md

Files changed (1)

README.md +123 -0

@@ -138,6 +138,129 @@ Explore the dataset and runtime metrics of this model in timm [model results](ht
 |[ecaresnet269d.ra2_in1k](https://huggingface.co/timm/ecaresnet269d.ra2_in1k)|352     |84.96|97.22|102.1      |50.2 |101.2|291    |
  -->
+# ConVIRT Checkpoint Model Card
+## Model Details
+- **Model Type**: ConVIRT (Contrastive Learning of Medical Visual Representations from Paired Images and Text)
+- **Architecture**: Dual-encoder architecture with ResNet-50 image encoder and BERT text encoder
+- **Version**: 1.0.0
+- **Last Updated**: November 2024
+- **License**: MIT License
+- **Primary Tasks**:
+  - Medical image-text representation learning
+  - Zero-shot medical image classification
+  - Medical image-text retrieval
+## Intended Use
+- **Primary Use Cases**:
+  - Learning transferable medical visual representations
+  - Cross-modal medical image and text retrieval
+  - Medical image classification with limited labeled data
+  - Feature extraction for downstream medical imaging tasks
+- **Out-of-Scope Uses**:
+  - Clinical decision making without human oversight
+  - Direct patient diagnosis
+  - Processing of non-medical images
+## Training Data
+- **Dataset**: [Dataset details should be filled in]
+  - Number of image-text pairs: X
+  - Data source(s): e.g., MIMIC-CXR, Indiana Dataset
+  - Types of medical images: e.g., chest X-rays, CT scans
+  - Text data type: Associated radiology reports
+- **Data Preprocessing**:
+  - Image resizing to 224x224
+  - Text cleaning and preprocessing
+  - Augmentations used: random crops, color jittering, horizontal flips
+## Performance and Limitations
+### Performance Metrics
+- **Image-Text Retrieval**:
+  - R@1: X%
+  - R@5: X%
+  - R@10: X%
+- **Transfer Learning Performance**:
+  - Classification accuracy on downstream tasks: X%
+  - Few-shot learning performance: X%
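The R@K entries above are conventionally read as Recall@K: the share of queries whose true match appears among the top K retrieved candidates. A minimal sketch in plain Python, assuming a square similarity matrix in which image `i` is paired with text `i` (the function name `recall_at_k` and the toy scores are illustrative, not part of this repository):

```python
def recall_at_k(similarity, k):
    """Fraction of image queries whose paired text (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(similarity):
        # Rank candidate texts by descending similarity to image i.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy scores: similarity[i][j] compares image i with text j.
similarity = [
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.5],
    [0.1, 0.4, 0.7],
]

for k in (1, 2, 3):
    print(f"R@{k}: {recall_at_k(similarity, k):.2%}")
```

Text-to-image retrieval can be scored the same way by passing the transposed matrix.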
+### Limitations
+- Limited to 2D medical imaging modalities
+- Performance may vary across different medical specialties
+- May exhibit biases present in training data
+- Requires high-quality text descriptions for optimal performance
+## Ethical Considerations
+- **Privacy**: Model trained on de-identified medical data
+- **Bias**:
+  - Potential demographic biases from training data
+  - Geographic and institutional biases
+- **Safety**:
+  - Not intended for standalone clinical use
+  - Should be used as a supportive tool only
+## Technical Specifications
+### Requirements
+- Python ≥ 3.7
+- PyTorch ≥ 1.7
+- CUDA-compatible GPU (≥ 11 GB VRAM)
+- Transformers library ≥ 4.0
+### Model Architecture Details
+- **Image Encoder**:
+  - ResNet-50 backbone
+  - Output dimension: 512
+- **Text Encoder**:
+  - BERT-base-uncased
+  - Output dimension: 512
+- **Training Parameters**:
+  - Batch size: 256
+  - Learning rate: 1e-4
+  - Temperature parameter: 0.1
+  - Training epochs: X
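To show where the temperature parameter enters, a ConVIRT-style objective can be sketched as a bidirectional InfoNCE loss over cosine similarities. This is a plain-Python illustration rather than the training code behind this checkpoint. The helper names and the direction weight `lam` are assumptions; the card does not state how the two directions are weighted, so 0.5 is an arbitrary default.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def directional_loss(queries, keys, temperature):
    """Mean InfoNCE loss where queries[i] should match keys[i]."""
    total = 0.0
    for i, query in enumerate(queries):
        logits = [cosine(query, key) / temperature for key in keys]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(v - peak) for v in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


def contrastive_loss(image_emb, text_emb, temperature=0.1, lam=0.5):
    """Weighted sum of image-to-text and text-to-image losses (lam is an assumption)."""
    image_to_text = directional_loss(image_emb, text_emb, temperature)
    text_to_image = directional_loss(text_emb, image_emb, temperature)
    return lam * image_to_text + (1 - lam) * text_to_image


# Toy batch: two image embeddings with matching or swapped text embeddings.
images = [[1.0, 0.0], [0.0, 1.0]]
matched_texts = [[1.0, 0.0], [0.0, 1.0]]
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]

print(contrastive_loss(images, matched_texts))  # small: pairs are aligned
print(contrastive_loss(images, swapped_texts))  # large: pairs are misaligned
```

A lower temperature sharpens the softmax over the batch, so misaligned pairs are penalized more heavily.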
+### Input Requirements
+- **Images**:
+  - Resolution: 224x224 pixels
+  - Format: RGB
+  - Supported types: DICOM, PNG, JPEG
+- **Text**:
+  - Maximum length: 512 tokens
+  - Language: English
+## Citation
+```bibtex
+@article{zhang2020contrastive,
+  title={Contrastive Learning of Medical Visual Representations from Paired Images and Text},
+  author={Zhang, Yuhao and Jiang, Hang and Miura, Yasuhide and Manning, Christopher D and Langlotz, Curtis P},
+  journal={arXiv preprint arXiv:2010.00747},
+  year={2020}
+}
+```
+## Maintainers
+[Your organization/team information]
+## Updates and Versions
+- v1.0.0 (Current):
+  - Initial release
+  - Base model trained on [dataset]
+  - Performance benchmarks established
+## Getting Started
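+```python
+from convirt import ConVIRT
+
+# Load the model from a local checkpoint
+model = ConVIRT.from_pretrained('path/to/checkpoint')
+
+# `image` and `text` must be prepared by the caller beforehand:
+# a 224x224 RGB image and an English report of at most 512 tokens
+# (see Input Requirements above).
+image_features = model.encode_image(image)
+text_features = model.encode_text(text)
+
+# Compute image-text similarity
+similarity = model.compute_similarity(image_features, text_features)
+```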
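The zero-shot classification task listed under Primary Tasks can be illustrated by comparing one image feature against the text features of a few class prompts. The sketch below is plain Python with hand-written toy vectors. It assumes cosine similarity as the score, which may differ from what `compute_similarity` does internally, and the names `zero_shot_classify` and `prompt_features` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_feature, prompt_features):
    """Return the label whose prompt feature is most similar to the image feature."""
    scores = {
        label: cosine_similarity(image_feature, feature)
        for label, feature in prompt_features.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy stand-ins for encoder outputs; real features would be 512-dimensional.
image_feature = [0.9, 0.1, 0.2]
prompt_features = {
    "pneumonia": [1.0, 0.0, 0.1],
    "no finding": [0.0, 1.0, 0.1],
}

label, scores = zero_shot_classify(image_feature, prompt_features)
print(label, scores)
```

In practice the vectors would come from the image and text encoders, with one text prompt per candidate class.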
 ## Citation
 ```bibtex
 @inproceedings{zhou2024benchx,