YYama0/RadFig-classifier


YYama0 committed on May 30

Commit b2bb8df (verified) · 1 parent: 1fcfe94

Update README.md


Files changed (1)

README.md +110 -3


@@ -1,3 +1,110 @@
----
-license: mit
----

+---
+license: mit
+---
+# RadFig-classifier
+A deep learning model that classifies medical images as suitable or unsuitable for Visual Question Answering (VQA) tasks. It can be used to filter an image collection down to the images appropriate for VQA applications.
+## Overview
+RadFig-classifier is based on the EfficientNetV2-S architecture and is trained on medical imaging data to determine whether an image contains enough visual information for meaningful question answering. For robustness, predictions are ensembled across the five models produced by 5-fold cross-validation.
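The overview does not state how the five fold predictions are combined. A common choice, assumed here, is to average the per-fold sigmoid probabilities. The sketch below uses made-up logits and a hypothetical helper `ensemble_probability`; it is not taken from `inference.py`.

```python
import math
from statistics import fmean
from typing import Sequence


def sigmoid(logit: float) -> float:
    """Map a raw model output (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-logit))


def ensemble_probability(fold_logits: Sequence[float]) -> float:
    """Average the per-fold probabilities (assumed combination rule)."""
    if not fold_logits:
        raise ValueError("need at least one fold prediction")
    return fmean(sigmoid(z) for z in fold_logits)


# Made-up logits standing in for the outputs of the five fold models.
example_logits = [1.2, 2.0, 1.6, 0.9, 2.3]
probability = ensemble_probability(example_logits)
print(f"Ensemble probability: {probability:.4f}")
```

Averaging probabilities rather than logits keeps each fold's vote bounded, so a single overconfident fold cannot dominate the result.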
+## Installation
+### Requirements
+```bash
+pip install torch torchvision timm opencv-python albumentations pandas tqdm pillow numpy
+```
+## Usage
+### Command Line Usage
+#### Single Image Classification
+```bash
+# Get probability score
+python inference.py --input image.jpg
+# Get binary classification
+python inference.py --input image.jpg --binary
+```
+#### Batch Processing
+```bash
+# Process all images in directory
+python inference.py --input /path/to/images/ --output results.csv
+# Binary classification with CSV output
+python inference.py --input /path/to/images/ --output results.csv --binary
+```
+## Model Architecture
+- **Base Model**: EfficientNetV2-S
+- **Input Size**: 512×512 pixels
+- **Output**: Single probability score (0-1)
+- **Training**: 5-fold cross-validation ensemble
+- **Framework**: PyTorch + timm
+## Directory Structure
+```
+RadFig-classifier/
+├── inference.py           # Main inference script
+├── models/                # Pre-trained model weights
+│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold0_best_loss.pth
+│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold1_best_loss.pth
+│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold2_best_loss.pth
+│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold3_best_loss.pth
+│   └── tf_efficientnetv2_s.in21k_ft_in1k_fold4_best_loss.pth
+├── README.md
+└── requirements.txt
+```
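Given the layout above, the fold checkpoints can be collected by filename pattern. This is a minimal sketch that assumes the `_fold{N}_best_loss.pth` naming shown in the tree; `find_fold_checkpoints` is a hypothetical helper, not a function from `inference.py`.

```python
import re
from pathlib import Path

# Pattern inferred from the filenames in the directory tree.
FOLD_PATTERN = re.compile(r"_fold(\d+)_best_loss\.pth$")


def find_fold_checkpoints(models_dir):
    """Return checkpoint paths found in models_dir, sorted by fold index."""
    directory = Path(models_dir)
    if not directory.is_dir():
        return []
    found = []
    for path in directory.glob("*.pth"):
        match = FOLD_PATTERN.search(path.name)
        if match:
            found.append((int(match.group(1)), path))
    # Sort on the numeric fold index so fold10 would follow fold9.
    return [path for _, path in sorted(found, key=lambda item: item[0])]


for checkpoint in find_fold_checkpoints("models"):
    print(checkpoint.name)
```

Checking that the returned list has five entries before running inference would catch an incomplete download early.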
+## Output Format
+### Single Image Output
+```
+Image: medical_scan.jpg
+Probability suitable for VQA: 0.8542
+Classification: suitable
+```
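The README does not say which cutoff turns the probability into a label. The sketch below assumes a threshold of 0.5 and reproduces the three-line layout shown above; `classify` and `format_result` are hypothetical names used only for illustration.

```python
def classify(probability, threshold=0.5):
    """Turn a probability into a label; the 0.5 cutoff is an assumption."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    return "suitable" if probability >= threshold else "unsuitable"


def format_result(image_name, probability, threshold=0.5):
    """Render a result in the three-line layout of the single-image output."""
    return (
        f"Image: {image_name}\n"
        f"Probability suitable for VQA: {probability:.4f}\n"
        f"Classification: {classify(probability, threshold)}"
    )


print(format_result("medical_scan.jpg", 0.8542))
```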
+### CSV Output
+| image_path | filename | prediction | suitable_for_vqa |
+|------------|----------|------------|------------------|
+| /path/img1.jpg | img1.jpg | 0.8542 | True |
+| /path/img2.jpg | img2.jpg | 0.2156 | False |
+| /path/img3.jpg | img3.jpg | 0.9234 | True |
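A results file with these columns can be filtered with the standard library alone. The sketch below reuses the three sample rows from the table and assumes the boolean column is written as the text `True`/`False`; a real run would open `results.csv` instead of the in-memory sample.

```python
import csv
import io

# Sample rows copied from the CSV output table.
SAMPLE_CSV = """image_path,filename,prediction,suitable_for_vqa
/path/img1.jpg,img1.jpg,0.8542,True
/path/img2.jpg,img2.jpg,0.2156,False
/path/img3.jpg,img3.jpg,0.9234,True
"""


def suitable_images(csv_file):
    """Return (filename, probability) pairs for rows flagged as suitable."""
    selected = []
    for row in csv.DictReader(csv_file):
        # Assumes booleans are serialized as the text "True"/"False".
        if row["suitable_for_vqa"].strip().lower() == "true":
            selected.append((row["filename"], float(row["prediction"])))
    return selected


kept = suitable_images(io.StringIO(SAMPLE_CSV))
print(kept)
```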
+## Command Line Arguments
+| Argument | Description | Required |
+|----------|-------------|----------|
+| `--input` | Input image file or directory | Yes |
+| `--models` | Directory containing model files | No (default: "models") |
+| `--output` | Output CSV file path | No |
+| `--binary` | Return binary predictions instead of probabilities | No |
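For readers wrapping the script in their own tooling, the argument table maps onto `argparse` roughly as follows. This is a sketch reconstructed from the table, not the parser in `inference.py`; the description string and `build_parser` name are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented arguments (a sketch only)."""
    parser = argparse.ArgumentParser(
        description="Classify images as suitable or unsuitable for VQA"
    )
    parser.add_argument("--input", required=True,
                        help="Input image file or directory")
    parser.add_argument("--models", default="models",
                        help="Directory containing model files")
    parser.add_argument("--output", default=None,
                        help="Output CSV file path")
    parser.add_argument("--binary", action="store_true",
                        help="Return binary predictions instead of probabilities")
    return parser


# Parse the batch-processing example from the usage section.
args = build_parser().parse_args(
    ["--input", "/path/to/images/", "--output", "results.csv"]
)
print(args.input, args.models, args.output, args.binary)
```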
+## Use Cases
+- **Medical VQA Systems**: Pre-filter images before VQA processing
+- **Dataset Curation**: Automatically filter medical image datasets
+- **Quality Control**: Assess image quality for medical AI applications
+- **Research**: Filter images for medical computer vision studies
+## Citation
+If you use RadFig-classifier in your research, please cite:
+```bibtex
+coming soon...
+```
+## License
+This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.