YYama0 committed on
Commit b2bb8df · verified · 1 Parent(s): 1fcfe94

Update README.md

Files changed (1)
  1. README.md +110 -3
README.md CHANGED
@@ -1,3 +1,110 @@
- ---
- license: mit
- ---
 
---
license: mit
---

# RadFig-classifier

A deep learning model that classifies medical images as suitable or unsuitable for Visual Question Answering (VQA) tasks. It can be used to filter medical image collections down to the images that are appropriate for VQA applications.

## Overview

RadFig-classifier is based on the EfficientNetV2-S architecture and was trained on medical imaging data to judge whether an image contains enough visual information for meaningful question answering. Predictions come from an ensemble of the five models trained during 5-fold cross-validation, which is intended to make them more robust.
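
The README does not say how the five fold models are combined. A common rule for a cross-validation ensemble is to average the per-fold probabilities and, for a binary label, compare the mean with a cut-off. The minimal sketch below assumes both mean averaging and a 0.5 threshold. Neither is confirmed for `inference.py`, and `ensemble_probability` and `is_suitable` are hypothetical names.

```python
from __future__ import annotations

from collections.abc import Sequence
from statistics import fmean


def ensemble_probability(fold_probs: Sequence[float]) -> float:
    """Average the per-fold probabilities (assumed combination rule)."""
    if not fold_probs:
        raise ValueError("need at least one fold prediction")
    if any(not 0.0 <= p <= 1.0 for p in fold_probs):
        raise ValueError("fold probabilities must lie in [0, 1]")
    return fmean(fold_probs)


def is_suitable(prob: float, threshold: float = 0.5) -> bool:
    """Binary label from an ensemble probability (0.5 is an assumed cut-off)."""
    return prob >= threshold


if __name__ == "__main__":
    # Hypothetical outputs of the five fold models for one image
    fold_outputs = [0.81, 0.88, 0.84, 0.87, 0.87]
    p = ensemble_probability(fold_outputs)
    print(f"{p:.4f}", "suitable" if is_suitable(p) else "unsuitable")
```

Averaging probabilities is used here rather than majority voting because it keeps a continuous score, which matches the single probability that the tool reports.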

## Installation

### Requirements

```bash
pip install torch torchvision timm opencv-python albumentations pandas tqdm pillow numpy
```

## Command Line Usage

### Single Image Classification

```bash
# Print the probability that the image is suitable for VQA
python inference.py --input image.jpg

# Print a binary classification instead of a probability
python inference.py --input image.jpg --binary
```

### Batch Processing

```bash
# Process all images in a directory
python inference.py --input /path/to/images/ --output results.csv

# Binary classification with CSV output
python inference.py --input /path/to/images/ --output results.csv --binary
```

## Model Architecture

- **Base Model**: EfficientNetV2-S
- **Input Size**: 512×512 pixels
- **Output**: Single probability score (0-1)
- **Training**: 5-fold cross-validation ensemble
- **Framework**: PyTorch + timm

## Directory Structure

```
RadFig-classifier/
├── inference.py      # Main inference script
├── models/           # Pre-trained model weights
│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold0_best_loss.pth
│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold1_best_loss.pth
│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold2_best_loss.pth
│   ├── tf_efficientnetv2_s.in21k_ft_in1k_fold3_best_loss.pth
│   └── tf_efficientnetv2_s.in21k_ft_in1k_fold4_best_loss.pth
├── README.md
└── requirements.txt
```
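
Because the ensemble needs all five fold checkpoints, a quick presence check can prevent a failed run. The hypothetical helper below relies only on the file-name pattern listed above (`*_fold<k>_best_loss.pth`). It is not part of `inference.py` and does not describe how the script itself locates or loads the weights.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches file names such as tf_efficientnetv2_s.in21k_ft_in1k_fold3_best_loss.pth
FOLD_PATTERN = re.compile(r"_fold(\d+)_best_loss\.pth$")


def find_fold_checkpoints(models_dir: str | Path, n_folds: int = 5) -> list[Path]:
    """Return one checkpoint per fold, ordered by fold index.

    Raises FileNotFoundError if any fold in range(n_folds) is missing.
    """
    found: dict[int, Path] = {}
    for path in Path(models_dir).glob("*.pth"):
        match = FOLD_PATTERN.search(path.name)
        if match:
            found[int(match.group(1))] = path
    missing = [k for k in range(n_folds) if k not in found]
    if missing:
        raise FileNotFoundError(f"missing checkpoints for folds {missing} in {models_dir}")
    return [found[k] for k in range(n_folds)]
```

For example, `find_fold_checkpoints("models")` returns the five paths ordered from fold 0 to fold 4, or raises an error that names the missing folds.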

## Output Format

### Single Image Output

```
Image: medical_scan.jpg
Probability suitable for VQA: 0.8542
Classification: suitable
```
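
When `inference.py` is called from another program, its single-image report can be parsed back into values. The sketch below assumes the script prints exactly the `Key: value` lines shown above. `parse_single_output` is a hypothetical helper. `Classification` is treated as optional because it is unclear whether that line appears without `--binary`.

```python
from __future__ import annotations


def parse_single_output(text: str) -> dict[str, object]:
    """Parse the 'Key: value' report printed for a single image."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        # Split on the first colon only, so values may themselves contain colons
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {
        "image": fields["Image"],
        "probability": float(fields["Probability suitable for VQA"]),
        "classification": fields.get("Classification"),
    }
```

The input text could come from, for example, `subprocess.run(["python", "inference.py", "--input", "image.jpg"], capture_output=True, text=True).stdout`.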

### CSV Output

| image_path | filename | prediction | suitable_for_vqa |
|------------|----------|------------|------------------|
| /path/img1.jpg | img1.jpg | 0.8542 | True |
| /path/img2.jpg | img2.jpg | 0.2156 | False |
| /path/img3.jpg | img3.jpg | 0.9234 | True |
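
For the dataset-curation use case, the batch CSV can be reduced to the images flagged as suitable. This standard-library sketch assumes the column names in the table above and that `suitable_for_vqa` is written as `True`/`False`. If that column is absent or empty, it falls back to thresholding `prediction` at an assumed 0.5. `suitable_paths` and `read_suitable` are hypothetical names.

```python
from __future__ import annotations

import csv
from collections.abc import Iterable


def suitable_paths(rows: Iterable[dict[str, str]], threshold: float = 0.5) -> list[str]:
    """Return image_path for every row judged suitable for VQA.

    Prefers the suitable_for_vqa column; if it is missing or empty, falls back
    to prediction >= threshold (0.5 is an assumed cut-off).
    """
    keep: list[str] = []
    for row in rows:
        flag = (row.get("suitable_for_vqa") or "").strip()
        if flag:
            ok = flag.lower() == "true"
        else:
            ok = float(row["prediction"]) >= threshold
        if ok:
            keep.append(row["image_path"])
    return keep


def read_suitable(csv_path: str, threshold: float = 0.5) -> list[str]:
    """Open a results CSV and return the image paths that pass the filter."""
    with open(csv_path, newline="") as f:
        return suitable_paths(csv.DictReader(f), threshold)
```

`read_suitable("results.csv")` returns the `image_path` values of the rows that are kept, ready to feed into a downstream VQA pipeline.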

## Command Line Arguments

| Argument | Description | Required |
|----------|-------------|----------|
| `--input` | Input image file or directory | Yes |
| `--models` | Directory containing model files | No (default: "models") |
| `--output` | Output CSV file path | No |
| `--binary` | Return binary predictions instead of probabilities | No |
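
The options above map directly onto Python's `argparse`. The following is a hypothetical reconstruction for readers who want to wrap or re-implement the CLI. It is not the actual code of `inference.py`: only the four flags and the `models` default come from the table, and the description string is invented.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the four documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        description="Classify medical images as suitable or unsuitable for VQA."
    )
    parser.add_argument("--input", required=True,
                        help="Input image file or directory")
    parser.add_argument("--models", default="models",
                        help="Directory containing model files")
    parser.add_argument("--output", default=None,
                        help="Output CSV file path")
    # store_true makes --binary a flag that defaults to False
    parser.add_argument("--binary", action="store_true",
                        help="Return binary predictions instead of probabilities")
    return parser
```

For example, `build_parser().parse_args(["--input", "/path/to/images/", "--output", "results.csv", "--binary"])` yields `models="models"` and `binary=True`.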

## Use Cases

- **Medical VQA Systems**: Pre-filter images before VQA processing
- **Dataset Curation**: Automatically filter medical image datasets
- **Quality Control**: Assess image quality for medical AI applications
- **Research**: Filter images for medical computer vision studies

## Citation

If you use RadFig-classifier in your research, please cite:

```bibtex
coming soon...
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.