
vit-pneumonia-x-ray_3_class

The model is a ViT model fine-tuned from a checkpoint pre-trained on ImageNet-21k.

Model description

The model outputs a probability distribution over 3 classes (Normal vs. Bacterial Pneumonia vs. Viral Pneumonia).
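The raw logits for the three classes can be turned into the distribution above with a softmax. A minimal pure-Python sketch; the label order shown here is an assumption for illustration, not taken from the model's config:

```python
import math

# Assumed label order -- check the checkpoint's id2label mapping before relying on it.
ID2LABEL = {0: "NORMAL", 1: "BACTERIAL", 2: "VIRAL"}

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical logits for one chest X-ray
probs = softmax([0.2, 2.1, 0.7])
pred = ID2LABEL[probs.index(max(probs))]  # class with the highest probability
```

The probabilities sum to 1, and the predicted class is simply the argmax over the three entries.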

Intended uses & limitations

The intended use is academic only, as the limitations of this model are severe. First, it was trained on a very limited dataset (Kermany et al., 2018), which includes only around 5k chest X-ray images (2306 bacterial, 1224 viral, and 1116 normal). The dataset consists only of PA chest X-rays, so the model should only be used on this type of X-ray. Additionally, most of the images are marked with the letter R, indicating the right side of the body; however, not all chest X-rays used in the world carry such a marking (some use the letter L instead). There is also the problem that a direct diagnosis of the pneumonia type cannot always be made from a chest X-ray, because a patient can be infected with both viral and bacterial pneumonia at the same time. Moreover, some patients are diagnosed with pneumonia whose underlying cause is non-infectious. Please consult this paper for a deeper understanding of the causes of pneumonia.

Training and evaluation data

The model followed the standard procedure for fine-tuning a ViT model, with one difference: the first 11 layers of the encoder were frozen (consult the code to get a better idea). Additionally, the data augmentation applied is deliberately subtle: rotation by (-10, 10) degrees and a very small lighting change. This was a conscious choice, as chest X-ray data is very homogeneous in structure, and a more extreme augmentation scheme could introduce too much noise; see this paper to understand the challenges of data augmentation for this type of data.
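The frozen-encoder setup described above can be sketched as follows. Since the training code is not included here, this uses a randomly initialised ViT-Base configuration for illustration rather than the actual base checkpoint:

```python
from transformers import ViTConfig, ViTForImageClassification

# Randomly initialised ViT-Base with a 3-class head (for illustration only;
# the real model was fine-tuned from an ImageNet-21k pre-trained checkpoint).
config = ViTConfig(num_labels=3)
model = ViTForImageClassification(config)

# Freeze the first 11 of the 12 encoder layers; only the last encoder
# layer and the classification head remain trainable.
for i, layer in enumerate(model.vit.encoder.layer):
    if i < 11:
        for p in layer.parameters():
            p.requires_grad = False
```

Freezing most of the encoder keeps the pre-trained features intact, which is a common choice when the fine-tuning dataset is as small as this one.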

Training procedure

The maximum number of epochs was set to 50; early stopping based on eval_loss, with a patience of 5 evaluations, was used to prevent overfitting.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 50
  • class weights that penalize errors based on the number of instances in a class (to counteract the class imbalance)
  • early stopping: patience of 5 evaluations with no improvement in validation loss
  • validation step: every 100 steps
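One common way to build the class weights mentioned above is inverse-frequency weighting over the per-class counts listed in the limitations section. The exact weighting scheme used in training is not specified in this card, so treat this as a sketch:

```python
# Per-class image counts reported in the limitations section
counts = {"bacterial": 2306, "viral": 1224, "normal": 1116}

total = sum(counts.values())
num_classes = len(counts)

# Inverse-frequency weights: rarer classes get larger weights, so
# misclassifying them contributes more to the (weighted) loss.
weights = {c: total / (num_classes * n) for c, n in counts.items()}
```

The resulting weight vector would typically be passed to the loss function (e.g. the `weight` argument of a cross-entropy loss) so that the under-represented normal and viral classes are not dominated by the bacterial class.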

Framework versions

  • Transformers 4.38.1
  • Pytorch 2.3.0
  • Datasets 2.19.1
  • Tokenizers 0.15.2

Test Metrics

Test metrics computed on the (Kermany et al., 2018) test set.

Metric ViT Model (DA)
Test Accuracy 0.8686
Test Precision 0.8777
Test Recall 0.8686
Test F1 Score 0.8697
Bacterial Accuracy 0.9541
Viral Accuracy 0.8209
Normal Accuracy 0.8162
Model size: 85.8M parameters (safetensors, F32)