EfficientNetV2-S HAM10000 Image-Only Baseline

Model Summary

This repository contains an EfficientNetV2-S image-only baseline trained on the HAM10000 dataset for 7-class dermatoscopic skin-lesion classification.

The checkpoint is intended as a research baseline for a multimodal learning study comparing:

image-only classification,
metadata-only classification,
late-fusion image + metadata classification.

This model uses dermatoscopic images only. It does not use patient metadata such as age, sex, or anatomical site.

Important: This model is not intended for clinical diagnosis, treatment decisions, patient triage, or deployment in medical settings.

Intended Use

Intended Uses

Research and education.
Baseline comparison for medical image classification experiments.
Reproducible comparison against metadata-only and late-fusion HAM10000 models.
Portfolio demonstration of medical AI model development, class-imbalance handling, and evaluation.

Out-of-Scope Uses

Clinical diagnosis or screening.
Replacing dermatologists, clinicians, or qualified medical professionals.
Patient-facing decision support.
Treatment recommendation or medical reassurance.
Real-world medical deployment without clinical validation, regulatory review, and appropriate safety controls.

Dataset

The model was trained and evaluated on HAM10000, a dermatoscopic image dataset containing common pigmented skin lesions.

The label mapping used in this project is:

Label ID	Class Code	Lesion Type
0	`akiec`	Actinic keratoses and intraepithelial carcinoma / Bowen's disease
1	`bcc`	Basal cell carcinoma
2	`bkl`	Benign keratosis-like lesions
3	`df`	Dermatofibroma
4	`mel`	Melanoma
5	`nv`	Melanocytic nevi
6	`vasc`	Vascular lesions

Data Split

The data were divided into stratified train/validation/test splits in an approximately 80/10/10 ratio, so that each split preserves the overall class proportions.

Split	Size
Train	7,966
Validation	996
Test	996

Training-set class counts:

Label ID	Class Code	Train Count
0	`akiec`	261
1	`bcc`	411
2	`bkl`	871
3	`df`	92
4	`mel`	889
5	`nv`	5,328
6	`vasc`	114
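
A stratified three-way split of this kind could be produced along the following lines. This is a minimal sketch rather than the notebook's actual code: it assumes scikit-learn's `train_test_split`, and the function name `stratified_three_way_split` and the seed value are hypothetical.

```python
from sklearn.model_selection import train_test_split


def stratified_three_way_split(indices, labels, val_size, test_size, seed=42):
    """Split sample indices into train/val/test while preserving class proportions.

    `val_size` and `test_size` are easiest to reason about as absolute counts (int).
    If floats are used, `val_size` is a fraction of what remains after the test
    set has been removed. The seed is an illustrative assumption.
    """
    # First carve out the test set, stratifying on the full label list.
    rest_idx, test_idx, rest_labels, _ = train_test_split(
        indices, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    # Then carve the validation set out of the remainder, stratifying again.
    train_idx, val_idx = train_test_split(
        rest_idx, test_size=val_size, stratify=rest_labels, random_state=seed
    )
    return train_idx, val_idx, test_idx
```

Calling this with `val_size=996` and `test_size=996` on 9,958 labelled images would give the split sizes in the table above, though not necessarily the same assignment of images to splits.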

Model Architecture

Backbone: torchvision.models.efficientnet_v2_s
Pretraining: ImageNet-1K pretrained weights
Classifier head: final linear layer replaced with a 7-class output layer
Input modality: RGB dermatoscopic images only
Output: 7-class lesion prediction

Preprocessing

All images were resized and normalized before being passed into the model.

Input image mode: RGB
Image size: 224 x 224
Normalization: ImageNet mean and standard deviation
- Mean: [0.485, 0.456, 0.406]
- Standard deviation: [0.229, 0.224, 0.225]

Training augmentations:

Resize to 224 x 224
Random horizontal flip
Random vertical flip
Random rotation up to 15 degrees
ImageNet normalization

Evaluation preprocessing:

Resize to 224 x 224
ImageNet normalization

Training Details

Training setup:

Setting	Value
Framework	PyTorch / torchvision
Hardware used in notebook	NVIDIA Tesla T4
Batch size	32
Maximum epochs	10
Early stopping patience	3 epochs
Selection metric	Validation macro-F1
Loss	Class-weighted cross-entropy
Best epoch	6
Best validation macro-F1	0.8370
Best validation balanced accuracy	0.8312
Best validation accuracy	0.8785
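
The selection rule in the table (keep the epoch with the best validation macro-F1, stop after 3 epochs without improvement, cap training at 10 epochs) can be sketched as below. The `EarlyStopping` class and `run_schedule` helper are hypothetical and only illustrate the logic; the notebook may implement it differently.

```python
class EarlyStopping:
    """Track the best validation score and signal when to stop (higher is better)."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def step(self, epoch, score):
        """Record one epoch's score; return True if training should stop."""
        if score > self.best_score:
            # New best: this is where a checkpoint would be saved.
            self.best_score = score
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def run_schedule(val_scores, max_epochs=10, patience=3):
    """Replay per-epoch validation scores; return (best_epoch, last_epoch_run)."""
    stopper = EarlyStopping(patience=patience)
    last_epoch = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        last_epoch = epoch
        if stopper.step(epoch, score):
            break
    return stopper.best_epoch, last_epoch
```

Under this logic, a run whose best score arrives at epoch 6 and never improves afterwards stops after epoch 9, one epoch short of the 10-epoch cap.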

Class weights were computed from the training split as:

Label ID	Class Code	Class Weight
0	`akiec`	4.3602
1	`bcc`	2.7689
2	`bkl`	1.3065
3	`df`	12.3696
4	`mel`	1.2801
5	`nv`	0.2136
6	`vasc`	9.9825
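
The weights in this table are consistent with the common "balanced" heuristic, `N / (K * n_c)`, where `N` is the training-set size, `K` the number of classes, and `n_c` the training count for class `c`. The sketch below assumes that formula, which is inferred from the numbers rather than stated explicitly, and uses the training-set class counts from the Data Split section; the function name is illustrative.

```python
# Training-set class counts from the Data Split section, keyed by class code.
train_counts = {
    "akiec": 261, "bcc": 411, "bkl": 871, "df": 92,
    "mel": 889, "nv": 5328, "vasc": 114,
}


def balanced_class_weights(counts):
    """Return N / (K * n_c) per class, so rarer classes get larger weights."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


class_weights = balanced_class_weights(train_counts)
for name, weight in class_weights.items():
    print(f"{name}: {weight:.4f}")
```

Rounded to four decimals, these values match the table. In PyTorch they can be passed in Label ID order as a tensor to the `weight` argument of `torch.nn.CrossEntropyLoss`.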

Evaluation

The model was evaluated on a held-out test set of 996 images.

Test Metrics

Metric	Value
Accuracy	0.8665
Macro-F1	0.8042
Weighted F1	0.8679
Balanced Accuracy	0.8342

Per-Class Test Performance

Label ID	Class Code	Precision	Recall	F1-score	Support
0	`akiec`	0.7778	0.8485	0.8116	33
1	`bcc`	0.7742	0.9231	0.8421	52
2	`bkl`	0.7921	0.7339	0.7619	109
3	`df`	0.8889	0.7273	0.8000	11
4	`mel`	0.6364	0.6937	0.6638	111
5	`nv`	0.9397	0.9129	0.9261	666
6	`vasc`	0.7000	1.0000	0.8235	14

Confusion Matrix

Rows are true labels and columns are predicted labels, both indexed by Label ID.

True \ Pred	0	1	2	3	4	5	6
0	28	3	0	1	0	1	0
1	0	48	1	0	2	1	0
2	5	3	80	0	10	10	1
3	0	1	0	8	0	2	0
4	1	0	6	0	77	25	2
5	2	7	14	0	32	608	3
6	0	0	0	0	0	0	14
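
The aggregate metrics can be recomputed directly from this confusion matrix, which is a useful consistency check. The sketch below uses only the standard library and the usual definitions (macro-F1 as the unweighted mean of per-class F1, balanced accuracy as the mean per-class recall); the `summarize` helper is illustrative and not part of the project code.

```python
# Confusion matrix from above: rows are true labels, columns are predicted labels.
confusion = [
    [28, 3, 0, 1, 0, 1, 0],
    [0, 48, 1, 0, 2, 1, 0],
    [5, 3, 80, 0, 10, 10, 1],
    [0, 1, 0, 8, 0, 2, 0],
    [1, 0, 6, 0, 77, 25, 2],
    [2, 7, 14, 0, 32, 608, 3],
    [0, 0, 0, 0, 0, 0, 14],
]


def summarize(cm):
    """Compute accuracy, macro-F1, weighted F1, and balanced accuracy."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    support = [sum(row) for row in cm]  # true count per class (row sums)
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums
    correct = [cm[c][c] for c in range(k)]  # diagonal entries

    recall = [correct[c] / support[c] if support[c] else 0.0 for c in range(k)]
    precision = [correct[c] / predicted[c] if predicted[c] else 0.0 for c in range(k)]
    f1 = [2 * p * r / (p + r) if (p + r) else 0.0 for p, r in zip(precision, recall)]

    return {
        "accuracy": sum(correct) / total,
        "macro_f1": sum(f1) / k,
        "weighted_f1": sum(f * s for f, s in zip(f1, support)) / total,
        "balanced_accuracy": sum(recall) / k,
    }


metrics = summarize(confusion)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

After rounding to four decimals, the results agree with the Test Metrics table, so the reported metrics and the confusion matrix are mutually consistent.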

Example Usage

This checkpoint stores the model weights for an EfficientNetV2-S architecture with a 7-class classifier head.

import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

label_mapping = {
    0: "akiec",
    1: "bcc",
    2: "bkl",
    3: "df",
    4: "mel",
    5: "nv",
    6: "vasc",
}

image_size = 224
preprocess = transforms.Compose([
    transforms.Resize((image_size, image_size)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

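# Rebuild the architecture without pretrained weights; the checkpoint supplies them.
model = models.efficientnet_v2_s(weights=None)
# Replace the 1000-class ImageNet head with a 7-class linear layer.
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, 7)

# Load the fine-tuned weights on CPU and switch to inference mode.
state_dict = torch.load("efficientnetv2s_image_only_state_dict.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Load one image, force RGB mode, and add a batch dimension.
image = Image.open("example.jpg").convert("RGB")
inputs = preprocess(image).unsqueeze(0)

# Forward pass without gradient tracking; softmax turns logits into class probabilities.
with torch.no_grad():
    logits = model(inputs)
    probs = torch.softmax(logits, dim=1)
    pred_id = int(probs.argmax(dim=1).item())

# Print the predicted class code and its probability.
print(label_mapping[pred_id], float(probs[0, pred_id]))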

If you load a full training checkpoint rather than a plain state dictionary, pass the nested model_state_dict entry to load_state_dict:

checkpoint = torch.load("best_efficientnetv2s_image_only_ham10000.pt", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])

Limitations

The model was trained on HAM10000 only and may have learned dataset-specific patterns or shortcuts that do not transfer to other data.
HAM10000 is highly class-imbalanced: melanocytic nevi (nv) account for 5,328 of the 7,966 training images (about 67%).
Some classes have very small test support, such as dermatofibroma (df, 11 images) and vascular lesions (vasc, 14 images), so their per-class estimates may be unstable.
The model does not use patient metadata such as age, sex, or anatomical site.
Performance may vary across demographic groups, imaging devices, clinical contexts, and lesion presentations.
The model has not been clinically validated.
This checkpoint is a research baseline and should not be interpreted as a medical device.

Ethical and Safety Considerations

This model concerns medical image classification. Incorrect predictions could cause harm if used for clinical or patient-facing decisions. The model should only be used for research, education, and controlled experimentation.

Do not use this model to diagnose skin cancer, decide whether a lesion is benign or malignant, delay care, recommend treatment, or replace consultation with qualified medical professionals.

Project Context

This model is part of a broader portfolio project on multimodal HAM10000 classification. The planned comparison is:

Image-only EfficientNetV2-S baseline — this model.
Metadata-only MLP baseline — age, sex, and anatomical-site features only.
Late-fusion image + metadata model — image features combined with tabular metadata.

The purpose is to test whether metadata improves classification performance beyond the image-only baseline and to document the strengths, limitations, and possible shortcut risks of metadata fusion.

Training Notebook

The training and evaluation workflow is documented in:

ham10000-image-baseline.ipynb

Citation

If you use this model or reproduce the project, please cite the HAM10000 dataset paper:

@article{tschandl2018ham10000,
  title={The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions},
  author={Tschandl, Philipp and Rosendahl, Cliff and Kittler, Harald},
  journal={Scientific Data},
  volume={5},
  number={1},
  pages={1--9},
  year={2018},
  publisher={Nature Publishing Group}
}

License

This model repository is released under the Apache License 2.0.
