EfficientNetV2-S HAM10000 Image-Only Baseline
Model Summary
This repository contains an EfficientNetV2-S image-only baseline trained on the HAM10000 dataset for 7-class dermatoscopic skin-lesion classification.
The checkpoint is intended as a research baseline for a multimodal learning study comparing:
- image-only classification,
- metadata-only classification,
- late-fusion image + metadata classification.
This model uses dermatoscopic images only. It does not use patient metadata such as age, sex, or anatomical site.
Important: This model is not intended for clinical diagnosis, treatment decisions, patient triage, or deployment in medical settings.
Intended Use
Intended Uses
- Research and education.
- Baseline comparison for medical image classification experiments.
- Reproducible comparison against metadata-only and late-fusion HAM10000 models.
- Portfolio demonstration of medical AI model development, class-imbalance handling, and evaluation.
Out-of-Scope Uses
- Clinical diagnosis or screening.
- Replacing dermatologists, clinicians, or qualified medical professionals.
- Patient-facing decision support.
- Treatment recommendation or medical reassurance.
- Real-world medical deployment without clinical validation, regulatory review, and appropriate safety controls.
Dataset
The model was trained and evaluated on HAM10000, a dermatoscopic image dataset containing common pigmented skin lesions.
The label mapping used in this project is:
| Label ID | Class Code | Lesion Type |
|---|---|---|
| 0 | akiec | Actinic keratoses and intraepithelial carcinoma / Bowen's disease |
| 1 | bcc | Basal cell carcinoma |
| 2 | bkl | Benign keratosis-like lesions |
| 3 | df | Dermatofibroma |
| 4 | mel | Melanoma |
| 5 | nv | Melanocytic nevi |
| 6 | vasc | Vascular lesions |
Data Split
The model was trained using stratified train/validation/test splits.
| Split | Size |
|---|---|
| Train | 7,966 |
| Validation | 996 |
| Test | 996 |
Training-set class counts:
| Label ID | Class Code | Train Count |
|---|---|---|
| 0 | akiec | 261 |
| 1 | bcc | 411 |
| 2 | bkl | 871 |
| 3 | df | 92 |
| 4 | mel | 889 |
| 5 | nv | 5,328 |
| 6 | vasc | 114 |
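A stratified split keeps each class's share roughly constant across train, validation, and test. The dependency-free sketch below illustrates the idea. `stratified_split` is a hypothetical helper, and the 80/10/10 fractions and seed are assumptions inferred from the split sizes, not settings confirmed by the notebook. Because it rounds per class, it will not necessarily reproduce the exact 7,966 / 996 / 996 sizes.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_frac=0.1, test_frac=0.1, seed=42):
    """Split sample indices per class so every split keeps roughly the same class proportions.

    Hypothetical helper: the fractions and seed are assumptions, not the notebook's settings.
    """
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(indices_by_class):
        indices = indices_by_class[label]
        rng.shuffle(indices)  # shuffle within the class only
        n_val = round(len(indices) * val_frac)
        n_test = round(len(indices) * test_frac)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test
```

In practice, scikit-learn's `train_test_split` with `stratify=` applied twice (train vs. holdout, then holdout into validation and test) is a common alternative.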
Model Architecture
- Backbone: torchvision.models.efficientnet_v2_s
- Pretraining: ImageNet-1K pretrained weights
- Classifier head: final linear layer replaced with a 7-class output layer
- Input modality: RGB dermatoscopic images only
- Output: 7-class lesion prediction
Preprocessing
All images were resized and normalized before being passed into the model.
- Input image mode: RGB
- Image size: 224 x 224
- Normalization: ImageNet mean and standard deviation
  - Mean: [0.485, 0.456, 0.406]
  - Standard deviation: [0.229, 0.224, 0.225]
Training augmentations:
- Resize to 224 x 224
- Random horizontal flip
- Random vertical flip
- Random rotation up to 15 degrees
- ImageNet normalization
Evaluation preprocessing:
- Resize to 224 x 224
- ImageNet normalization
Training Details
Training setup:
| Setting | Value |
|---|---|
| Framework | PyTorch / torchvision |
| Hardware used in notebook | NVIDIA Tesla T4 |
| Batch size | 32 |
| Maximum epochs | 10 |
| Early stopping patience | 3 epochs |
| Selection metric | Validation macro-F1 |
| Loss | Class-weighted cross-entropy |
| Best epoch | 6 |
| Best validation macro-F1 | 0.8370 |
| Best validation balanced accuracy | 0.8312 |
| Best validation accuracy | 0.8785 |
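The selection metric and early-stopping patience in the table can be expressed as a small loop over per-epoch validation scores. `select_best_epoch` is a hypothetical helper, and treating "improvement" as any strict increase in macro-F1 (no minimum delta) is an assumption.

```python
def select_best_epoch(val_macro_f1, max_epochs=10, patience=3):
    """Return (best_epoch, last_epoch_run) for early stopping on validation macro-F1.

    val_macro_f1[i] is the score after epoch i + 1 (epochs are 1-indexed).
    Training stops once `patience` consecutive epochs fail to beat the best score.
    """
    best_score = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    last_epoch = 0
    for epoch, score in enumerate(val_macro_f1[:max_epochs], start=1):
        last_epoch = epoch
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
            # In a real training loop, the best checkpoint would be saved here.
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, last_epoch
```

For example, with made-up scores that peak at epoch 6 and then fail to improve for three epochs, the loop returns best epoch 6 and stops after epoch 9, even if a tenth epoch would have scored higher.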
Class weights were computed from the training split as:
| Label ID | Class Code | Class Weight |
|---|---|---|
| 0 | akiec | 4.3602 |
| 1 | bcc | 2.7689 |
| 2 | bkl | 1.3065 |
| 3 | df | 12.3696 |
| 4 | mel | 1.2801 |
| 5 | nv | 0.2136 |
| 6 | vasc | 9.9825 |
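The weights in the table are consistent, to four decimals, with inverse class frequency scaled by the number of classes, w_c = N / (K * n_c), where N = 7,966 training images and K = 7 classes (the same convention as scikit-learn's `class_weight="balanced"`). The sketch below recomputes them from the training counts; the function name is hypothetical.

```python
# Training-split class counts from the Data Split table, in label-ID order.
TRAIN_COUNTS = {
    "akiec": 261,
    "bcc": 411,
    "bkl": 871,
    "df": 92,
    "mel": 889,
    "nv": 5328,
    "vasc": 114,
}


def balanced_class_weights(counts):
    """Inverse class frequency w_c = N / (K * n_c); a perfectly balanced set gets 1.0 everywhere."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


weights = balanced_class_weights(TRAIN_COUNTS)
for name, w in weights.items():
    print(f"{name}: {w:.4f}")
```

Ordered by label ID, such values would typically be passed to the loss as `nn.CrossEntropyLoss(weight=torch.tensor(list(weights.values()), dtype=torch.float32))`; that exact wiring is an assumption about the notebook.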
Evaluation
The model was evaluated on a held-out test set of 996 images.
Test Metrics
| Metric | Value |
|---|---|
| Accuracy | 0.8665 |
| Macro-F1 | 0.8042 |
| Weighted F1 | 0.8679 |
| Balanced Accuracy | 0.8342 |
Per-Class Test Performance
| Label ID | Class Code | Precision | Recall | F1-score | Support |
|---|---|---|---|---|---|
| 0 | akiec | 0.7778 | 0.8485 | 0.8116 | 33 |
| 1 | bcc | 0.7742 | 0.9231 | 0.8421 | 52 |
| 2 | bkl | 0.7921 | 0.7339 | 0.7619 | 109 |
| 3 | df | 0.8889 | 0.7273 | 0.8000 | 11 |
| 4 | mel | 0.6364 | 0.6937 | 0.6638 | 111 |
| 5 | nv | 0.9397 | 0.9129 | 0.9261 | 666 |
| 6 | vasc | 0.7000 | 1.0000 | 0.8235 | 14 |
Confusion Matrix
Rows are true labels and columns are predicted labels.
| True \ Pred | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|---|
| 0 | 28 | 3 | 0 | 1 | 0 | 1 | 0 |
| 1 | 0 | 48 | 1 | 0 | 2 | 1 | 0 |
| 2 | 5 | 3 | 80 | 0 | 10 | 10 | 1 |
| 3 | 0 | 1 | 0 | 8 | 0 | 2 | 0 |
| 4 | 1 | 0 | 6 | 0 | 77 | 25 | 2 |
| 5 | 2 | 7 | 14 | 0 | 32 | 608 | 3 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 14 |
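Every summary metric can be recomputed from the confusion matrix, which is a useful consistency check on the tables above. This pure-Python sketch uses the standard definitions: macro-F1 is the unweighted mean of per-class F1, weighted F1 weights each class by its support, and balanced accuracy is the mean per-class recall. The function name is hypothetical.

```python
# Confusion matrix from the table above: rows are true labels, columns are predictions.
CONFUSION = [
    [28, 3, 0, 1, 0, 1, 0],
    [0, 48, 1, 0, 2, 1, 0],
    [5, 3, 80, 0, 10, 10, 1],
    [0, 1, 0, 8, 0, 2, 0],
    [1, 0, 6, 0, 77, 25, 2],
    [2, 7, 14, 0, 32, 608, 3],
    [0, 0, 0, 0, 0, 0, 14],
]


def metrics_from_confusion(cm):
    """Compute accuracy, per-class P/R/F1, macro-F1, weighted F1 and balanced accuracy."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    tp = [cm[c][c] for c in range(k)]
    support = [sum(cm[c]) for c in range(k)]  # row sums: true count per class
    predicted = [sum(cm[r][c] for r in range(k)) for c in range(k)]  # column sums

    def safe_div(a, b):
        return a / b if b else 0.0

    precision = [safe_div(tp[c], predicted[c]) for c in range(k)]
    recall = [safe_div(tp[c], support[c]) for c in range(k)]
    # F1 = 2TP / (2TP + FP + FN) = 2TP / (predicted + support)
    f1 = [safe_div(2 * tp[c], predicted[c] + support[c]) for c in range(k)]
    return {
        "accuracy": sum(tp) / total,
        "macro_f1": sum(f1) / k,
        "weighted_f1": sum(f * s for f, s in zip(f1, support)) / total,
        "balanced_accuracy": sum(recall) / k,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


metrics = metrics_from_confusion(CONFUSION)
for name in ("accuracy", "macro_f1", "weighted_f1", "balanced_accuracy"):
    print(f"{name}: {metrics[name]:.4f}")
```

Running it prints accuracy 0.8665, macro-F1 0.8042, weighted F1 0.8679, and balanced accuracy 0.8342, matching the Test Metrics table; the per-class precision and recall lists likewise match the per-class table.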
Example Usage
This checkpoint stores the model weights for an EfficientNetV2-S architecture with a 7-class classifier head.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image
label_mapping = {
    0: "akiec",
    1: "bcc",
    2: "bkl",
    3: "df",
    4: "mel",
    5: "nv",
    6: "vasc",
}
image_size = 224
preprocess = transforms.Compose([
    transforms.Resize((image_size, image_size)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
model = models.efficientnet_v2_s(weights=None)
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, 7)
state_dict = torch.load("efficientnetv2s_image_only_state_dict.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
image = Image.open("example.jpg").convert("RGB")
inputs = preprocess(image).unsqueeze(0)
# Inference only: disable gradient tracking.
with torch.no_grad():
    logits = model(inputs)
    probs = torch.softmax(logits, dim=1)
# Most likely class and its softmax probability.
pred_id = int(probs.argmax(dim=1).item())
print(label_mapping[pred_id], float(probs[0, pred_id]))
If you have the full training checkpoint instead of a plain state dictionary, load the weights stored under its nested model_state_dict key:
checkpoint = torch.load("best_efficientnetv2s_image_only_ham10000.pt", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
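A small loader can accept either file format. `load_weights` is a hypothetical helper that checks for the `model_state_dict` key and otherwise treats the file as a plain state dictionary.

```python
import torch
import torch.nn as nn


def load_weights(model: nn.Module, path: str) -> nn.Module:
    """Load a plain state dict or a training checkpoint with a 'model_state_dict' key."""
    checkpoint = torch.load(path, map_location="cpu")
    if isinstance(checkpoint, dict) and "model_state_dict" in checkpoint:
        state_dict = checkpoint["model_state_dict"]  # full training checkpoint
    else:
        state_dict = checkpoint  # plain state dict
    model.load_state_dict(state_dict)
    return model.eval()  # eval() returns the module itself
```

Since PyTorch 2.6, `torch.load` defaults to `weights_only=True`. A plain state dict loads under that default; if the full training checkpoint also stores arbitrary Python objects, loading it may require `weights_only=False`, which should only be used with files from a trusted source.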
Limitations
- The model was trained on HAM10000 and may learn dataset-specific patterns or shortcuts.
- HAM10000 is highly class-imbalanced: melanocytic nevi (nv) make up 5,328 of the 7,966 training images (about 67%).
- Some classes have small test support, such as dermatofibroma (df, 11 images) and vascular lesions (vasc, 14 images), so their per-class estimates may be unstable; a single df image changes that class's recall by about 0.09.
- The model does not use patient metadata such as age, sex, or anatomical site.
- Performance may vary across demographic groups, imaging devices, clinical contexts, and lesion presentations.
- The model has not been clinically validated.
- This checkpoint is a research baseline and should not be interpreted as a medical device.
Ethical and Safety Considerations
This model concerns medical image classification. Incorrect predictions could cause harm if used for clinical or patient-facing decisions. The model should only be used for research, education, and controlled experimentation.
Do not use this model to diagnose skin cancer, decide whether a lesion is benign or malignant, delay care, recommend treatment, or replace consultation with qualified medical professionals.
Project Context
This model is part of a broader portfolio project on multimodal HAM10000 classification. The planned comparison is:
- Image-only EfficientNetV2-S baseline: this model.
- Metadata-only MLP baseline: age, sex, and anatomical-site features only.
- Late-fusion image + metadata model: image features combined with tabular metadata.
The purpose is to test whether metadata improves classification performance beyond the image-only baseline and to document the strengths, limitations, and possible shortcut risks of metadata fusion.
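One way the planned late-fusion model could be structured is sketched below, concatenating image features with an encoded metadata vector before a shared classifier. Everything here is an assumption, not the project's final design: `LateFusionClassifier`, the metadata encoding size (`meta_dim`), the hidden width, and the dropout rate are hypothetical. Only `image_dim=1280` is grounded, as the pooled feature size of torchvision's EfficientNetV2-S.

```python
import torch
import torch.nn as nn


class LateFusionClassifier(nn.Module):
    """Hypothetical fusion head: concatenate image features with encoded tabular metadata."""

    def __init__(self, image_dim: int = 1280, meta_dim: int = 16,
                 meta_hidden: int = 32, num_classes: int = 7):
        super().__init__()
        # Small MLP for metadata (e.g. scaled age, one-hot sex and anatomical site).
        self.meta_encoder = nn.Sequential(
            nn.Linear(meta_dim, meta_hidden),
            nn.ReLU(),
        )
        # Classifier over the concatenated image + metadata representation.
        self.head = nn.Sequential(
            nn.Dropout(p=0.2),
            nn.Linear(image_dim + meta_hidden, num_classes),
        )

    def forward(self, image_features: torch.Tensor, metadata: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([image_features, self.meta_encoder(metadata)], dim=1)
        return self.head(fused)
```

Image features could be taken from the baseline by setting `model.classifier = nn.Identity()`, which makes the network return its 1280-dimensional pooled features instead of logits.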
Training Notebook
The training and evaluation workflow is documented in:
ham10000-image-baseline.ipynb
Citation
If using this model or reproducing the project, cite the HAM10000 dataset paper:
@article{tschandl2018ham10000,
title={The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions},
author={Tschandl, Philipp and Rosendahl, Cliff and Kittler, Harald},
journal={Scientific Data},
volume={5},
number={1},
pages={1--9},
year={2018},
publisher={Nature Publishing Group}
}
License
This model repository is released under the Apache License 2.0.