EfficientNetV2-S HAM10000 Image-Only Baseline

Model Summary

This repository contains an EfficientNetV2-S image-only baseline trained on the HAM10000 dataset for 7-class dermatoscopic skin-lesion classification.

The checkpoint is intended as a research baseline for a multimodal learning study comparing:

  1. image-only classification,
  2. metadata-only classification,
  3. late-fusion image + metadata classification.

This model uses dermatoscopic images only. It does not use patient metadata such as age, sex, or anatomical site.

Important: This model is not intended for clinical diagnosis, treatment decisions, patient triage, or deployment in medical settings.

Intended Use

Intended Uses

  • Research and education.
  • Baseline comparison for medical image classification experiments.
  • Reproducible comparison against metadata-only and late-fusion HAM10000 models.
  • Portfolio demonstration of medical AI model development, class-imbalance handling, and evaluation.

Out-of-Scope Uses

  • Clinical diagnosis or screening.
  • Replacing dermatologists, clinicians, or qualified medical professionals.
  • Patient-facing decision support.
  • Treatment recommendation or medical reassurance.
  • Real-world medical deployment without clinical validation, regulatory review, and appropriate safety controls.

Dataset

The model was trained and evaluated on HAM10000, a dermatoscopic image dataset containing common pigmented skin lesions.

The label mapping used in this project is:

Label ID  Class Code  Lesion Type
0         akiec       Actinic keratoses and intraepithelial carcinoma / Bowen's disease
1         bcc         Basal cell carcinoma
2         bkl         Benign keratosis-like lesions
3         df          Dermatofibroma
4         mel         Melanoma
5         nv          Melanocytic nevi
6         vasc        Vascular lesions

Data Split

The data were divided into stratified train/validation/test splits of roughly 80/10/10, so each split keeps approximately the same class proportions.

Split       Size
Train       7,966
Validation  996
Test        996
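
A stratified three-way split of this kind can be sketched with scikit-learn. This is a minimal sketch rather than the notebook's code: the two-stage procedure (hold out 20%, then halve the held-out part), the function name `stratified_three_way_split`, and the seed are assumptions. Applied to 9,958 labels, this procedure does produce the 7,966 / 996 / 996 sizes listed above.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def stratified_three_way_split(labels, seed=42):
    """Return (train_idx, val_idx, test_idx) with class proportions preserved.

    Assumed procedure: hold out 20% first, then split the held-out part in half.
    """
    labels = np.asarray(labels)
    indices = np.arange(len(labels))
    # Stage 1: 80% train, 20% held out, stratified on the class label.
    train_idx, holdout_idx = train_test_split(
        indices, test_size=0.2, stratify=labels, random_state=seed
    )
    # Stage 2: split the held-out indices evenly into validation and test.
    val_idx, test_idx = train_test_split(
        holdout_idx, test_size=0.5, stratify=labels[holdout_idx], random_state=seed
    )
    return train_idx, val_idx, test_idx
```

The function returns index arrays, so the same split can be reused to select image paths, labels, and metadata rows consistently.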

Training-set class counts:

Label ID  Class Code  Train Count
0         akiec       261
1         bcc         411
2         bkl         871
3         df          92
4         mel         889
5         nv          5,328
6         vasc        114

Model Architecture

  • Backbone: torchvision.models.efficientnet_v2_s
  • Pretraining: ImageNet-1K pretrained weights
  • Classifier head: final linear layer replaced with a 7-class output layer
  • Input modality: RGB dermatoscopic images only
  • Output: 7-class lesion prediction

Preprocessing

All images were resized and normalized before being passed into the model.

  • Input image mode: RGB
  • Image size: 224 x 224
  • Normalization: ImageNet mean and standard deviation
    • Mean: [0.485, 0.456, 0.406]
    • Standard deviation: [0.229, 0.224, 0.225]

Training-time transforms (resize, augmentations, normalization):

  • Resize to 224 x 224
  • Random horizontal flip
  • Random vertical flip
  • Random rotation up to 15 degrees
  • ImageNet normalization

Evaluation preprocessing:

  • Resize to 224 x 224
  • ImageNet normalization

Training Details

Training setup:

Setting                            Value
Framework                          PyTorch / torchvision
Hardware used in notebook          NVIDIA Tesla T4
Batch size                         32
Maximum epochs                     10
Early stopping patience            3 epochs
Selection metric                   Validation macro-F1
Loss                               Class-weighted cross-entropy
Best epoch                         6
Best validation macro-F1           0.8370
Best validation balanced accuracy  0.8312
Best validation accuracy           0.8785
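
Checkpoint selection on validation macro-F1 with a patience of 3 can be sketched as a small tracker. The class name `EarlyStopping`, the strict-improvement rule, and the example scores are assumptions for illustration; the scores are hypothetical and not taken from the notebook.

```python
class EarlyStopping:
    """Track the best validation score and stop after `patience` epochs without improvement."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def step(self, epoch, score):
        """Record one epoch's score; return True when training should stop."""
        if score > self.best_score:
            # New best: this is where the checkpoint would be saved.
            self.best_score = score
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Hypothetical validation macro-F1 values, for illustration only.
stopper = EarlyStopping(patience=3)
for epoch, score in enumerate([0.70, 0.75, 0.74, 0.73, 0.72], start=1):
    if stopper.step(epoch, score):
        break
```

Under a rule like this, a best epoch of 6 with patience 3 would end training after epoch 9, assuming epochs are numbered from 1.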

Class weights were computed from the training split as:

Label ID  Class Code  Class Weight
0         akiec       4.3602
1         bcc         2.7689
2         bkl         1.3065
3         df          12.3696
4         mel         1.2801
5         nv          0.2136
6         vasc        9.9825
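
The listed weights are consistent with the common "balanced" inverse-frequency heuristic, weight_c = N / (K * n_c), where N = 7,966 is the number of training images, K = 7 is the number of classes, and n_c is the training count for class c. The sketch below reproduces them from the training counts; whether the notebook computed them exactly this way is an assumption, and the function name is illustrative.

```python
# Training-set class counts from the Data Split section, in label-ID order.
train_counts = {
    "akiec": 261, "bcc": 411, "bkl": 871, "df": 92,
    "mel": 889, "nv": 5328, "vasc": 114,
}


def balanced_class_weights(counts):
    """Return weight_c = N / (K * n_c) for each class."""
    total = sum(counts.values())   # N: total training images
    num_classes = len(counts)      # K: number of classes
    return {name: total / (num_classes * n) for name, n in counts.items()}


class_weights = balanced_class_weights(train_counts)
for name, weight in class_weights.items():
    print(f"{name}: {weight:.4f}")
```

In PyTorch, weights in label-ID order are typically converted to a float tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, so that errors on rare classes such as df and vasc contribute more to the loss than errors on nv.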

Evaluation

The model was evaluated on a held-out test set of 996 images.

Test Metrics

Metric             Value
Accuracy           0.8665
Macro-F1           0.8042
Weighted F1        0.8679
Balanced Accuracy  0.8342

Per-Class Test Performance

Label ID  Class Code  Precision  Recall  F1-score  Support
0         akiec       0.7778     0.8485  0.8116    33
1         bcc         0.7742     0.9231  0.8421    52
2         bkl         0.7921     0.7339  0.7619    109
3         df          0.8889     0.7273  0.8000    11
4         mel         0.6364     0.6937  0.6638    111
5         nv          0.9397     0.9129  0.9261    666
6         vasc        0.7000     1.0000  0.8235    14

Confusion Matrix

Rows are true labels and columns are predicted labels.

True \ Pred    0    1    2    3    4    5    6
0             28    3    0    1    0    1    0
1              0   48    1    0    2    1    0
2              5    3   80    0   10   10    1
3              0    1    0    8    0    2    0
4              1    0    6    0   77   25    2
5              2    7   14    0   32  608    3
6              0    0    0    0    0    0   14
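
The aggregate and per-class metrics reported in this section can be re-derived from the confusion matrix alone. The sketch below uses only the standard definitions (precision from column sums, recall from row sums, macro averages as unweighted means over classes); the function name is illustrative and is not taken from the notebook.

```python
# Test-set confusion matrix: rows are true labels, columns are predicted labels.
CONFUSION = [
    [28, 3, 0, 1, 0, 1, 0],
    [0, 48, 1, 0, 2, 1, 0],
    [5, 3, 80, 0, 10, 10, 1],
    [0, 1, 0, 8, 0, 2, 0],
    [1, 0, 6, 0, 77, 25, 2],
    [2, 7, 14, 0, 32, 608, 3],
    [0, 0, 0, 0, 0, 0, 14],
]


def metrics_from_confusion(cm):
    """Derive per-class and aggregate metrics from a square confusion matrix."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = []
    for c in range(k):
        tp = cm[c][c]
        support = sum(cm[c])                         # images whose true label is c
        predicted = sum(cm[r][c] for r in range(k))  # images predicted as c
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = support + predicted
        f1 = 2 * tp / denom if denom else 0.0
        per_class.append(
            {"precision": precision, "recall": recall, "f1": f1, "support": support}
        )
    summary = {
        "accuracy": sum(cm[c][c] for c in range(k)) / total,
        "macro_f1": sum(m["f1"] for m in per_class) / k,
        "weighted_f1": sum(m["f1"] * m["support"] for m in per_class) / total,
        "balanced_accuracy": sum(m["recall"] for m in per_class) / k,
    }
    return per_class, summary


per_class, summary = metrics_from_confusion(CONFUSION)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The derived values agree with the reported test metrics to within rounding. The matrix also shows where most errors fall: 32 nv images are predicted as mel and 25 mel images are predicted as nv, which accounts for melanoma having the lowest precision and recall.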

Example Usage

This checkpoint stores the model weights for an EfficientNetV2-S architecture with a 7-class classifier head.

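import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# Label IDs follow the mapping in the Dataset section.
label_mapping = {
    0: "akiec",
    1: "bcc",
    2: "bkl",
    3: "df",
    4: "mel",
    5: "nv",
    6: "vasc",
}

# Evaluation preprocessing: resize, convert to tensor, apply ImageNet normalization.
image_size = 224
preprocess = transforms.Compose([
    transforms.Resize((image_size, image_size)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])

# Rebuild the architecture without pretrained weights, then swap in the
# 7-class head so the layer shapes match the saved state dict.
model = models.efficientnet_v2_s(weights=None)
in_features = model.classifier[1].in_features
model.classifier[1] = nn.Linear(in_features, 7)

# Load the fine-tuned weights on CPU and switch to inference mode.
state_dict = torch.load("efficientnetv2s_image_only_state_dict.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

# Load one image and add a batch dimension: (3, 224, 224) -> (1, 3, 224, 224).
image = Image.open("example.jpg").convert("RGB")
inputs = preprocess(image).unsqueeze(0)

# Disable gradient tracking for inference.
with torch.no_grad():
    logits = model(inputs)
    probs = torch.softmax(logits, dim=1)
    pred_id = int(probs.argmax(dim=1).item())

# Print the predicted class code and its softmax probability.
print(label_mapping[pred_id], float(probs[0, pred_id]))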

If using a full training checkpoint instead of a plain state dictionary, load the nested key:

checkpoint = torch.load("best_efficientnetv2s_image_only_ham10000.pt", map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
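
When it is not known in advance which of the two file layouts is on disk, a small helper can handle both. This is a sketch: the helper name `extract_state_dict` is hypothetical, and it relies only on the `model_state_dict` key described above.

```python
def extract_state_dict(loaded):
    """Return model weights from either a plain state dict or a full training checkpoint."""
    # A full checkpoint nests the weights under "model_state_dict";
    # a plain state dict is already the mapping that load_state_dict expects.
    if isinstance(loaded, dict) and "model_state_dict" in loaded:
        return loaded["model_state_dict"]
    return loaded
```

It would be used as `model.load_state_dict(extract_state_dict(torch.load(path, map_location="cpu")))`, where `path` points to either file.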

Limitations

  • The model was trained on HAM10000 and may have learned dataset-specific patterns or shortcuts that do not transfer to other image sources.
  • HAM10000 is highly class-imbalanced, with melanocytic nevi (nv) heavily represented.
  • Some classes have small test support, such as dermatofibroma (df) and vascular lesions (vasc), so per-class estimates may be unstable.
  • The model does not use patient metadata such as age, sex, or anatomical site.
  • Performance may vary across demographic groups, imaging devices, clinical contexts, and lesion presentations.
  • The model has not been clinically validated.
  • This checkpoint is a research baseline and should not be interpreted as a medical device.
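
The instability of per-class estimates for small-support classes can be made concrete with a confidence interval on recall. The sketch below uses the Wilson score interval, a standard choice for binomial proportions; applying it here is an illustration added for this model card, not an analysis from the notebook.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return (max(0.0, center - half), min(1.0, center + half))


# Recall counts from the test confusion matrix: (class code, correct, support).
for name, correct, support in [("df", 8, 11), ("vasc", 14, 14), ("nv", 608, 666)]:
    low, high = wilson_interval(correct, support)
    print(f"{name}: recall {correct / support:.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

For dermatofibroma (8 of 11 correct), this gives an interval of roughly 0.43 to 0.90, far wider than the interval for nv, so differences between models on df or vasc should be read with caution.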

Ethical and Safety Considerations

This model concerns medical image classification. Incorrect predictions could cause harm if used for clinical or patient-facing decisions. The model should only be used for research, education, and controlled experimentation.

Do not use this model to diagnose skin cancer, decide whether a lesion is benign or malignant, delay care, recommend treatment, or replace consultation with qualified medical professionals.

Project Context

This model is part of a broader portfolio project on multimodal HAM10000 classification. The planned comparison is:

  1. Image-only EfficientNetV2-S baseline: this model.
  2. Metadata-only MLP baseline: age, sex, and anatomical-site features only.
  3. Late-fusion image + metadata model: image features combined with tabular metadata.

The purpose is to test whether metadata improves classification performance beyond the image-only baseline and to document the strengths, limitations, and possible shortcut risks of metadata fusion.
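
One way the planned fusion of image features with tabular metadata could look is sketched below. Everything here is an assumption for illustration and does not describe the project's actual fusion model: the class name `LateFusionHead`, the hidden size, and the metadata width of 16 are hypothetical. The image feature width of 1280 matches the input size of the EfficientNetV2-S classifier layer in torchvision.

```python
import torch
import torch.nn as nn


class LateFusionHead(nn.Module):
    """Concatenate an image feature vector with metadata features, then classify."""

    def __init__(self, image_dim=1280, metadata_dim=16, hidden_dim=128, num_classes=7):
        super().__init__()
        # metadata_dim is hypothetical: it depends on how age, sex, and
        # anatomical site are encoded (for example, scaled age plus one-hot columns).
        self.classifier = nn.Sequential(
            nn.Linear(image_dim + metadata_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, image_features, metadata_features):
        # Join the two modalities along the feature dimension.
        fused = torch.cat([image_features, metadata_features], dim=1)
        return self.classifier(fused)
```

Comparing such a model against this image-only baseline on the same splits is what isolates the contribution of the metadata.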

Training Notebook

The training and evaluation workflow is documented in:

  • ham10000-image-baseline.ipynb

Citation

If you use this model or reproduce the project, cite the HAM10000 dataset paper:

@article{tschandl2018ham10000,
  title={The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions},
  author={Tschandl, Philipp and Rosendahl, Cliff and Kittler, Harald},
  journal={Scientific Data},
  volume={5},
  number={1},
  pages={1--9},
  year={2018},
  publisher={Nature Publishing Group}
}

License

This model repository is released under the Apache License 2.0.
