MaxViT Tiny Fine-tuned on Synset Signset Germany - GTSRB Subset

A MaxViT Tiny model for German traffic sign recognition. The network was pretrained on ImageNet-1K and then fine-tuned on the Synset Signset Germany (SSG) GTSRB subset. It reaches a classification accuracy of 99.67% on the SSG validation set.

Model Details

Model Description

This model is a MaxViT Tiny fine-tuned on the synthetic Synset Signset Germany - GTSRB subset for German traffic sign recognition. It was initialized from ImageNet-1K pretrained weights (timm/maxvit_tiny_tf_224.in1k) and trained with a standard fine-tuning pipeline. The resulting model classifies images into 43 German traffic sign categories, matching the classes of the well-known GTSRB dataset.

Model Sources

🔹 Repository: MaxViT
🔹 Paper: MaxViT: Multi-Axis Vision Transformer (arXiv:2204.01697)

Uses

The model addresses the task of traffic sign recognition in Germany.

Direct Use

This model is intended for the following use cases:

✅ Traffic sign recognition: Classifying German traffic signs into 43 categories, excluding safety-critical or high-risk applications.
✅ Sim-to-real gap analysis: Comparing model performance between synthetic and real-world training data.
✅ Benchmarking and prototyping: Performing cross-dataset evaluations.
✅ Educational purposes: Learning about fine-tuning vision models for classification tasks.

Downstream Use

The model can be:

➡️ Fine-tuned further on real-world traffic sign datasets for domain adaptation.
➡️ Integrated into larger perception pipelines, especially for research prototypes.
➡️ Used as a baseline for comparing synthetic vs. real-world training approaches.

Out-of-Scope Use

This model should not be used for:

🚫 Safety-critical applications: Including but not limited to autonomous driving systems, advanced driver-assistance systems (ADAS), or any real-time traffic management systems.
🚫 High-risk AI applications: As defined by the European AI Act under Annex III, such as AI systems intended for safety components in road traffic management and operation.
🚫 Production deployment: Without exhaustive validation to ensure the model is "relevant, sufficiently representative, and to the best extent possible free of errors and complete in view of the intended purpose of the system."

⚠️ The model has not been validated for real-world safety-critical deployment.

Bias, Risks, and Limitations

🔹 Synthetic training data: The model was trained exclusively on synthetic images that may not fully represent real-world conditions. Performance on real-world traffic sign images may differ from the reported validation metrics.
🔹 Limited class coverage: The model only recognizes 43 specific German traffic sign classes and will not generalize to other sign types or countries.

Recommendations

Users should:

➡️ Validate the model on real-world data before drawing conclusions about deployment readiness.
➡️ Note that high accuracy on synthetic data does not necessarily guarantee equivalent performance in real-world scenarios.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "FraunhoferIOSB/maxvit-tiny-1k-224-finetuned-ssg"
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)

# Load and preprocess image (here a single image as an example)
# convert("RGB") guards against grayscale, palette, or RGBA inputs
image = Image.open("path/to/traffic_sign.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class_idx = logits.argmax(-1).item()

print(f"Predicted class: {model.config.id2label[predicted_class_idx]}")
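Because the model outputs a confidence distribution over 43 classes and the evaluation reports Top-5 accuracy, it can be useful to look beyond the single best class. The following is a minimal, framework-free sketch, assuming the logits have first been converted to a plain Python list (for example with `logits[0].tolist()`). The helper names `softmax` and `top_k_predictions` and the toy logits are hypothetical and not part of the released code.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_predictions(logits, k=5):
    """Return the k most likely (class_index, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]

# Hypothetical logits for a 5-class toy problem (the real model has 43 classes).
example_logits = [0.2, 3.1, -1.0, 1.5, 0.0]
for idx, prob in top_k_predictions(example_logits, k=3):
    print(f"class {idx}: {prob:.3f}")
```

With the real model, each returned index can be mapped to a label through `model.config.id2label`, as in the snippet above.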

Training Details

Training Data

The model was trained on the Synset Signset Germany - GTSRB Subset dataset:

🔹 Description: A synthetic traffic sign recognition dataset containing 43 German traffic sign classes (matching GTSRB classes)
🔹 Size: 17,200 images (400 images per class)
🔹 Split: Ogre train
🔹 Paper: Synset Signset Germany: a Synthetic Dataset for German Traffic Sign Recognition

Training Procedure

Preprocessing

🔹 Input resolution: 224×224 pixels
🔹 Normalization: ImageNet default mean and standard deviation
🔹 Interpolation: Bicubic
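
The normalization step listed above can be illustrated per pixel. This is a sketch only: the mean and standard deviation values are the commonly used ImageNet statistics, assumed here to be what "ImageNet default" refers to, and the function name `normalize_pixel` is hypothetical. Resizing to 224×224 with bicubic interpolation is assumed to have happened beforehand.

```python
# Commonly used ImageNet statistics (assumption: these are the "ImageNet default" values).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Map one 8-bit RGB pixel to normalized per-channel floats."""
    # Scale to [0, 1] first, then subtract the channel mean and divide by the channel std.
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))

print(normalize_pixel((255, 255, 255)))  # a white pixel
print(normalize_pixel((0, 0, 0)))        # a black pixel
```

In practice the image processor loaded in the getting-started snippet is expected to apply these steps, so this is mainly useful for checking a custom pipeline against it.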

Data Augmentation

🔹 Random erasing: Pixel mode with a probability of 0.25
🔹 RandAugment: Geometric augmentations have been disabled since traffic signs are sensitive to orientation
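
To make the random erasing setting more concrete, here is a simplified pure-Python sketch of the idea: with probability 0.25 a random rectangle is replaced, and in "pixel mode" every erased pixel receives its own random value. Only the probability comes from this card. The area range, aspect-ratio range, retry count, and the function name `random_erase` are assumptions chosen for illustration, and this is not the code used for training.

```python
import math
import random

def random_erase(image, probability=0.25, area_range=(0.02, 1 / 3),
                 min_aspect=0.3, rng=random, max_attempts=10):
    """Return a copy of `image` with one random rectangle replaced by noise.

    `image` is a list of rows; each row is a list of per-channel float lists.
    With probability 1 - `probability` the copy is returned unchanged.
    """
    erased = [[list(pixel) for pixel in row] for row in image]
    if rng.random() >= probability:
        return erased
    height, width = len(image), len(image[0])
    for _ in range(max_attempts):
        target_area = rng.uniform(*area_range) * height * width
        # Sample the aspect ratio log-uniformly so tall and wide boxes are equally likely.
        aspect = math.exp(rng.uniform(math.log(min_aspect), math.log(1 / min_aspect)))
        box_h = int(round(math.sqrt(target_area * aspect)))
        box_w = int(round(math.sqrt(target_area / aspect)))
        if 0 < box_h < height and 0 < box_w < width:
            top = rng.randint(0, height - box_h)
            left = rng.randint(0, width - box_w)
            for y in range(top, top + box_h):
                for x in range(left, left + box_w):
                    # Pixel mode: each erased pixel gets its own Gaussian noise values.
                    erased[y][x] = [rng.gauss(0.0, 1.0) for _ in erased[y][x]]
            return erased
    return erased  # no valid box found; leave the image untouched

# Toy usage on a 32x32 all-zero "image"; probability=1.0 forces an erase.
image = [[[0.0, 0.0, 0.0] for _ in range(32)] for _ in range(32)]
augmented = random_erase(image, probability=1.0, rng=random.Random(0))
changed = sum(pixel != [0.0, 0.0, 0.0] for row in augmented for pixel in row)
print(f"{changed} of {32 * 32} pixels were replaced")
```

Unlike geometric augmentations, erasing does not rotate or flip the sign, so it stays compatible with the orientation sensitivity noted above.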

Training Hyperparameters

Parameter	Value
Training regime	fp16 mixed precision (AMP)
Optimizer	AdamW
Learning rate	1e-3
Minimum learning rate	1e-6
Weight decay	0.01
Batch size	128 (per GPU)
Effective batch size	256 (2 GPUs)
Epochs	100
Warmup epochs	10
Label smoothing	0.1
Drop path rate	0.1
Gradient clipping	1.0
Seed	67
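
The table does not name the learning-rate schedule. As an illustration of how the listed values could fit together, the sketch below assumes a linear warmup over the first 10 epochs followed by cosine decay to the minimum learning rate. Both the schedule shape and the warmup start value are assumptions, and the function name `learning_rate` is hypothetical.

```python
import math

BASE_LR = 1e-3          # "Learning rate" from the table
MIN_LR = 1e-6           # "Minimum learning rate" from the table
EPOCHS = 100
WARMUP_EPOCHS = 10
WARMUP_START_LR = 1e-6  # assumption: the card does not state the warmup start value

def learning_rate(epoch):
    """Learning rate at the start of `epoch` (0-based) under the assumed schedule."""
    if epoch < WARMUP_EPOCHS:
        # Linear warmup from WARMUP_START_LR up to BASE_LR.
        return WARMUP_START_LR + (BASE_LR - WARMUP_START_LR) * epoch / WARMUP_EPOCHS
    # Cosine decay from BASE_LR down towards MIN_LR over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

# Effective batch size under data-parallel training: per-GPU batch size times GPU count.
effective_batch_size = 128 * 2

for epoch in (0, 5, 10, 55, 99):
    print(f"epoch {epoch:3d}: lr = {learning_rate(epoch):.2e}")
print(f"effective batch size: {effective_batch_size}")
```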

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was tested on the Ogre validation split of the Synset Signset Germany - GTSRB Subset dataset:

🔹 Description: A synthetic traffic sign recognition dataset containing 43 German traffic sign classes (matching GTSRB classes)
🔹 Size: 4,300 images (100 images per class)
🔹 Split: Ogre validation
🔹 Paper: Synset Signset Germany: a Synthetic Dataset for German Traffic Sign Recognition

and the test split of the German Traffic Sign Recognition Benchmark (GTSRB) dataset:

🔹 Description: A multi-class classification dataset featuring 43 classes of traffic signs
🔹 Size: 12,630 images
🔹 Split: test
🔹 Paper: The German Traffic Sign Recognition Benchmark: A multi-class classification competition

Results

Metric	SSG	GTSRB
Accuracy	99.67%	96.47%
Top-5 Accuracy	99.88%	99.07%
F1 Score	99.67%	95.60%
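
For readers who want to compute the same kinds of metrics on their own predictions, a small pure-Python sketch follows. The card does not state how the F1 score was averaged, so macro averaging is an assumption here, and the function names and toy data are hypothetical.

```python
def top_k_accuracy(ranked_predictions, labels, k=1):
    """Fraction of samples whose true label is among the k highest-ranked classes."""
    hits = sum(label in ranked[:k] for ranked, label in zip(ranked_predictions, labels))
    return hits / len(labels)

def macro_f1(predictions, labels, num_classes):
    """Unweighted mean of the per-class F1 scores (macro averaging, an assumption)."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(p == c and y == c for p, y in zip(predictions, labels))
        fp = sum(p == c and y != c for p, y in zip(predictions, labels))
        fn = sum(p != c and y == c for p, y in zip(predictions, labels))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes

# Toy data with 3 classes; each row ranks class indices from most to least likely.
labels = [0, 1, 2, 2]
ranked = [[0, 1, 2], [2, 1, 0], [2, 0, 1], [1, 2, 0]]
top1 = [r[0] for r in ranked]

print("top-1 accuracy:", top_k_accuracy(ranked, labels, k=1))
print("top-2 accuracy:", top_k_accuracy(ranked, labels, k=2))
print("macro F1:", macro_f1(top1, labels, num_classes=3))
```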

Summary

The model achieves 99.67% accuracy on the synthetic SSG validation set across all 43 traffic sign classes and 96.47% accuracy on the real-world GTSRB test set, a gap of 3.20 percentage points between synthetic and real-world evaluation data.

Technical Specifications

Model Architecture and Objective

🔹 Architecture: MaxViT Tiny
🔹 Parameters: ~30.4M
🔹 Objective: Multi-class classification (Cross-Entropy with label smoothing)
🔹 Output: 43-class confidence distribution
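
The training objective, cross-entropy with label smoothing, can be sketched without a framework. The version below assumes the common convention in which the smoothing mass is spread uniformly over all classes, including the true one. Whether the training code used exactly this convention is not stated in the card, and the function name is hypothetical.

```python
import math

def label_smoothing_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against a smoothed target distribution.

    The true class receives 1 - smoothing + smoothing / K and every other
    class receives smoothing / K, where K is the number of classes.
    """
    m = max(logits)  # log-sum-exp with the maximum subtracted for stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    log_probs = [x - log_sum_exp for x in logits]
    num_classes = len(logits)
    nll = -log_probs[target]                 # standard cross-entropy term
    uniform = -sum(log_probs) / num_classes  # cross-entropy against a uniform target
    return (1 - smoothing) * nll + smoothing * uniform

# A confident, correct toy prediction over 43 classes.
confident = [10.0] + [0.0] * 42
print("plain cross-entropy:", label_smoothing_cross_entropy(confident, 0, smoothing=0.0))
print("with smoothing 0.1: ", label_smoothing_cross_entropy(confident, 0, smoothing=0.1))
```

The smoothed loss stays above zero even for a confident correct prediction, which discourages the network from pushing logits to extremes.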

Compute Infrastructure

Hardware

🔹 2× NVIDIA L40 GPUs
🔹 Distributed Data Parallel (DDP) training

Citation

If you consider our work helpful, please cite our Synset Signset Germany dataset alongside MaxViT:

BibTeX:

@inproceedings{synset_signset_ger_sielemann_2024,
  title={{Synset Signset Germany: A Synthetic Dataset for German Traffic Sign Recognition}},
  author={Sielemann, Anne and Loercher, Lena and Schumacher, Max-Lion and Wolf, Stefan and Roschani, Masoud and Ziehn, Jens and Beyerer, Juergen},
  booktitle={2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC)},
  year={2024}
}

APA:

Sielemann, A., Loercher, L., Schumacher, M.-L., Wolf, S., Roschani, M., Ziehn, J., & Beyerer, J. (2024).
Synset Signset Germany: A Synthetic Dataset for German Traffic Sign Recognition.
In 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC).

Model Card Contact

Anne Sielemann
Fraunhofer IOSB
Group »Automotive and Simulation«
Fraunhoferstr. | 76131 Karlsruhe | Germany
anne.sielemann@iosb.fraunhofer.de
www.iosb.fraunhofer.de

Jens Ziehn
Fraunhofer IOSB
Group leader »Automotive and Simulation«
Fraunhoferstr. | 76131 Karlsruhe | Germany
Phone +49 721 6091 – 633
jens.ziehn@iosb.fraunhofer.de
www.iosb.fraunhofer.de
