MaxViT Tiny Fine-tuned on Synset Signset Germany - GTSRB Subset
A MaxViT Tiny model addressing the task of German traffic sign recognition. The network was initially pretrained on the ImageNet-1K dataset and subsequently fine-tuned on the Synset Signset Germany (SSG) GTSRB subset. Under this configuration, the model attains a classification accuracy of 99.7% on the SSG validation set.
Model Details
Model Description
This model is a MaxViT Tiny fine-tuned on the synthetic Synset Signset Germany - GTSRB subset dataset for the task of German traffic sign recognition. It was initialized from ImageNet-1K pretrained weights and trained using a standard fine-tuning pipeline. The resulting model performs image classification over 43 German traffic sign categories, matching the classes of the well-known GTSRB dataset.
Model Sources
🔹 Repository: MaxViT
🔹 Paper: MaxViT: Multi-Axis Vision Transformer
Uses
The model addresses the task of traffic sign recognition in Germany.
Direct Use
This model is intended for the following use cases:
✅ Traffic sign recognition: Classifying German traffic signs into 43 categories, excluding safety-critical or high-risk applications.
✅ Sim-to-real gap analysis: Comparing model performance between synthetic and real-world training data.
✅ Benchmarking and prototyping: Performing cross-dataset evaluations.
✅ Educational purposes: Learning about fine-tuning vision models for classification tasks.
Downstream Use
The model can be:
➡️ Fine-tuned further on real-world traffic sign datasets for domain adaptation.
➡️ Integrated into larger perception pipelines, especially for research prototypes.
➡️ Used as a baseline for comparing synthetic vs. real-world training approaches.
Out-of-Scope Use
This model should not be used for:
🚫 Safety-critical applications: Including but not limited to autonomous driving systems, advanced driver-assistance systems (ADAS), or any real-time traffic management systems.
🚫 High-risk AI applications: As defined by the European AI Act under Annex III, such as AI systems intended for safety components in road traffic management and operation.
🚫 Production deployment: Without exhaustive validation to ensure the model is "relevant, sufficiently representative, and to the best extent possible free of errors and complete in view of the intended purpose of the system."
⚠️ The model has not been validated for real-world safety-critical deployment.
Bias, Risks, and Limitations
🔹 Synthetic training data: The model was trained exclusively on synthetic images that may not fully represent real-world conditions.
Performance on real-world traffic sign images may differ from the reported validation metrics.
🔹 Limited class coverage: The model only recognizes 43 specific German traffic sign classes and will not generalize to other sign types or countries.
Recommendations
Users should:
➡️ Validate the model on real-world data before drawing conclusions about deployment readiness.
➡️ Note that high accuracy on synthetic data does not necessarily guarantee equivalent performance in real-world scenarios.
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "FraunhoferIOSB/maxvit-tiny-1k-224-finetuned-ssg"
processor = AutoImageProcessor.from_pretrained(model_name)
model = AutoModelForImageClassification.from_pretrained(model_name)

# Load and preprocess image (here a single image as an example);
# convert to RGB so grayscale or RGBA files get three channels
image = Image.open("path/to/traffic_sign.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Inference without gradient tracking
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits

# Map the highest-scoring class index to its label
predicted_class_idx = logits.argmax(-1).item()
print(f"Predicted class: {model.config.id2label[predicted_class_idx]}")
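The snippet above prints only the single best class. When confidence scores or several candidates are useful, the logits can be turned into probabilities with a softmax and ranked. The following is a minimal sketch in plain Python; the helper names `softmax` and `top_k` and the example logits are hypothetical and not part of the model repository.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, k=5):
    """Return the k most likely (class_index, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]

# Hypothetical logits for a 43-class output: class 14 scores highest, class 17 second.
example_logits = [0.0] * 43
example_logits[14] = 6.0
example_logits[17] = 3.0

for idx, prob in top_k(example_logits, k=3):
    print(f"class {idx}: {prob:.3f}")
```

With the inference code above, `logits[0].tolist()` could be passed to `top_k`, and `model.config.id2label` would translate each index into a label.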
Training Details
Training Data
The model was trained on the Synset Signset Germany - GTSRB Subset dataset:
🔹 Description: A synthetic traffic sign recognition dataset containing 43 German traffic sign classes (matching GTSRB classes)
🔹 Size: 17,200 images (400 images per class)
🔹 Split: Ogre train
🔹 Paper: Synset Signset Germany: A Synthetic Dataset for German Traffic Sign Recognition
Training Procedure
Preprocessing
🔹 Input resolution: 224×224 pixels
🔹 Normalization: ImageNet default mean and standard deviation
🔹 Interpolation: Bicubic
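As a rough illustration of the normalization step, the sketch below scales 8-bit RGB values to [0, 1] and standardizes each channel. It assumes the widely used ImageNet statistics (mean 0.485/0.456/0.406, std 0.229/0.224/0.225); the card does not list the exact numbers, so treat them as an assumption. The function names are hypothetical, and resizing to 224×224 is left to the image processor.

```python
# Assumed values: the commonly used ImageNet channel statistics.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Map an 8-bit (R, G, B) pixel to per-channel standardized floats."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

def normalize_image(pixels):
    """Normalize an image given as rows of (R, G, B) tuples."""
    return [[normalize_pixel(px) for px in row] for row in pixels]

# A tiny 1x2 stand-in image: one black pixel and one white pixel.
print(normalize_image([[(0, 0, 0), (255, 255, 255)]]))
```

In practice the `AutoImageProcessor` loaded in the getting-started code performs these steps, so the sketch is only meant to show what happens to the pixel values.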
Data Augmentation
🔹 Random erasing: Pixel mode with a probability of 0.25
🔹 RandAugment: Geometric augmentations have been disabled since traffic signs are sensitive to orientation
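To make the random erasing setting concrete, here is a minimal sketch of the technique in plain Python: with probability 0.25 a random rectangle is overwritten, and in "pixel mode" each erased pixel receives its own random value. The area range, aspect-ratio range, retry count, and the function name `random_erase` are assumptions chosen for illustration; the card only states the mode and the probability.

```python
import math
import random

def random_erase(image, probability=0.25, area_range=(0.02, 1 / 3),
                 log_aspect_range=(math.log(0.3), math.log(1 / 0.3)),
                 max_attempts=10, rng=random):
    """Erase one random rectangle in place, filling it with per-pixel noise.

    `image` is a list of rows; each row is a list of [c0, c1, c2] float lists.
    Returns True if a rectangle was erased, False otherwise.
    """
    if rng.random() > probability:
        return False  # leave most images untouched
    height, width = len(image), len(image[0])
    for _ in range(max_attempts):
        # Sample a target area and aspect ratio, then derive the box size.
        target_area = rng.uniform(*area_range) * height * width
        aspect = math.exp(rng.uniform(*log_aspect_range))
        h = int(round(math.sqrt(target_area * aspect)))
        w = int(round(math.sqrt(target_area / aspect)))
        if 0 < h < height and 0 < w < width:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            for y in range(top, top + h):
                for x in range(left, left + w):
                    # Pixel mode: independent Gaussian noise per erased pixel.
                    image[y][x] = [rng.gauss(0.0, 1.0) for _ in range(3)]
            return True
    return False  # no valid box found within the attempt budget

# Demo on a 32x32 all-zero image; probability=1.0 forces an erase attempt.
demo_rng = random.Random(0)
demo = [[[0.0, 0.0, 0.0] for _ in range(32)] for _ in range(32)]
erased = random_erase(demo, probability=1.0, max_attempts=1000, rng=demo_rng)
changed = sum(px != [0.0, 0.0, 0.0] for row in demo for px in row)
print(f"erased={erased}, changed pixels={changed}")
```

The erased region hides part of the sign without rotating or flipping it, which fits the stated choice to avoid geometric augmentations.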
Training Hyperparameters
| Parameter | Value |
|---|---|
| Training regime | fp16 mixed precision (AMP) |
| Optimizer | AdamW |
| Learning rate | 1e-3 |
| Minimum learning rate | 1e-6 |
| Weight decay | 0.01 |
| Batch size | 128 (per GPU) |
| Effective batch size | 256 (2 GPUs) |
| Epochs | 100 |
| Warmup epochs | 10 |
| Label smoothing | 0.1 |
| Drop path rate | 0.1 |
| Gradient clipping | 1.0 |
| Seed | 67 |
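The table gives a peak learning rate, a minimum learning rate, and a warmup length, but not the shape of the schedule. The sketch below shows one common way these values fit together: a linear warmup followed by cosine decay. Both the cosine shape and the warmup start value are assumptions, not details confirmed by the card. The effective batch size follows directly from the table: 128 per GPU × 2 GPUs = 256.

```python
import math

BASE_LR = 1e-3           # "Learning rate" from the table
MIN_LR = 1e-6            # "Minimum learning rate" from the table
EPOCHS = 100             # "Epochs" from the table
WARMUP_EPOCHS = 10       # "Warmup epochs" from the table
WARMUP_START_LR = 1e-6   # assumption: not stated in the card

def learning_rate(epoch):
    """Learning rate at the start of a 0-based epoch (assumed schedule)."""
    if epoch < WARMUP_EPOCHS:
        # Linear ramp from the warmup start value up to the base rate.
        fraction = epoch / WARMUP_EPOCHS
        return WARMUP_START_LR + fraction * (BASE_LR - WARMUP_START_LR)
    # Cosine decay from the base rate down to the minimum rate.
    progress = (epoch - WARMUP_EPOCHS) / (EPOCHS - WARMUP_EPOCHS)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * progress))

for epoch in (0, 5, 10, 55, 99):
    print(f"epoch {epoch:3d}: lr = {learning_rate(epoch):.2e}")
```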
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was tested on the Ogre validation split of the Synset Signset Germany - GTSRB Subset dataset:
🔹 Description: A synthetic traffic sign recognition dataset containing 43 German traffic sign classes (matching GTSRB classes)
🔹 Size: 4,300 images (100 images per class)
🔹 Split: Ogre validation
🔹 Paper: Synset Signset Germany: A Synthetic Dataset for German Traffic Sign Recognition
and the test split of the German Traffic Sign Recognition Benchmark (GTSRB) dataset:
🔹 Description: A multi-class classification dataset featuring 43 classes of traffic signs
🔹 Size: 12,630 images
🔹 Split: test
🔹 Paper: The German Traffic Sign Recognition Benchmark: A multi-class classification competition
Results
| Metric | SSG | GTSRB |
|---|---|---|
| Accuracy | 99.67% | 96.47% |
| Top-5 Accuracy | 99.88% | 99.07% |
| F1 Score | 99.67% | 95.60% |
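For readers who want to reproduce metrics of this kind on their own predictions, the following sketch computes top-k accuracy and an F1 score in plain Python. It assumes the F1 score is macro-averaged over classes, which the card does not specify; the function names and the toy data are hypothetical.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(labels)

def macro_f1(predictions, labels, num_classes):
    """Unweighted mean of per-class F1 scores (assumed averaging mode)."""
    total = 0.0
    for c in range(num_classes):
        tp = sum(p == c and y == c for p, y in zip(predictions, labels))
        fp = sum(p == c and y != c for p, y in zip(predictions, labels))
        fn = sum(p != c and y == c for p, y in zip(predictions, labels))
        denom = 2 * tp + fp + fn
        total += (2 * tp / denom) if denom else 0.0
    return total / num_classes

# Toy example with 3 classes and 4 samples (not real model output).
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],  # true class 2 is only the second-best guess
    [0.2, 0.3, 0.5],
]
toy_labels = [0, 1, 2, 2]
toy_preds = [max(range(3), key=lambda i: row[i]) for row in toy_scores]

print("top-1:", top_k_accuracy(toy_scores, toy_labels, k=1))
print("top-2:", top_k_accuracy(toy_scores, toy_labels, k=2))
print("macro F1:", round(macro_f1(toy_preds, toy_labels, 3), 4))
```

For the 43-class model, `scores` would hold one row of logits or probabilities per image and `k=5` would give the top-5 accuracy.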
Summary
The model achieves 99.67% accuracy on the synthetic SSG validation set across all 43 traffic sign classes and 96.47% accuracy on the real-world GTSRB test set, a sim-to-real gap of 3.2 percentage points.
Technical Specifications
Model Architecture and Objective
🔹 Architecture: MaxViT Tiny
🔹 Parameters: ~30.4M
🔹 Objective: Multi-class classification (Cross-Entropy with label smoothing)
🔹 Output: 43-class confidence distribution
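The objective listed above, cross-entropy with label smoothing, can be sketched as follows. The sketch uses the common formulation in which the one-hot target is mixed with a uniform distribution over all classes, and takes the smoothing factor 0.1 from the hyperparameter table. The function name is hypothetical, and the exact loss implementation used for training is not stated in the card.

```python
import math

def smoothed_cross_entropy(logits, target, smoothing=0.1):
    """Cross-entropy against a label-smoothed target distribution."""
    # Numerically stable log-softmax.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_norm for x in logits]
    nll = -log_probs[target]                    # loss for the true class
    uniform = -sum(log_probs) / len(log_probs)  # loss against a uniform target
    return (1.0 - smoothing) * nll + smoothing * uniform

# Hypothetical 43-class logits with a confident, correct prediction for class 3.
confident = [0.0] * 43
confident[3] = 10.0
print("plain CE:   ", round(smoothed_cross_entropy(confident, 3, smoothing=0.0), 4))
print("smoothed CE:", round(smoothed_cross_entropy(confident, 3, smoothing=0.1), 4))
```

The smoothed loss stays noticeably above zero even for a confident correct prediction, which discourages the network from pushing logits to extremes.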
Compute Infrastructure
Hardware
🔹 2× NVIDIA L40 GPUs
🔹 Distributed Data Parallel (DDP) training
Citation
If you consider our work helpful, please cite our Synset Signset Germany dataset alongside MaxViT:
BibTeX:
@inproceedings{synset_signset_ger_sielemann_2024,
title={{Synset Signset Germany: A Synthetic Dataset for German Traffic Sign Recognition}},
author={Sielemann, Anne and Loercher, Lena and Schumacher, Max-Lion and Wolf, Stefan and Roschani, Masoud and Ziehn, Jens and Beyerer, Juergen},
booktitle={2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC)},
year={2024}
}
APA:
Sielemann, A., Loercher, L., Schumacher, M.-L., Wolf, S., Roschani, M., Ziehn, J., & Beyerer, J. (2024).
Synset Signset Germany: A Synthetic Dataset for German Traffic Sign Recognition.
In 2024 IEEE 27th International Conference on Intelligent Transportation Systems (ITSC).
Model Card Contact
Anne Sielemann
Fraunhofer IOSB
Group »Automotive and Simulation«
Fraunhoferstr. | 76131 Karlsruhe | Germany
anne.sielemann@iosb.fraunhofer.de
www.iosb.fraunhofer.de
Jens Ziehn
Fraunhofer IOSB
Group leader »Automotive and Simulation«
Fraunhoferstr. | 76131 Karlsruhe | Germany
Phone +49 721 6091 – 633
jens.ziehn@iosb.fraunhofer.de
www.iosb.fraunhofer.de
Model tree for FraunhoferIOSB/maxvit-tiny-1k-224-finetuned-ssg
Base model
timm/maxvit_tiny_tf_224.in1k