KD-Teachers

Pre-trained teacher models used for Knowledge Distillation with Mixup augmentation on CIFAR-10 and CIFAR-100. These checkpoints are the teacher component of the KD-Mixup framework.

Each model was fine-tuned separately on CIFAR-10 and on CIFAR-100, starting from ImageNet pre-trained weights, using SGD with momentum, ReduceLROnPlateau scheduling, and mixed-precision (float16) training.


Models

| Model              | Backbone       | Pre-training |
|--------------------|----------------|--------------|
| best_resnet152v2   | ResNet-152 V2  | ImageNet     |
| best_convnexttiny  | ConvNeXt-Tiny  | ImageNet     |
| best_convnextlarge | ConvNeXt-Large | ImageNet     |
| best_vitbase       | ViT-B/16       | ImageNet     |

Performance

CIFAR-10

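| Model          | Accuracy | Confidence | ECE    |
|----------------|----------|------------|--------|
| ConvNeXt-Large | 0.9856   | 0.9789     | 0.0070 |
| ViT-B/16       | 0.9850   | 0.9927     | 0.0085 |
| ResNet-152 V2  | 0.9698   | 0.9838     | 0.0152 |
| ConvNeXt-Tiny  | 0.9672   | 0.9744     | 0.0104 |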

CIFAR-100

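| Model          | Accuracy | Confidence | ECE    |
|----------------|----------|------------|--------|
| ConvNeXt-Large | 0.9217   | 0.9189     | 0.0049 |
| ViT-B/16       | 0.9151   | 0.9272     | 0.0174 |
| ResNet-152 V2  | 0.8257   | 0.8873     | 0.0619 |
| ConvNeXt-Tiny  | 0.8196   | 0.7908     | 0.0288 |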
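
The tables above report accuracy, confidence, and ECE (expected calibration error) without stating how they were computed. The following is a minimal sketch of one common way to compute them, not necessarily the procedure used for these numbers. It assumes that "Confidence" is the mean top-1 softmax probability and that ECE uses 15 equal-width bins; the function name and the bin count are illustrative choices, not taken from the KD-Mixup code.

```python
def calibration_metrics(confidences, correct, n_bins=15):
    """Return (accuracy, mean confidence, ECE) for a set of predictions.

    confidences: top-1 probabilities in [0, 1], one per sample.
    correct:     1 if the top-1 prediction was right, else 0.
    n_bins:      number of equal-width bins (assumed value, not from the source).
    """
    n = len(confidences)
    bin_conf = [0.0] * n_bins   # sum of confidences per bin
    bin_hits = [0.0] * n_bins   # number of correct predictions per bin
    bin_count = [0] * n_bins    # number of samples per bin

    for conf, hit in zip(confidences, correct):
        # Bins are [k/n_bins, (k+1)/n_bins); a confidence of 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bin_conf[idx] += conf
        bin_hits[idx] += hit
        bin_count[idx] += 1

    # ECE: sample-weighted average of |accuracy - confidence| over non-empty bins.
    ece = 0.0
    for k in range(n_bins):
        if bin_count[k] > 0:
            gap = abs(bin_hits[k] / bin_count[k] - bin_conf[k] / bin_count[k])
            ece += (bin_count[k] / n) * gap

    accuracy = sum(correct) / n
    mean_confidence = sum(confidences) / n
    return accuracy, mean_confidence, ece
```

Under these assumptions, a model whose mean confidence exceeds its accuracy (as ResNet-152 V2 does on CIFAR-100) is overconfident on average, and one whose confidence falls below its accuracy (as ConvNeXt-Tiny does on CIFAR-100) is underconfident.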

File Structure

cifar10/
├── best_resnet152v2.keras
├── best_convnexttiny.keras
├── best_convnextlarge.keras
└── best_vitbase.keras
cifar100/
├── best_resnet152v2.keras
├── best_convnexttiny.keras
├── best_convnextlarge.keras
└── best_vitbase.keras

Usage

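Download a checkpoint with hf_hub_download. The call returns the path of the file in the local Hugging Face cache; if you use the KD-Mixup training script, copy the file into your local checkpoints/teachers/{dataset}/ folder: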

from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="josemedina/KD-Teachers",
    filename="cifar100/best_resnet152v2.keras"
)

Then load it with Keras:

import tensorflow as tf

model = tf.keras.models.load_model(path)

The expected checkpoint path for the KD-Mixup training script is:

checkpoints/teachers/{dataset}/best_{teacher_name}.keras
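
As a rough sketch, the downloaded file can be copied into that layout with the standard library. The helper names `expected_checkpoint_path` and `install_checkpoint` are hypothetical and not part of KD-Mixup or huggingface_hub. The default root is taken from the path pattern above, and `path` in the commented usage line is assumed to be the value returned by `hf_hub_download`.

```python
import shutil
from pathlib import Path


def expected_checkpoint_path(dataset, teacher_name, root="checkpoints/teachers"):
    """Build checkpoints/teachers/{dataset}/best_{teacher_name}.keras."""
    return Path(root) / dataset / f"best_{teacher_name}.keras"


def install_checkpoint(downloaded_path, dataset, teacher_name, root="checkpoints/teachers"):
    """Copy a downloaded checkpoint to the location the training script expects."""
    target = expected_checkpoint_path(dataset, teacher_name, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    # copy2 follows symlinks by default, so a symlinked cache entry is copied as a real file.
    shutil.copy2(downloaded_path, target)
    return target


# Example (hypothetical usage, requires a prior hf_hub_download call):
# install_checkpoint(path, "cifar100", "resnet152v2")
```

Copying rather than moving leaves the Hugging Face cache intact, so later downloads of the same file are still served from the cache.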

Training Details

  • Input size: 224 × 224 × 3
  • Batch size: 250
  • Optimizer: SGD (momentum=0.9, lr=1e-4)
  • LR schedule: ReduceLROnPlateau (patience=3, factor=0.9, min_lr=1e-5)
  • Max epochs: 500 (best checkpoint saved by val accuracy)
  • Augmentation: Random crop, horizontal flip
  • Precision: Mixed float16
  • ViT-B/16 normalization: ImageNet mean/std ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
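
To get a feel for the learning-rate settings listed above, the short sketch below counts how many plateau-triggered reductions it takes to go from lr=1e-4 to the min_lr floor with factor=0.9. It mirrors the update rule Keras documents for ReduceLROnPlateau (new_lr = max(lr * factor, min_lr)) in plain Python. The function name is illustrative, and the sketch does not model when plateaus actually occur during training.

```python
def count_reductions_to_floor(initial_lr=1e-4, factor=0.9, min_lr=1e-5):
    """Count multiplicative reductions needed before the LR is clipped to min_lr."""
    lr = initial_lr
    reductions = 0
    while lr > min_lr:
        # Same rule as ReduceLROnPlateau: scale by factor, never go below min_lr.
        lr = max(lr * factor, min_lr)
        reductions += 1
    return reductions, lr


reductions, final_lr = count_reductions_to_floor()
print(reductions, final_lr)
```

With these values the floor is reached after 22 reductions. Assuming no cooldown, patience=3 means each reduction requires about three consecutive epochs without improvement, so the schedule decays gently and only bottoms out after many stalled epochs.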

Citation

If you use these checkpoints in your research, please cite:

@misc{medina2025kdmixup,
  author    = {Medina, Jos{\'e} and Hadachi, Amnir and Honeine, Paul and Bensrhair, Abdelaziz},
  title     = {Beyond Dark Knowledge: Mixup-Based Knowledge Distillation Under Vicinal Teacher Distributions},
  year      = {2025},
  publisher = {University of Tartu},
  url       = {https://github.com/JoseLMedinaC/KD-Mixup}
}