LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation

Paper | GitHub | Project Page

This repository contains ViT-Tiny and ViT-Small checkpoints (no-register variants) distilled from a DINOv2 ViT-G teacher on ImageNet-100 and ImageNet-1K. The distillation follows the procedure proposed in the paper "LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation".

Introduction

Vision Foundation Models (VFMs) with ViT backbones, such as DINOv2, are computationally demanding. LEAP (Layer-skipping Efficiency via Adaptive Progression) is a training curriculum for feature-based knowledge distillation of ViTs. Instead of supervising the student against a single fixed teacher block, LEAP moves the supervisory target through the teacher's feature maps from shallow to deep, advancing according to alignment measured online with CKA (centered kernel alignment). This lets the student build a foundational representation before tackling higher-level abstractions.
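
To make the curriculum idea concrete, here is a minimal NumPy sketch of linear CKA between student and teacher features, plus a toy rule for advancing the teacher target. It is an illustration under assumptions, not the paper's implementation: the function names `linear_cka` and `advance_target`, the threshold value, and the "advance by one block" rule are all hypothetical, and the exact CKA variant and schedule used by LEAP are defined in the paper.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices of shape (n_samples, dim).

    The two matrices may have different feature dimensions.
    """
    # Center every feature dimension over the batch.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def advance_target(current: int, cka: float, threshold: float, last: int) -> int:
    """Hypothetical rule: move one teacher block deeper once CKA >= threshold."""
    if cka >= threshold and current < last:
        return current + 1
    return current


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.standard_normal((256, 64))  # stand-in teacher features
    # An orthogonal transform of the teacher features; linear CKA is
    # invariant to it, so the score is 1.0 up to rounding.
    q, _ = np.linalg.qr(rng.standard_normal((64, 64)))
    student = teacher @ q
    score = linear_cka(student, teacher)
    print("CKA:", score)
    print("next target block:", advance_target(3, score, threshold=0.9, last=11))
```

In a real training loop the score would be computed on mini-batch features from the current teacher block and the student, and the target index would select which teacher block supplies the distillation loss.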

Use cases

The ViT models output feature maps that can be used for a variety of downstream tasks, including:

  • Image Classification
  • Instance Retrieval
  • Semantic Segmentation
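
As one illustration of the use cases above, instance retrieval can be done by ranking gallery images by cosine similarity to a query in feature space. The sketch below assumes each image has already been reduced to a single feature vector; random arrays stand in for real model outputs, the dimension 192 is only illustrative, and the helper names `l2_normalize` and `retrieve_top_k` are hypothetical rather than part of this repository.

```python
import numpy as np


def l2_normalize(features: np.ndarray) -> np.ndarray:
    """Scale each row to unit L2 norm, guarding against zero vectors."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.clip(norms, 1e-12, None)


def retrieve_top_k(query: np.ndarray, gallery: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k gallery rows most cosine-similar to each query row."""
    sims = l2_normalize(query) @ l2_normalize(gallery).T
    # Sorting the negated similarities yields descending similarity order.
    return np.argsort(-sims, axis=1)[:, :k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gallery = rng.standard_normal((100, 192))  # stand-in gallery features
    # Queries are slightly perturbed copies of gallery items 7 and 21.
    query = gallery[[7, 21]] + 0.01 * rng.standard_normal((2, 192))
    print(retrieve_top_k(query, gallery, k=3))
```

The same normalized features could feed a linear classifier for image classification; semantic segmentation would instead use per-patch features rather than one vector per image.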

Performance

ImageNet-100:

[3 result figures]

ImageNet-1K:

[3 result figures]

Citation

@article{leap2026,
  title={LEAP: Layer-skipping Efficiency via Adaptive Progression for Vision Transformer Distillation},
  author={Zhang, Jiaqi and Lee, Ashton and Wong, Anthony and Zou, John and BuGhanem, Sami and Balestriero, Randall},
  journal={arXiv preprint arXiv:2606.19483},
  year={2026}
}