---

license: apache-2.0
tags:
- image-classification
datasets:
- Face-Mask18K 
---


# Distilled Data-efficient Image Transformer for Face Mask Detection

Distilled data-efficient Image Transformer (DeiT) model pre-trained and fine-tuned on the self-curated custom Face-Mask18K dataset (18k images, 2 classes) at resolution 224x224. DeiT was first introduced in the paper [Training data-efficient image transformers & distillation through attention](https://arxiv.org/abs/2012.12877) by Touvron et al.

## Model description

This model is a distilled Vision Transformer (ViT). It uses a distillation token, besides the class token, to effectively learn from a teacher (CNN) during both pre-training and fine-tuning. The distillation token is learned through backpropagation, by interacting with the class ([CLS]) and patch tokens through the self-attention layers.
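
For reference, the hard-label form of the distillation objective described in the DeiT paper pairs a cross-entropy term on the class-token output (against the ground-truth label) with a cross-entropy term on the distillation-token output (against the teacher's predicted label). This is a summary of the paper's formulation, not a training detail stated in this card:

$$
\mathcal{L} = \tfrac{1}{2}\,\mathcal{L}_{\mathrm{CE}}\big(\psi(z_{\mathrm{class}}),\, y\big) + \tfrac{1}{2}\,\mathcal{L}_{\mathrm{CE}}\big(\psi(z_{\mathrm{distill}}),\, y_{\mathrm{t}}\big), \qquad y_{\mathrm{t}} = \arg\max_{c} Z_{\mathrm{t}}(c)
$$

where $\psi$ is the softmax function and $Z_{\mathrm{t}}$ are the teacher's logits.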

Images are presented to the model as a sequence of fixed-size patches (resolution 16x16), which are linearly embedded. 
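
Below is a minimal inference sketch using the `transformers` library. The checkpoint path and image filename are placeholders, since this card does not state the published repo id; substitute the actual model location:

```python
# Hedged usage sketch: load the fine-tuned checkpoint and classify one image.
# "./deit-face-mask" and "face.jpg" are placeholders, not names from this card.
from PIL import Image
import torch
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "./deit-face-mask"  # placeholder path or Hub repo id
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)

image = Image.open("face.jpg").convert("RGB")            # any face photo
inputs = processor(images=image, return_tensors="pt")    # resized/normalized to 224x224
with torch.no_grad():
    logits = model(**inputs).logits                      # shape: (1, 2) for the two classes
predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])
```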

## Training Metrics
    epoch                    =          2.0
    total_flos               = 2078245655GF
    train_loss               =       0.0438
    train_runtime            =   1:37:16.87
    train_samples_per_second =        9.887
    train_steps_per_second   =        0.309


---

## Evaluation Metrics
    epoch                   =        2.0
    eval_accuracy           =     0.9922
    eval_loss               =     0.0271
    eval_runtime            = 0:03:17.36
    eval_samples_per_second =      18.22
    eval_steps_per_second   =       2.28
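
These figures are the standard summary produced by the Hugging Face `Trainer`. As an illustration only, the sketch below shows one way such a fine-tuning and evaluation run could be set up; the base checkpoint, data layout, batch size, and other hyperparameters are assumptions, since the card does not document the training script:

```python
# Hedged training sketch (assumptions: Face-Mask18K stored as an ImageFolder
# with one sub-directory per class, and facebook/deit-base-distilled-patch16-224
# as the base checkpoint; neither is confirmed by this card).
import numpy as np
import torch
from datasets import load_dataset
from transformers import (
    AutoImageProcessor,
    DeiTForImageClassification,
    Trainer,
    TrainingArguments,
)

base = "facebook/deit-base-distilled-patch16-224"
processor = AutoImageProcessor.from_pretrained(base)

# Placeholder data directory; expects one sub-folder of images per class.
data = load_dataset("imagefolder", data_dir="face-mask-18k", split="train")
data = data.train_test_split(test_size=0.1, seed=42)
labels = data["train"].features["label"].names

def transform(batch):
    # Resize and normalize every image to the 224x224 input the model expects.
    batch["pixel_values"] = [
        processor(img.convert("RGB"), return_tensors="pt")["pixel_values"][0]
        for img in batch["image"]
    ]
    return batch

data = data.with_transform(transform)

def collate(examples):
    return {
        "pixel_values": torch.stack([e["pixel_values"] for e in examples]),
        "labels": torch.tensor([e["label"] for e in examples]),
    }

def compute_metrics(eval_pred):
    # eval_accuracy as reported above: fraction of correctly classified images.
    preds = np.argmax(eval_pred.predictions, axis=-1)
    return {"accuracy": float((preds == eval_pred.label_ids).mean())}

model = DeiTForImageClassification.from_pretrained(
    base,
    num_labels=len(labels),
    ignore_mismatched_sizes=True,  # replace the ImageNet head with a 2-class head
)

args = TrainingArguments(
    output_dir="deit-face-mask",
    num_train_epochs=2,            # matches the epoch count reported above
    per_device_train_batch_size=32,
    remove_unused_columns=False,   # keep the raw "image" column for the transform
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["test"],
    data_collator=collate,
    compute_metrics=compute_metrics,
)
trainer.train()
print(trainer.evaluate())
```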