Vision Transformer (ViT) for Facial Expression Recognition Model Card

Model Overview

Model Description

The vit-face-expression model is a Vision Transformer (ViT) fine-tuned for facial emotion recognition.

It is trained on the FER2013 dataset, in which each facial image is labeled with one of seven emotions:

  • Angry
  • Disgust
  • Fear
  • Happy
  • Sad
  • Surprise
  • Neutral
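
As a minimal sketch of how seven raw model scores could be turned into one of the labels above, the snippet below applies a softmax and picks the most probable class. The label order is an assumption taken from the order of the list in this card; the model's own id-to-label mapping may differ and should be checked in its configuration. The names EMOTIONS, softmax, and predict_emotion are hypothetical helpers, not part of the model.

```python
import math

# Assumed label order: the order in which this card lists the classes.
# The model's real id-to-label mapping may differ; check its config.
EMOTIONS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]


def softmax(logits):
    """Convert raw scores to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_emotion(logits):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(EMOTIONS):
        raise ValueError("expected one score per emotion class")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


# Hypothetical scores for a single image (not real model output).
label, prob = predict_emotion([0.1, -1.2, 0.3, 2.5, 0.0, 0.4, 1.1])
```

With these made-up scores the fourth entry is the largest, so the predicted label is the fourth class in the assumed order.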

Data Preprocessing

Input images are preprocessed before being passed to the model. The preprocessing steps are:

  • Resizing: Images are resized to the fixed input size the model expects.
  • Normalization: Pixel values are rescaled to a fixed numeric range.
  • Data Augmentation: Random transformations such as rotations, flips, and zooms are applied to augment the training dataset.
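
The steps above can be sketched in plain Python, under stated assumptions. This card does not give the input size or the normalization range, so the 224-pixel size and the mean and standard deviation of 0.5 (which map pixels to [-1, 1]) are placeholders, and all function names are hypothetical. Only the horizontal flip is shown as an augmentation; rotations and zooms are omitted for brevity.

```python
import random


def resize_nearest(image, size):
    """Resize a 2D grayscale image (list of rows) to size x size, nearest neighbor."""
    height, width = len(image), len(image[0])
    return [
        [image[r * height // size][c * width // size] for c in range(size)]
        for r in range(size)
    ]


def normalize(image, mean=0.5, std=0.5):
    """Scale 0-255 pixels to [0, 1], then standardize (assumed mean and std)."""
    return [[(p / 255.0 - mean) / std for p in row] for row in image]


def random_hflip(image, rng, p=0.5):
    """Mirror the image left to right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


def preprocess(image, size=224, train=False, rng=None):
    """Augment (training only), resize, then normalize one image."""
    if train:
        image = random_hflip(image, rng or random.Random())
    return normalize(resize_nearest(image, size))


# Tiny 2x2 example image, upscaled to 4x4 for illustration.
example = preprocess([[0, 255], [255, 0]], size=4)
```

Augmentation is applied only when train is true, so validation and test images pass through the same deterministic resize and normalization.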

Evaluation Metrics

  • Validation set accuracy: 0.7113
  • Test set accuracy: 0.7116
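
Accuracy here is presumably the fraction of images whose predicted class matches the ground-truth label. A minimal sketch of that computation follows; the function name and the toy inputs are hypothetical and are not the evaluation code behind the figures above.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that equal their ground-truth label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy example: three of four predicted class ids match.
score = accuracy([0, 1, 2, 3], [0, 1, 2, 0])
```

A single accuracy figure does not show how performance varies across the seven classes, so per-class results may be worth inspecting separately.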

Limitations

  • Data Bias: The model's performance may be influenced by biases present in the training data.
  • Generalization: The model's ability to generalize to unseen data depends on the diversity of the training dataset.

Model size: 85.8M parameters, stored as F32 tensors in Safetensors format.