---
license: apache-2.0
base_model: google/vit-base-patch16-224-in21k
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: vit-eGTZANplus
  results: []
datasets:
- ghermoso/egtzan_plus
pipeline_tag: image-classification
---

# Vision Transformer (ViT) for Music Genre Classification

## Model Overview

- **Model Name:** [ghermoso/vit-eGTZANplus](https://huggingface.co/ghermoso/vit-eGTZANplus)
- **Task:** Image Classification
- **Dataset:** [egtzan_plus](https://huggingface.co/datasets/ghermoso/egtzan_plus)
- **Model Architecture:** [Vision Transformer (ViT)](https://huggingface.co/docs/transformers/model_doc/vit)
- **Finetuned from model:** This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the [egtzan_plus](https://huggingface.co/datasets/ghermoso/egtzan_plus) dataset.

It achieves the following results on the evaluation set:

- Loss: 0.8358
- Accuracy: 0.7460
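
Since the card lists the task as image classification, one plausible way to try the model is through the `transformers` image-classification pipeline. The following is a minimal sketch, not the author's documented usage: the file name `input.png`, the helper names `top_prediction` and `classify_image`, and the assumption that `transformers`, PyTorch, and Pillow are installed are all illustrative. The card does not state how input images should be prepared, so that step is left out.

```python
from typing import Dict, List

# Model ID taken from the card; everything else here is an illustrative assumption.
MODEL_ID = "ghermoso/vit-eGTZANplus"


def top_prediction(predictions: List[Dict]) -> Dict:
    """Return the highest-scoring entry from pipeline-style output.

    The image-classification pipeline returns a list of dicts with
    "label" and "score" keys; this helper picks the best one.
    """
    if not predictions:
        raise ValueError("no predictions to choose from")
    return max(predictions, key=lambda p: p["score"])


def classify_image(image_path: str) -> Dict:
    """Classify one image file and return the top prediction."""
    # Imported lazily so top_prediction can be used without transformers installed.
    from transformers import pipeline

    classifier = pipeline("image-classification", model=MODEL_ID)
    return top_prediction(classifier(image_path))


if __name__ == "__main__":
    # "input.png" is a hypothetical path; replace it with a real image file.
    print(classify_image("input.png"))
```

Running the script downloads the model weights from the Hub on first use. Given the reported evaluation accuracy of 0.7460, individual predictions should be treated as estimates rather than ground truth.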