Audio Classification

This repo contains code and notes for this tutorial.

Dataset

The GTZAN music genre dataset is used.

Usage

export HUGGINGFACE_TOKEN=<your_token>
python main.py

Performance

Accuracy: 0.81 (default settings)
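
For reference, a minimal sketch of how such a score can be computed, assuming "Acc" means plain classification accuracy (the fraction of clips whose predicted genre matches the reference). The function name and the toy predictions are hypothetical and are not taken from main.py.

```python
def accuracy(predictions, references):
    """Return the fraction of predictions that equal their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy example: 4 of 5 predicted genre ids match the references.
preds = [0, 2, 1, 1, 3]
refs = [0, 2, 1, 0, 3]
score = accuracy(preds, refs)
print(f"Accuracy: {score:.2f}")
```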

Notes

  1. 🤗 Datasets provides a train_test_split() method to split the dataset.

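  2. feature_extractor does not resample audio.

    • To resample, cast the audio column to the feature extractor's sampling rate with cast_column(); the audio is then resampled on the fly when it is accessed:
from datasets import Audio

# Resample lazily to the rate the feature extractor expects
gtzan = gtzan.cast_column("audio", Audio(sampling_rate=feature_extractor.sampling_rate))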
  3. feature_extractor normalizes the audio and returns input_values and attention_mask.

  4. .map() supports batched preprocessing (batched=True).

  5. Why does AutoModelForAudioClassification.from_pretrained take label2id and id2label?
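
The last three notes can be illustrated without downloading any model. The sketch below is pure Python and only stands in for the library calls: normalize() assumes that normalization means rescaling each clip to zero mean and unit variance, preprocess_batch() mimics the dict-of-lists shape that .map() passes to a function when batched=True, and the genre list is an assumed label order (in practice it would be read from the dataset). All function names and the toy data are hypothetical.

```python
import math


def normalize(samples):
    """Rescale one clip to zero mean and unit variance (assumed behaviour)."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / len(samples)
    std = math.sqrt(var + 1e-7)  # small epsilon avoids division by zero
    return [(s - mean) / std for s in samples]


def preprocess_batch(batch):
    """Stand-in for a batched preprocessing function.

    With batched=True, .map() passes a dict of column name -> list of values
    (one entry per example) and expects a dict of lists back.
    """
    clips = [normalize(clip) for clip in batch["audio"]]
    longest = max(len(clip) for clip in clips)
    input_values, attention_mask = [], []
    for clip in clips:
        pad = longest - len(clip)
        input_values.append(clip + [0.0] * pad)             # pad with zeros
        attention_mask.append([1] * len(clip) + [0] * pad)  # 1 = real sample
    return {"input_values": input_values, "attention_mask": attention_mask}


# Two toy "clips" of different lengths (hypothetical data).
batch = {"audio": [[0.0, 1.0, 2.0, 3.0], [5.0, 7.0]]}
features = preprocess_batch(batch)

# Assumed genre names and order; the two mappings are inverses of each other.
genres = ["blues", "classical", "country", "disco", "hiphop",
          "jazz", "metal", "pop", "reggae", "rock"]
id2label = dict(enumerate(genres))
label2id = {name: idx for idx, name in id2label.items()}

print(features["attention_mask"])
print(id2label[9], label2id["jazz"])
```

One plausible answer to the last question: the two mappings are stored in the model config, so predictions can be reported as genre names rather than integer ids.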

