---
license: apache-2.0
base_model: facebook/wav2vec2-base
tags:
  - audio-classification
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: wav2vec2-base_down_on
    results: []
---

# wav2vec2-base_down_on

This model is a fine-tuned version of facebook/wav2vec2-base on the MatsRooth/down_on dataset. It achieves the following results on the evaluation set:

- Loss: 0.1385
- Accuracy: 0.9962

MatsRooth/down_on is the subset of superb ks with the labels `down` and `on`. Superb ks is in turn derived from the [Speech Commands dataset v1.0](https://www.tensorflow.org/datasets/catalog/speech_commands). Train/validation/test splits are as in superb ks.
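
The dataset can be inspected directly from the Hub. A minimal sketch, assuming the standard superb-ks-style `audio` and `label` columns:

```python
from datasets import load_dataset

# Load the two-label keyword dataset from the Hub.
# The "audio" / "label" column names are assumed, following superb ks.
ds = load_dataset("MatsRooth/down_on")

print(ds)                                    # train/validation/test splits
print(ds["train"].features["label"].names)   # expected: ['down', 'on']
```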

## Intended uses

MatsRooth/down_on and this model exercise a methodology for creating an audio-classification dataset from local directory structures and audio files, and check whether fine-tuning wav2vec2 for classification with two labels works well.
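
For inference, the fine-tuned checkpoint can be used with the audio-classification pipeline. A minimal sketch, assuming the hub id `MatsRooth/wav2vec2-base_down_on` (substitute a local checkpoint path if the repo id differs) and a short 16 kHz recording:

```python
from transformers import pipeline

# Audio-classification pipeline built on the fine-tuned checkpoint
# (hub id assumed; a local output_dir path also works).
classifier = pipeline(
    "audio-classification",
    model="MatsRooth/wav2vec2-base_down_on",
)

# "sample.wav" is a placeholder for any ~1 s, 16 kHz recording of "down" or "on".
print(classifier("sample.wav"))  # e.g. [{'label': 'down', 'score': ...}, ...]
```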

## Training procedure

Training used `sbatch` on a Slurm cluster and the program `run_audio_classification.py`. The submission script `down_on.sub` is shown below; start it with `sbatch down_on.sub`.

```bash
#!/bin/bash
#SBATCH -J down_on                 # Job name
#SBATCH -o down_on_%j.out          # Name of stdout output log file (%j expands to jobID)
#SBATCH -e down_on_%j.err          # Name of stderr output log file (%j expands to jobID)
#SBATCH -N 1                       # Total number of nodes requested
#SBATCH -n 1                       # Total number of cores requested
#SBATCH --mem=5000                 # Total amount of (real) memory requested (per node)
#SBATCH -t 10:00:00                # Time limit (hh:mm:ss)
#SBATCH --partition=gpu            # Request partition for resource allocation
#SBATCH --gres=gpu:1               # Specify a list of generic consumable resources (per node)

cd ~/ac_h
/home/mr249/env/hugh/bin/python run_audio_classification.py \
    --model_name_or_path facebook/wav2vec2-base \
    --dataset_name MatsRooth/down_on \
    --output_dir wav2vec2-base_down_on \
    --overwrite_output_dir \
    --remove_unused_columns False \
    --do_train \
    --do_eval \
    --fp16 \
    --learning_rate 3e-5 \
    --max_length_seconds 1 \
    --attention_mask False \
    --warmup_ratio 0.1 \
    --num_train_epochs 5 \
    --per_device_train_batch_size 32 \
    --gradient_accumulation_steps 4 \
    --per_device_eval_batch_size 32 \
    --dataloader_num_workers 1 \
    --logging_strategy steps \
    --logging_steps 10 \
    --evaluation_strategy epoch \
    --save_strategy epoch \
    --load_best_model_at_end True \
    --metric_for_best_model accuracy \
    --save_total_limit 3 \
    --seed 0
```

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 0
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 5.0
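
For reference, roughly the same configuration expressed as `TrainingArguments`; a sketch under the assumption that training goes through the Transformers `Trainer`, as in the example script above:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters above; the effective train batch size is
# 32 (per device) x 4 (gradient accumulation steps) = 128 on a single GPU.
training_args = TrainingArguments(
    output_dir="wav2vec2-base_down_on",
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,
    num_train_epochs=5,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    fp16=True,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    save_total_limit=3,
    logging_strategy="steps",
    logging_steps=10,
    seed=0,
)
```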

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 0.6089        | 1.0   | 29   | 0.1385          | 0.9962   |
| 0.1297        | 2.0   | 58   | 0.0513          | 0.9962   |
| 0.0835        | 3.0   | 87   | 0.0389          | 0.9885   |
| 0.058         | 4.0   | 116  | 0.0302          | 0.9923   |
| 0.0481        | 5.0   | 145  | 0.0245          | 0.9942   |

### Framework versions

- Transformers 4.31.0.dev0
- Pytorch 2.0.1+cu117
- Datasets 2.13.1
- Tokenizers 0.13.3