pipeline_tag: audio-classification

This repository contains the models submitted to Task 1 of the DCASE 2024 Challenge.

Description

The task is to develop a data-efficient and low-complexity acoustic scene classification system. The challenge dataset consists of 1-second audio clips, each belonging to one of 10 classes: airport, bus, metro, metro_station, park, public_square, shopping_mall, street_pedestrian, street_traffic, tram. Five models are trained, one on each split of the training data: 5%, 10%, 25%, 50%, and 100%.

We use the baseline model architecture and apply a target-specific training process, in which the pretraining dataset is pruned to match the target dataset. Knowledge distillation is used to transfer knowledge from a pre-trained audio tagging ensemble to the target model. A technical report describing the training process can be found here.
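To make the distillation idea concrete, the sketch below shows a generic, textbook-style distillation loss in plain Python: a hard-label cross-entropy term mixed with a temperature-scaled KL divergence between teacher and student outputs. This is an illustration only. The function names, the softmax formulation, and the `temperature` and `kd_weight` values are assumptions, not taken from the technical report, which may use a different loss (for example, a multi-label formulation for audio tagging).

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, kd_weight=0.5):
    """Generic distillation loss (hypothetical helper, not the authors' code).

    Mixes cross-entropy on the ground-truth label with the KL divergence
    between temperature-softened teacher and student distributions.
    """
    # Hard-label term: cross-entropy of the student on the true class.
    student_probs = softmax(student_logits)
    ce = -math.log(student_probs[label])

    # Soft-label term: KL(teacher || student) at the given temperature.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q)
             for p, q in zip(teacher_soft, student_soft) if p > 0)

    # The temperature**2 factor keeps the soft term's gradient scale
    # comparable across temperatures (a common convention).
    return (1 - kd_weight) * ce + kd_weight * temperature ** 2 * kl
```

When teacher and student produce identical logits, the KL term vanishes and only the weighted cross-entropy remains; any disagreement with the teacher increases the loss.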

Results

The full results of all participants can be found here: https://dcase.community/challenge2024/task-data-efficient-low-complexity-acoustic-scene-classification-results

The results of our submission compared to the baseline on the evaluation data are as follows:

| Name | Official rank | Rank score | Split 5% | Split 10% | Split 25% | Split 50% | Split 100% |
|---|---|---|---|---|---|---|---|
| Werning_UPBNT | 8 | 54.35 | 49.21 % | 52.51 % | 55.49 % | 56.20 % | 58.34 % |
| Baseline | 17 | 50.73 | 44.00 % | 46.95 % | 51.47 % | 54.40 % | 56.84 % |
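
The rank scores listed above are consistent with a plain average of the five per-split accuracies. The short check below reproduces them from the numbers in the table; the helper name `rank_score` is made up for this illustration, and the official definition of the score is given on the challenge results page.

```python
# Per-split accuracies (5%, 10%, 25%, 50%, 100%) copied from the table above.
accuracies = {
    "Werning_UPBNT": [49.21, 52.51, 55.49, 56.20, 58.34],
    "Baseline": [44.00, 46.95, 51.47, 54.40, 56.84],
}


def rank_score(split_accuracies):
    """Mean accuracy over all splits, rounded to two decimals."""
    return round(sum(split_accuracies) / len(split_accuracies), 2)


for name, values in accuracies.items():
    print(name, rank_score(values))
```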

Usage

The example notebook shows how to predict the acoustic scene for a given audio file using the models.
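
As a rough illustration of the final prediction step, the sketch below maps ten model outputs to a scene label by picking the highest-scoring class. It is not taken from the notebook: the function `predict_scene` is hypothetical, and the class order is assumed to follow the list in the description, so check the notebook for the order the models actually use.

```python
# Assumed class order (matches the list in the description above).
SCENE_CLASSES = [
    "airport", "bus", "metro", "metro_station", "park",
    "public_square", "shopping_mall", "street_pedestrian",
    "street_traffic", "tram",
]


def predict_scene(logits):
    """Return the scene label with the highest score (hypothetical helper)."""
    if len(logits) != len(SCENE_CLASSES):
        raise ValueError(
            f"expected {len(SCENE_CLASSES)} scores, got {len(logits)}"
        )
    best_index = max(range(len(logits)), key=lambda i: logits[i])
    return SCENE_CLASSES[best_index]


# Example with made-up scores for a single 1-second clip.
example_logits = [0.1, -0.3, 0.0, 0.2, 2.4, 0.5, -1.0, 0.3, 0.9, -0.2]
print(predict_scene(example_logits))
```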

The model code is adapted from the baseline repository: https://github.com/CPJKU/dcase2024_task1_baseline