README.md · WestAI-SC/dcase24_challenge_task1

metadata

pipeline_tag: audio-classification

This repository contains the models submitted to Task 1 of the DCASE 2024 Challenge.

Description

The task is to develop a data-efficient and low-complexity acoustic scene classification system. The challenge dataset consists of 1-second audio clips, each belonging to one of 10 classes: airport, bus, metro, metro_station, park, public_square, shopping_mall, street_pedestrian, street_traffic, tram. Five models are trained, one for each split of the training data: 5%, 10%, 25%, 50%, and 100%.

We use the baseline model architecture and apply a target-specific training process in which the pretraining dataset is pruned to match the target dataset. Knowledge distillation is used to transfer knowledge from a pretrained audio tagging ensemble to the target model. A technical report describing the training process can be found here.
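To illustrate the general idea of knowledge distillation, here is a minimal, framework-free sketch of one common formulation: a weighted sum of the hard-label cross-entropy and a temperature-scaled KL divergence between teacher and student outputs. This is not the training code from this repository. The function names, the `temperature` and `kd_weight` values, and the assumption that teacher and student predict over the same single-label class set are all illustrative; the setup described in the technical report may differ (for example, an audio tagging teacher typically produces multi-label outputs).

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, kd_weight=0.5):
    """Hypothetical loss: (1 - w) * CE(student, label) + w * T^2 * KL(teacher || student).

    `temperature` and `kd_weight` are illustrative defaults, not values
    taken from the submission.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    student_probs = softmax(student_logits)
    cross_entropy = -math.log(student_probs[label])

    # Soft-label term: KL divergence between the softened distributions.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(teacher_soft, student_soft) if t > 0)

    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return (1 - kd_weight) * cross_entropy + kd_weight * temperature ** 2 * kl
```

With `kd_weight=1.0` the loss depends only on how closely the student matches the teacher, and it is zero when both produce identical logits.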

Results

The full results of all participants can be found here: https://dcase.community/challenge2024/task-data-efficient-low-complexity-acoustic-scene-classification-results

The results of our submission compared to the baseline on the evaluation data are as follows:

Name	Official rank	Rank score	Split 5%	Split 10%	Split 25%	Split 50%	Split 100%
Werning_UPBNT	8	54.35	49.21 %	52.51 %	55.49 %	56.20 %	58.34 %
Baseline	17	50.73	44.00 %	46.95 %	51.47 %	54.40 %	56.84 %
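The rank scores in the table are consistent with the unweighted mean of the five per-split values. The short check below reproduces them from the numbers above; the helper name `rank_score` is ours, and the interpretation of the score as a plain average is inferred from the table, so the challenge results page remains the authoritative definition.

```python
from statistics import mean

# Per-split values (%) copied from the table above, in the order
# 5%, 10%, 25%, 50%, 100%.
split_results = {
    "Werning_UPBNT": [49.21, 52.51, 55.49, 56.20, 58.34],
    "Baseline": [44.00, 46.95, 51.47, 54.40, 56.84],
}

# Rank scores as reported in the table.
reported_rank_score = {"Werning_UPBNT": 54.35, "Baseline": 50.73}


def rank_score(values):
    """Assumed definition: unweighted mean over the five splits."""
    return mean(values)


for name, values in split_results.items():
    computed = rank_score(values)
    # Agreement is checked up to the two-decimal rounding used in the table.
    assert abs(computed - reported_rank_score[name]) < 0.005
    print(f"{name}: {computed:.2f}")
```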

Usage

The example notebook shows how to predict the acoustic scene for a given audio file using the models.
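As a rough sketch of the final prediction step, the snippet below maps a vector of 10 class scores to a scene label. It does not load or run the models; see the example notebook for that. The function name `predict_scene` is hypothetical, and the class order (the alphabetical order used in the description above) is an assumption that should be checked against the label encoding in the notebook or the baseline repository.

```python
# Class names from the task description; the index order is an assumption.
CLASSES = [
    "airport", "bus", "metro", "metro_station", "park",
    "public_square", "shopping_mall", "street_pedestrian",
    "street_traffic", "tram",
]


def predict_scene(scores):
    """Return the class name with the highest score (argmax)."""
    if len(scores) != len(CLASSES):
        raise ValueError(f"expected {len(CLASSES)} scores, got {len(scores)}")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return CLASSES[best_index]


# Example with made-up scores in which index 4 ("park") is the largest.
example_scores = [0.1, -0.3, 0.0, 0.2, 2.5, 0.4, -1.0, 0.3, 0.9, -0.2]
print(predict_scene(example_scores))
```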

The model code is adapted from the baseline repository: https://github.com/CPJKU/dcase2024_task1_baseline