---
license: apache-2.0
---

# Whisper Medium ATC short
This model is a version of openai/whisper-medium fine-tuned on Czech and English air traffic communication recordings from the Czech airport LKKU.
It was created as part of a bachelor's thesis at the Faculty of Information Technology, Brno University of Technology.
## Model description
- Developed by: Veronika Nevarilova (@xnevar00), Igor Szoke (@iszoke)
- Shared by: BUT FIT
- Model type: Whisper
- Languages: Czech, English
- License: Apache 2.0
- Finetuned from model: openai/whisper-medium
## Usage

```python
import torch
from transformers import pipeline

audio = "path/to/audio.xx"  # path to the recording to transcribe

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned model as an ASR pipeline; long recordings are chunked into 30 s windows
transcribe = pipeline(task="automatic-speech-recognition", model="BUT-FIT/whisper-ATC-czech-short", chunk_length_s=30, device=device)

# Force transcription (not translation) with Czech as the decoding language
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(task="transcribe", language="czech")

print('Transcription:', transcribe(audio)["text"])
```
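On recent transformers versions, the language and task can alternatively be passed per call via `generate_kwargs` instead of setting `forced_decoder_ids`; the snippet below is a minimal sketch of that variant, assuming a recent release:

```python
# Alternative (assumes a recent transformers release): pass language/task per call
# instead of modifying the model config.
result = transcribe(audio, generate_kwargs={"language": "czech", "task": "transcribe"})
print("Transcription:", result["text"])
```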
## Dataset
The training dataset consisted of ~5 hours of air traffic communication recordings. The recordings were in Czech and English (80:20), with occasional Slovak.
## Output format
The model was trained to abbreviate some information, especially numbers and callsigns. The transcription format of a recording is as follows:

**Recording:** Oscar Kilo Alpha Bravo Charlie dráha dva nula střední pro přistání volná vítr nula jedna nula stupňů pět uzlů
*(in English: runway two zero centre clear for landing, wind zero one zero degrees five knots)*

**Transcription:** OKABC dráha 20C pro přistání volná vítr 010 stupňů 5 uzlů
Note: See also model BUT-FIT/whisper-ATC-czech-full, which does not abbreviate any values and transcribes recordings word by word.
## Results
The model reached a total WER of 20.0 % on unseen Czech and English LKKU recordings. A WER of 34.0 % was achieved on a test set containing Czech air traffic recordings from other airports, LKPR and LKTB.
The WER on callsigns in LKKU recordings was 7.8 %, while on the LKPR and LKTB dataset the model reached 11.6 %.
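For reference, WER values like those above can be computed with the `evaluate` library; the snippet below is a minimal sketch with illustrative strings, not the thesis evaluation script:

```python
# Minimal WER-computation sketch using the `evaluate` library; the reference and
# hypothesis strings are illustrative only, not taken from the actual test sets.
import evaluate

wer_metric = evaluate.load("wer")
references = ["OKABC dráha 20C pro přistání volná vítr 010 stupňů 5 uzlů"]
predictions = ["OKABC dráha 20 pro přistání volná vítr 010 stupňů 5 uzlů"]
print(f"WER: {100 * wer_metric.compute(references=references, predictions=predictions):.1f} %")
```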
## Training hyperparameters
- learning_rate: 3e-5
- per_device_train_batch_size: 2
- gradient_accumulation_steps: 8
- warmup_ratio: 0.12
- fp16: True
- gradient_checkpointing: True
- evaluation_strategy: "epoch"
- save_strategy: "epoch"
- load_best_model_at_end: True
- metric_for_best_model: "wer"
- num_train_epochs: 45
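These settings map onto `Seq2SeqTrainingArguments` roughly as sketched below; `output_dir` and `greater_is_better` are illustrative assumptions, not part of the published configuration (and on the newest transformers releases `evaluation_strategy` is named `eval_strategy`):

```python
# Sketch only: mirrors the listed hyperparameters; output_dir and
# greater_is_better are assumptions, not the original thesis configuration.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-ATC-czech-short",   # assumed output path
    learning_rate=3e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    warmup_ratio=0.12,
    fp16=True,
    gradient_checkpointing=True,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,                  # lower WER is better (assumption)
    num_train_epochs=45,
)
```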
## Contact
For further information, don't hesitate to contact Veronika Nevarilova (xnevar00@stud.fit.vutbr.cz) or Igor Szoke (szoke@fit.vutbr.cz).