---
license: apache-2.0
---

# Whisper Medium ATC short
This model is a version of openai/whisper-medium fine-tuned on Czech and English air traffic communication recordings from the Czech airport LKKU.
It was created as part of a bachelor's thesis at the Faculty of Information Technology, Brno University of Technology.
## Model description
- Developed by: Veronika Nevarilova (@xnevar00), Igor Szoke (@iszoke)
- Shared by: BUT FIT
- Model type: Whisper
- Languages: Czech, English
- License: Apache 2.0
- Finetuned from model: openai/whisper-medium
## Usage

```python
import torch
from transformers import pipeline

audio = "path/to/audio.xx"  # path to the recording to transcribe

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the fine-tuned model as an ASR pipeline; long recordings are chunked into 30 s windows
transcribe = pipeline(task="automatic-speech-recognition", model="BUT-FIT/whisper-ATC-czech-short", chunk_length_s=30, device=device)

# Force transcription (not translation) with Czech as the decoding language
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(task="transcribe", language="czech")

print('Transcription:', transcribe(audio)["text"])
```
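On recent transformers versions, the language and task can alternatively be passed per call via `generate_kwargs` instead of setting `forced_decoder_ids`; the snippet below is a minimal sketch of that variant, assuming a recent release:

```python
# Alternative (assumes a recent transformers release): pass language/task per call
# instead of modifying the model config.
result = transcribe(audio, generate_kwargs={"language": "czech", "task": "transcribe"})
print("Transcription:", result["text"])
```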
## Dataset
The training dataset consisted of ~5 hours of air traffic communication recordings. The recordings were in Czech and English (80:20), with occasional Slovak.
## Output format
The model was trained to abbreviate some information, especially numbers and callsigns. The transcription format of a recording is as follows:

**Recording:** Oscar Kilo Alpha Bravo Charlie dráha dva nula střední pro přistání volná vítr nula jedna nula stupňů pět uzlů
*(in English: runway two zero centre clear for landing, wind zero one zero degrees five knots)*

**Transcription:** OKABC dráha 20C pro přistání volná vítr 010 stupňů 5 uzlů
Note: See also model BUT-FIT/whisper-ATC-czech-full, which does not abbreviate any values and transcribes recordings word by word.
## Results
The model reached a total WER of 20.0 % on unseen Czech and English LKKU recordings. A WER of 34.0 % was achieved on a test set containing Czech air traffic recordings from other airports, LKPR and LKTB.
The WER on callsigns in LKKU recordings was 7.8 %, while on the LKPR and LKTB dataset the model reached 11.6 %.
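For reference, WER values like those above can be computed with the `evaluate` library; the snippet below is a minimal sketch with illustrative strings, not the thesis evaluation script:

```python
# Minimal WER-computation sketch using the `evaluate` library; the reference and
# hypothesis strings are illustrative only, not taken from the actual test sets.
import evaluate

wer_metric = evaluate.load("wer")
references = ["OKABC dráha 20C pro přistání volná vítr 010 stupňů 5 uzlů"]
predictions = ["OKABC dráha 20 pro přistání volná vítr 010 stupňů 5 uzlů"]
print(f"WER: {100 * wer_metric.compute(references=references, predictions=predictions):.1f} %")
```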
## Training hyperparameters
- learning_rate: 3e-5
- per_device_train_batch_size: 2
- gradient_accumulation_steps: 8
- warmup_ratio: 0.12
- fp16: True
- gradient_checkpointing: True
- evaluation_strategy: "epoch"
- save_strategy: "epoch"
- load_best_model_at_end: True
- metric_for_best_model: "wer"
- num_train_epochs: 45
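These settings map onto `Seq2SeqTrainingArguments` roughly as sketched below; `output_dir` and `greater_is_better` are illustrative assumptions, not part of the published configuration (and on the newest transformers releases `evaluation_strategy` is named `eval_strategy`):

```python
# Sketch only: mirrors the listed hyperparameters; output_dir and
# greater_is_better are assumptions, not the original thesis configuration.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-ATC-czech-short",   # assumed output path
    learning_rate=3e-5,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    warmup_ratio=0.12,
    fp16=True,
    gradient_checkpointing=True,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,                  # lower WER is better (assumption)
    num_train_epochs=45,
)
```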
## Contact
For further information, don't hesitate to contact Veronika Nevarilova (xnevar00@stud.fit.vutbr.cz) or Igor Szoke (szoke@fit.vutbr.cz).