Whisper Medium ATC full
This model is openai/whisper-medium fine-tuned on Czech and English air traffic communication recordings from the Czech airport LKKU.
It was created as part of a bachelor's thesis at the Faculty of Information Technology, Brno University of Technology.
Model description
- Developed by: Veronika Nevarilova (@xnevar00), Igor Szoke (@iszoke)
- Shared by: BUT FIT
- Model type: Whisper
- Languages: Czech, English
- License: Apache 2.0
- Finetuned from model: openai/whisper-medium
Usage
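import torch
from transformers import pipeline

# Path to the recording to transcribe
audio = "path/to/audio.xx"
# Use the first GPU if one is available, otherwise fall back to the CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"
# Load the fine-tuned model; long recordings are processed in 30-second chunks
transcribe = pipeline(task="automatic-speech-recognition", model="BUT-FIT/whisper-ATC-czech-full", chunk_length_s=30, device=device)
# Prompt the decoder to transcribe (rather than translate) with the Czech language token
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(task="transcribe", language="czech")
print('Transcription:', transcribe(audio)["text"])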
Dataset
The training dataset consisted of roughly 5 hours of air traffic communication recordings. The recordings were in Czech and English (80:20), with occasional Slovak.
Output format
The model was trained to transcribe every recording word by word. The transcription format of a recording is as follows:
Recording: Oscar Kilo Alpha Bravo Charlie dráha dva nula střední pro přistání volná vítr nula jedna nula stupňů pět uzlů
Transcription: Oscar Kilo Alpha Bravo Charlie dráha dva nula střední pro přistání volná vítr nula jedna nula stupňů pět uzlů
Note: See also model BUT-FIT/whisper-ATC-czech-short, which abbreviates callsigns and numbers.
Results
The model reached an overall word error rate (WER) of 14.7 % on unseen Czech and English LKKU recordings. On a test set containing Czech air traffic recordings from other airports, LKPR and LKTB, it achieved a WER of 19.6 %.
The WER on callsigns in LKKU recordings was 6.2 %, while on the LKPR and LKTB dataset the model reached 3.6 %.
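For readers unfamiliar with the metric, the following is a minimal sketch of the standard WER definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the example strings are illustrative assumptions. This card does not state which tool or text normalization was used for the reported numbers, so this sketch may not reproduce them exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one missing word.
reference = "Oscar Kilo Alpha Bravo Charlie"
hypothesis = "Oscar Kilo Alfa Bravo"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}")
```

In this made-up example, two of the five reference words are wrong, so the WER is 40 %. Because the score is case- and punctuation-sensitive, any normalization applied before scoring affects the result.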
Training hyperparameters
- learning_rate: 3e-5
- per_device_train_batch_size: 2
- gradient_accumulation_steps: 8
- warmup_ratio: 0.12
- fp16: True
- gradient_checkpointing: True
- evaluation_strategy: "epoch"
- save_strategy: "epoch"
- load_best_model_at_end: True
- metric_for_best_model: "wer"
- num_train_epochs: 45
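The batch size, gradient accumulation, warmup ratio and epoch count listed above jointly determine the optimizer schedule. The sketch below approximates how they combine. The helper `schedule_summary`, the single-device default and the example count of 3200 training recordings are assumptions made for illustration; the card states neither the number of training examples nor the number of GPUs, and the exact step rounding used by the training framework may differ.

```python
import math

# Values copied from the hyperparameter list above.
HYPERPARAMETERS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "warmup_ratio": 0.12,
    "num_train_epochs": 45,
}


def schedule_summary(num_examples: int, num_devices: int = 1) -> dict:
    """Approximate the optimizer schedule implied by HYPERPARAMETERS."""
    hp = HYPERPARAMETERS
    # Examples consumed per optimizer update across all devices.
    effective_batch_size = (
        hp["per_device_train_batch_size"]
        * hp["gradient_accumulation_steps"]
        * num_devices
    )
    updates_per_epoch = math.ceil(num_examples / effective_batch_size)
    total_updates = updates_per_epoch * hp["num_train_epochs"]
    # Share of updates spent ramping the learning rate up to its peak.
    warmup_updates = math.ceil(total_updates * hp["warmup_ratio"])
    return {
        "effective_batch_size": effective_batch_size,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": total_updates,
        "warmup_updates": warmup_updates,
    }


# 3200 is a made-up example count, not the size of the real training set.
summary = schedule_summary(num_examples=3200)
print(summary)
```

On a single device, the listed settings give an effective batch size of 16 recordings per optimizer update, which is how a small per-device batch can still yield reasonably stable gradients on memory-limited hardware.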
Contact
For further information don't hesitate to contact Veronika Nevarilova (xnevar00@stud.fit.vutbr.cz) or Igor Szoke (szoke@fit.vutbr.cz).