
Whisper Medium ATC full

This model is a fine-tuned version of openai/whisper-medium, trained on Czech and English air traffic communication recordings from the Czech airport LKKU.

It was created as part of a bachelor's thesis at the Faculty of Information Technology, Brno University of Technology.

Model description

Usage

import torch
from transformers import pipeline

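audio = "path/to/audio.xx"  # path to the recording to transcribe
device = "cuda:0" if torch.cuda.is_available() else "cpu"  # run on the GPU when one is available

# Load the fine-tuned model into an ASR pipeline; long recordings are processed in 30 s chunks
transcribe = pipeline(task="automatic-speech-recognition", model="BUT-FIT/whisper-ATC-czech-full", chunk_length_s=30, device=device)
# Fix the decoder prompt so the model transcribes (rather than translates) in Czech
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(task="transcribe", language="czech")
# On newer transformers releases, passing generate_kwargs={"language": "czech", "task": "transcribe"} to transcribe() may be needed instead
print("Transcription:", transcribe(audio)["text"])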

Dataset

The training dataset consisted of approximately 5 hours of air traffic communication recordings. The recordings were in Czech and English (in an 80:20 ratio), with occasional Slovak.

Output format

The model was trained to transcribe each recording verbatim, word by word. A recording is transcribed in the following format:

Recording: Oscar Kilo Alpha Bravo Charlie dráha dva nula střední pro přistání volná vítr nula jedna nula stupňů pět uzlů

Transcription: Oscar Kilo Alpha Bravo Charlie dráha dva nula střední pro přistání volná vítr nula jedna nula stupňů pět uzlů

Note: See also model BUT-FIT/whisper-ATC-czech-short, which abbreviates callsigns and numbers.
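Because this model spells callsigns letter by letter and numbers digit by digit, downstream tools may want a more compact form. The sketch below is hypothetical post-processing, not the behavior of BUT-FIT/whisper-ATC-czech-short (whose exact output format isn't described here). It assumes standard ICAO spelling-alphabet words and Czech or English digit words, and merges each consecutive run of them into a single token. The names `collapse_spelled_tokens`, `ICAO_LETTERS`, and `DIGITS` are illustrative.

```python
# Hypothetical post-processing for verbatim ATC transcriptions (illustrative only).
ICAO_LETTERS = {
    "alfa": "A", "alpha": "A", "bravo": "B", "charlie": "C", "delta": "D",
    "echo": "E", "foxtrot": "F", "golf": "G", "hotel": "H", "india": "I",
    "juliett": "J", "juliet": "J", "kilo": "K", "lima": "L", "mike": "M",
    "november": "N", "oscar": "O", "papa": "P", "quebec": "Q", "romeo": "R",
    "sierra": "S", "tango": "T", "uniform": "U", "victor": "V",
    "whiskey": "W", "xray": "X", "x-ray": "X", "yankee": "Y", "zulu": "Z",
}
# Czech and English digit words (assumed vocabulary; extend as needed).
DIGITS = {
    "nula": "0", "jedna": "1", "dva": "2", "tři": "3", "čtyři": "4",
    "pět": "5", "šest": "6", "sedm": "7", "osm": "8", "devět": "9",
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
    "niner": "9",
}


def collapse_spelled_tokens(text: str) -> str:
    """Merge consecutive spelling-alphabet words into one letter string and
    consecutive digit words into one digit string; other words pass through."""
    out: list[str] = []
    run: list[str] = []      # symbols of the current letter or digit run
    run_kind = None          # "letter", "digit", or None
    for word in text.split():
        key = word.lower()
        if key in ICAO_LETTERS:
            kind, symbol = "letter", ICAO_LETTERS[key]
        elif key in DIGITS:
            kind, symbol = "digit", DIGITS[key]
        else:
            kind, symbol = None, word
        # Close the current run when the token type changes.
        if run and kind != run_kind:
            out.append("".join(run))
            run = []
        if kind is None:
            out.append(symbol)
        else:
            run.append(symbol)
        run_kind = kind
    if run:
        out.append("".join(run))
    return " ".join(out)
```

Applied to the example transcription above, this yields `OKABC dráha 20 střední pro přistání volná vítr 010 stupňů 5 uzlů`. A plain word lookup like this also converts ordinary English words such as "one", so it is only a starting point.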

Results

The model achieved an overall WER of 14.7 % on unseen Czech and English LKKU recordings, and 19.6 % WER on a test set of Czech air traffic recordings from other airports (LKPR and LKTB).

The WER on callsigns was 6.2 % on the LKKU recordings and 3.6 % on the LKPR and LKTB test set.
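The card reports word error rate (WER) but doesn't say which tool or text normalization produced these numbers. For reference, WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The following is a minimal pure-Python sketch of that definition, not the authors' evaluation script: it splits on whitespace only and doesn't normalize case or punctuation. Whether the reported overall WER pools errors across the whole test set (as the hypothetical `corpus_wer` helper does) or averages per recording is an assumption.

```python
def word_edit_distance(ref_words: list[str], hyp_words: list[str]) -> int:
    """Minimum number of substitutions, deletions and insertions that turn
    ref_words into hyp_words (Levenshtein distance over words)."""
    prev = list(range(len(hyp_words) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref_words, start=1):
        curr = [i] + [0] * len(hyp_words)
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference word r
                curr[j - 1] + 1,     # insertion of hypothesis word h
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER of a single utterance."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hypothesis.split()) / len(ref)


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Total errors divided by total reference words over a whole test set."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be paired")
    errors = sum(word_edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("references must contain at least one word")
    return errors / words
```

For instance, dropping "střední" from the example transcription and misrecognizing "pět" as "šest" gives 2 errors over 19 reference words, a WER of about 10.5 %.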

Training hyperparameters

  • learning_rate: 3e-5
  • per_device_train_batch_size: 2
  • gradient_accumulation_steps: 8
  • warmup_ratio: 0.12
  • fp16: True
  • gradient_checkpointing: True
  • evaluation_strategy: "epoch"
  • save_strategy: "epoch"
  • load_best_model_at_end: True
  • metric_for_best_model: "wer"
  • num_train_epochs: 45
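The list above reads like keyword arguments for Hugging Face `Seq2SeqTrainingArguments`, but the card doesn't include the training script, so that mapping is an assumption. The sketch below keeps the values as a plain dictionary and works out two derived quantities: with 2 examples per device and 8 accumulation steps, each optimizer update sees 2 × 8 = 16 examples per GPU, and `warmup_ratio` sets the warmup length as a fraction of the total optimizer steps. The helper names are hypothetical.

```python
import math

# Hyperparameters as listed in the model card. Assumption: they were passed to
# transformers' Seq2SeqTrainingArguments; the card does not show the script.
HYPERPARAMS = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "warmup_ratio": 0.12,
    "fp16": True,
    "gradient_checkpointing": True,
    "evaluation_strategy": "epoch",  # called "eval_strategy" in newer transformers
    "save_strategy": "epoch",
    "load_best_model_at_end": True,
    "metric_for_best_model": "wer",
    "num_train_epochs": 45,
}


def effective_batch_size(per_device: int, accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer update."""
    return per_device * accumulation_steps * num_devices


def warmup_steps(total_optimizer_steps: int, warmup_ratio: float) -> int:
    """Warmup length implied by warmup_ratio, rounded up to a whole step."""
    return math.ceil(total_optimizer_steps * warmup_ratio)


if __name__ == "__main__":
    batch = effective_batch_size(HYPERPARAMS["per_device_train_batch_size"],
                                 HYPERPARAMS["gradient_accumulation_steps"])
    print("effective batch size per GPU:", batch)
```

For a hypothetical run of 1,000 optimizer steps, `warmup_steps(1000, 0.12)` gives 120. If you reproduce this setup with `Trainer`, note that it infers `greater_is_better=True` for any `metric_for_best_model` whose name doesn't end in "loss"; since lower WER is better, `greater_is_better=False` should be set explicitly (the card doesn't list this argument).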

Contact

For further information don't hesitate to contact Veronika Nevarilova (xnevar00@stud.fit.vutbr.cz) or Igor Szoke (szoke@fit.vutbr.cz).

Model size: 764M params (Safetensors, tensor type F32)