---
license: apache-2.0
---

# Whisper Medium ATC short

This model is a version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) fine-tuned on Czech and English air traffic communication recordings from the Czech airport LKKU.

It was created as part of a bachelor's thesis at the Faculty of Information Technology, Brno University of Technology.

# Model description

- **Developed by:** Veronika Nevarilova ([@xnevar00](https://huggingface.co/xnevar00)), Igor Szoke ([@iszoke](https://huggingface.co/iszoke))
- **Shared by:** [BUT FIT](https://huggingface.co/BUT-FIT)
- **Model type:** Whisper
- **Languages:** Czech, English
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Finetuned from model:** [openai/whisper-medium](https://huggingface.co/openai/whisper-medium)

# Usage

```python
import torch
from transformers import pipeline

audio = "path/to/audio.xx"  # path to the recording to transcribe
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Build the ASR pipeline; long recordings are processed in 30 s chunks.
transcribe = pipeline(task="automatic-speech-recognition", model="BUT-FIT/whisper-ATC-czech-short", chunk_length_s=30, device=device)
# Force the Czech language token and the transcription (not translation) task.
transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(task="transcribe", language="czech")
print("Transcription:", transcribe(audio)["text"])
```

# Dataset

The training dataset consisted of ~5 hours of air traffic communication recordings. The recordings were in Czech and English (80:20), with occasional Slovak.

# Output format

The model was trained to abbreviate some information, especially numbers and callsigns. A recording is transcribed in the following format:

Recording: *Oscar Kilo Alpha Bravo Charlie dráha dva nula střední pro přistání volná vítr nula jedna nula stupňů pět uzlů*

Transcription: `OKABC dráha 20C pro přistání volná vítr 010 stupňů 5 uzlů`
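
To make the abbreviated format concrete, the sketch below reproduces the example above with a simple word lookup. It only illustrates the target format: the model learns this mapping from its training transcripts and does not run any such rules. The lookup tables and the `abbreviate` helper are hypothetical and cover only the words in the example.

```python
# Hypothetical lookup tables covering only the words in the example above.
LETTERS = {"alpha": "A", "bravo": "B", "charlie": "C", "kilo": "K", "oscar": "O"}
DIGITS = {"nula": "0", "jedna": "1", "dva": "2", "pět": "5"}
RUNWAY_SUFFIXES = {"střední": "C"}  # "střední" (center), as in runway 20C
SHORT_FORMS = {**LETTERS, **DIGITS, **RUNWAY_SUFFIXES}


def abbreviate(spoken: str) -> str:
    """Replace spelled-out words with symbols and join adjacent symbols."""
    output = []
    previous_was_symbol = False
    for word in spoken.split():
        symbol = SHORT_FORMS.get(word.lower())
        if symbol is None:
            output.append(word)        # ordinary word: keep as spoken
        elif previous_was_symbol:
            output[-1] += symbol       # extend the current callsign or number
        else:
            output.append(symbol)      # start a new callsign or number
        previous_was_symbol = symbol is not None
    return " ".join(output)


spoken = (
    "Oscar Kilo Alpha Bravo Charlie dráha dva nula střední pro přistání "
    "volná vítr nula jedna nula stupňů pět uzlů"
)
print(abbreviate(spoken))
# OKABC dráha 20C pro přistání volná vítr 010 stupňů 5 uzlů
```

This simplification joins any run of adjacent symbols and maps "střední" wherever it appears, so it should not be read as a description of the transcription rules used for the training data.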

**Note:** See also the model [BUT-FIT/whisper-ATC-czech-full](https://huggingface.co/BUT-FIT/whisper-ATC-czech-full), which does not abbreviate any values and transcribes recordings word for word.

# Results

The model reached a total WER of 20.0 % on unseen Czech and English LKKU recordings. On a test set of Czech air traffic recordings from other airports, LKPR and LKTB, it achieved a WER of 34.0 %.

The WER on callsigns was 7.8 % on the LKKU recordings and 11.6 % on the LKPR and LKTB dataset.
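
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference and the model output, divided by the number of reference words. The following is a minimal sketch of that standard definition. It is not the evaluation script used for the numbers above, whose text normalization is not described here, and the hypothesis sentence is made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "OKABC dráha 20C pro přistání volná"
hypothesis = "OKABC dráha 20 pro přistání volná"  # made-up output with one error
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
# WER: 16.7%
```

Because the abbreviated format packs a whole callsign or number into a single token, one wrong digit or letter counts as one full word error.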

# Training hyperparameters

- **learning_rate:** 3e-5
- **per_device_train_batch_size:** 2
- **gradient_accumulation_steps:** 8
- **warmup_ratio:** 0.12
- **fp16:** True
- **gradient_checkpointing:** True
- **evaluation_strategy:** "epoch"
- **save_strategy:** "epoch"
- **load_best_model_at_end:** True
- **metric_for_best_model:** "wer"
- **num_train_epochs:** 45
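
The listed values can be collected in a plain dictionary, as sketched below. The key names appear to follow the training-argument naming of the Hugging Face `transformers` library, but that is an assumption, as is the idea that they were passed to a trainer in this form. The number of devices used is not stated, so the effective batch size is computed per device.

```python
# Hyperparameters exactly as listed above.
hyperparameters = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "warmup_ratio": 0.12,
    "fp16": True,
    "gradient_checkpointing": True,
    "evaluation_strategy": "epoch",
    "save_strategy": "epoch",
    "load_best_model_at_end": True,
    "metric_for_best_model": "wer",
    "num_train_epochs": 45,
}

# Gradients are accumulated over 8 batches of 2 samples before each
# optimizer step, so one step sees 2 * 8 samples on a single device.
effective_batch_size = (
    hyperparameters["per_device_train_batch_size"]
    * hyperparameters["gradient_accumulation_steps"]
)
print(effective_batch_size)  # 16
```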

# Contact

For further information, don't hesitate to contact Veronika Nevarilova (**[xnevar00@stud.fit.vutbr.cz](mailto:xnevar00@stud.fit.vutbr.cz)**) or Igor Szoke (**[szoke@fit.vutbr.cz](mailto:szoke@fit.vutbr.cz)**).