---
title: Polish Whisper
emoji: 🏃
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 4.8.0
app_file: app.py
pinned: false
license: apache-2.0
---

Possible model improvements

(a) Model-centric approach

- The biggest improvement would almost certainly come from using a larger Whisper architecture.
- Increase the batch size and train for longer. A scheduler could raise it steadily until the model fully stabilizes.
- Multi-head training: train on all languages with a shared part of the architecture, which could improve generalization and let us use much more data.
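As a rough illustration of the scheduler idea, here is a minimal sketch of a linear batch-size ramp. It assumes the quantity being raised is the batch size; the function name `batch_size_at` and the values `start_size`, `end_size`, and `ramp_steps` are hypothetical and not taken from this project's training code.

```python
def batch_size_at(step, start_size=8, end_size=64, ramp_steps=1000):
    """Linearly ramp the batch size from start_size to end_size.

    All default values are illustrative assumptions, not tuned settings.
    """
    # After the ramp (or if no ramp is requested) stay at the final size.
    if ramp_steps <= 0 or step >= ramp_steps:
        return end_size
    # Fraction of the ramp completed so far, clamped at zero.
    fraction = max(step, 0) / ramp_steps
    return int(start_size + fraction * (end_size - start_size))


# Print the batch size at a few points along the ramp.
for step in (0, 250, 500, 1000):
    print(step, batch_size_at(step))
```

A training loop could call `batch_size_at(step)` whenever it rebuilds its data loader. Whether a linear ramp is the right shape would need to be checked against the validation loss.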

(b) Data-centric approach

- Use a dataset with better phonetic descriptions, such as the TIMIT dataset.
- Use more data, and more diverse data. Most of the files here were recorded with a laptop microphone, which can affect predictions on audio from other sources.
- Add noise and other transformations to the dataset.
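The noise idea from the last point could look roughly like the sketch below, which mixes white Gaussian noise into a waveform at a chosen signal-to-noise ratio. The helper name `add_noise`, the 20 dB target, and the synthetic sine tone standing in for a recording are all assumptions for illustration. It uses plain Python lists to stay dependency-free, whereas a real pipeline would more likely work on arrays.

```python
import math
import random


def add_noise(samples, snr_db, rng=None):
    """Return a copy of samples with white Gaussian noise at snr_db (hypothetical helper)."""
    if not samples:
        return []
    rng = rng or random.Random()
    # Average power of the clean signal.
    signal_power = sum(s * s for s in samples) / len(samples)
    # Noise power that gives the requested signal-to-noise ratio in decibels.
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, noise_std) for s in samples]


# One second of a 440 Hz tone at 16 kHz stands in for a real recording.
sample_rate = 16000
clean = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
noisy = add_noise(clean, snr_db=20, rng=random.Random(0))
```

Drawing `snr_db` at random per example during training, rather than fixing it, would expose the model to a wider range of recording conditions.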

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference