---
title: Polish Whisper
emoji: 🏃
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 4.8.0
app_file: app.py
pinned: false
license: apache-2.0
---

Possible model improvements

(a) Model-centric approach

- The biggest improvement would almost certainly come from using a larger Whisper architecture.
- Increase the batch size and train for longer. We could use a scheduler to raise it steadily until the model fully stabilizes.
- Multi-head training: we could train on all languages with a shared part of the architecture, which could improve generalization and let us use much more data.

(b) Data-centric approach

- We could use a dataset with better phonetic descriptions, such as the TIMIT dataset.
- We could use more data, and more diverse data. Most of the files here were recorded with a laptop microphone, which can affect predictions on audio from other sources.
- Add noise and other transformations to the dataset.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
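
The data-centric suggestion above to add noise and other transformations could look roughly like the following. This is a minimal sketch, not code from this Space: the function names (`add_white_noise`, `apply_gain`, `augment`) and the default SNR and gain ranges are assumptions chosen for illustration, and it uses plain Python lists so that it runs without extra dependencies. A real training pipeline would more likely operate on arrays or tensors.

```python
import math
import random


def add_white_noise(samples, snr_db, rng):
    """Return a noisy copy of `samples` at roughly `snr_db` signal-to-noise ratio.

    Assumes `samples` is a non-empty list of floats.
    """
    signal_power = sum(s * s for s in samples) / len(samples)
    # Noise power that gives the requested SNR in decibels.
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def apply_gain(samples, gain_db):
    """Scale the amplitude by `gain_db` decibels (negative values are quieter)."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


def augment(samples, rng, snr_range_db=(5.0, 30.0), gain_range_db=(-6.0, 6.0)):
    """Apply random noise and random gain; the ranges are illustrative defaults."""
    snr_db = rng.uniform(*snr_range_db)
    gain_db = rng.uniform(*gain_range_db)
    return apply_gain(add_white_noise(samples, snr_db, rng), gain_db)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the augmentation is reproducible
    sample_rate = 16_000  # Whisper models expect 16 kHz audio
    # One second of a 440 Hz tone stands in for a real recording.
    tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
    noisy = augment(tone, rng)
    print(len(noisy), "samples augmented")
```

Drawing the SNR and gain at random for each clip is one simple way to make the training data less tied to a single recording setup, such as the laptop microphone mentioned above.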