---
title: Polish Whisper
emoji: 🏃
colorFrom: red
colorTo: blue
sdk: gradio
sdk_version: 4.8.0
app_file: app.py
pinned: false
license: apache-2.0
---

Possible model improvements

(a) Model-centric approach:
- The biggest improvement would almost certainly come from using a larger Whisper architecture.
- Increase the batch size and train for longer. A scheduler could raise the batch size gradually until the model stabilizes completely.
- Multi-head training: train on all languages with a shared part of the architecture. This could improve generalization and let us use much more data.
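The batch-size idea above can be sketched as a simple linear ramp. This is a minimal sketch under assumptions: the start and end sizes, `ramp_steps`, the per-device batch of 16, and the helper names `target_batch_size` and `accumulation_steps` are all hypothetical. Reaching the larger sizes through gradient accumulation, rather than a bigger per-device batch, is one option for limited GPU memory, not something this Space is known to do.

```python
import math


def target_batch_size(step: int, start: int = 16, end: int = 128, ramp_steps: int = 2000) -> int:
    """Hypothetical schedule: grow the batch size linearly from `start` to `end`
    over `ramp_steps` optimizer steps, then hold it at `end`."""
    if step >= ramp_steps:
        return end
    return int(start + (end - start) * step / ramp_steps)


def accumulation_steps(target: int, per_device_batch: int = 16) -> int:
    """Number of micro-batches of size `per_device_batch` to accumulate so the
    effective batch is at least `target` (rounded up to a whole micro-batch)."""
    return max(1, math.ceil(target / per_device_batch))


for step in (0, 1000, 2000):
    size = target_batch_size(step)
    print(step, size, accumulation_steps(size))
# 0 16 1
# 1000 72 5
# 2000 128 8
```

With a per-device batch of 16, the 72-sample target at step 1000 is rounded up to 5 micro-batches, i.e. an effective batch of 80. In a custom training loop, the returned value would decide how many micro-batches to accumulate before each optimizer step.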

(b) Data-centric approach:
- Use a dataset with more detailed phonetic transcriptions, such as TIMIT.
- Use more data, and more diverse data. Most of the current files were recorded with a laptop microphone, which can hurt predictions on audio from other sources.
- Augment the dataset with noise and other transformations.
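Adding noise, the last item above, is easy to prototype. The sketch below is an illustration under assumptions rather than this Space's pipeline: it mixes white Gaussian noise into a mono waveform at a chosen signal-to-noise ratio using only the standard library. The function name `add_noise`, the list-of-floats waveform representation, and the synthetic 440 Hz test tone are hypothetical; the 16 kHz sample rate matches what Whisper expects as input.

```python
import math
import random
from typing import List, Optional


def add_noise(signal: List[float], snr_db: float, seed: Optional[int] = None) -> List[float]:
    """Hypothetical helper: return a copy of `signal` with white Gaussian noise
    added at the requested signal-to-noise ratio (in dB)."""
    if not signal:
        return []
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise_std = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, noise_std) for x in signal]


# Usage: one second of a synthetic 440 Hz tone at 16 kHz, corrupted at 10 dB SNR.
sr = 16_000
clean = [0.5 * math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]
noisy = add_noise(clean, snr_db=10, seed=0)
```

Lower `snr_db` values give noisier audio. Drawing `snr_db` at random for each training example (over an arbitrarily chosen range) would expose the model to a spread of conditions, and recorded background noise could replace the white noise to better imitate microphones other than a laptop's.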

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference