---
language:
- ml
license: apache-2.0
tags:
- whisper-event
datasets:
- mozilla-foundation/common_voice_11_0
- google/fleurs
- thennal/IMaSC
- thennal/ulca_ml
- thennal/msc
- thennal/indic_tts_ml
metrics:
- wer
model-index:
- name: "Whisper Medium Malayalam - Thennal D K"
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 11.0
      type: mozilla-foundation/common_voice_11_0
      config: ml
      split: test
      args: ml
    metrics:
    - name: WER
      type: wer
      value: 42.98850574712644
    - name: CER
      type: cer
      value: 10.390585878818229
---

# Whisper Medium Malayalam - Thennal D K

This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on a combined dataset sourced from IMaSC, SMC, Indic TTS, FLEURS (train split), Common Voice 11 (train and other splits), OpenSLR, and ULCA. It achieves the following results on the evaluation set (Common Voice 11 test split):
- Loss: 0.0730
- WER: 42.9886
- CER: 10.3906
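
The checkpoint uses the standard Whisper architecture, so it can be run with the 🤗 Transformers ASR pipeline. Below is a minimal inference sketch; the hub id `thennal/whisper-medium-ml` and the audio file name are assumptions, so substitute the actual repository id and a local file:

```python
# Minimal inference sketch. The hub id "thennal/whisper-medium-ml" is an
# assumption; substitute the actual repository id for this checkpoint.
import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="thennal/whisper-medium-ml",
    chunk_length_s=30,  # Whisper processes audio in 30-second windows
    device=0 if torch.cuda.is_available() else -1,
)

# Pin the decoder to Malayalam transcription so the model neither
# language-detects nor translates.
asr.model.config.forced_decoder_ids = asr.tokenizer.get_decoder_prompt_ids(
    language="ml", task="transcribe"
)

print(asr("sample.wav")["text"])  # path to a local audio file (decoded via ffmpeg)
```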

## Model description

This model is [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) fine-tuned for Malayalam automatic speech recognition on the combined corpus described above.

## Intended uses & limitations

More information needed

## Training and evaluation data

The training data combines IMaSC, SMC, Indic TTS, the FLEURS train split, the Common Voice 11 train and other splits, OpenSLR, and ULCA. Evaluation uses the Common Voice 11.0 Malayalam test split.
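
As a hedged sketch of how two of the listed corpora can be combined with 🤗 Datasets: the `ml` and `ml_in` config names, the `train+other` split syntax, and the column names below are assumptions based on the public dataset cards, not the original preprocessing.

```python
# Hedged sketch: pull two of the listed corpora and align them to common
# (audio, sentence) columns. Config and column names are assumptions; the
# original preprocessing is not documented in this card.
from datasets import Audio, concatenate_datasets, load_dataset

# Common Voice is gated: accept its terms on the Hub and `huggingface-cli login`.
cv = load_dataset("mozilla-foundation/common_voice_11_0", "ml", split="train+other")
fleurs = load_dataset("google/fleurs", "ml_in", split="train")

fleurs = fleurs.rename_column("transcription", "sentence")
keep = {"audio", "sentence"}
cv = cv.remove_columns([c for c in cv.column_names if c not in keep])
fleurs = fleurs.remove_columns([c for c in fleurs.column_names if c not in keep])

# Resample both to Whisper's expected 16 kHz before concatenating.
cv = cv.cast_column("audio", Audio(sampling_rate=16_000))
fleurs = fleurs.cast_column("audio", Audio(sampling_rate=16_000))
combined = concatenate_datasets([cv, fleurs])
```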

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 4000
- mixed_precision_training: Native AMP
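
These settings map onto 🤗 `Seq2SeqTrainingArguments` roughly as follows. This is a reconstruction from the list above rather than the original training script; the output directory is a placeholder and `predict_with_generate` is an assumption:

```python
# Reconstruction of the listed hyperparameters as Seq2SeqTrainingArguments.
# Not the original training script; "whisper-medium-ml" is a placeholder.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-medium-ml",
    learning_rate=1e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    seed=42,
    # Adam betas/epsilon below are the Trainer defaults, matching the card.
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=4000,
    fp16=True,  # "Native AMP" mixed-precision training
    predict_with_generate=True,  # assumption: generation needed for eval WER/CER
)
```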

### Framework versions

- Transformers 4.26.0.dev0
- PyTorch 1.13.0+cu117
- Datasets 2.7.1.dev0
- Tokenizers 0.13.2
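
The WER and CER reported above were presumably computed with the 🤗 Evaluate library, given the `wer` metric in the metadata; whether the original run used exactly this code is an assumption:

```python
# Sketch of the metric computation; scores are scaled to percentages,
# matching the values reported above.
import evaluate

wer = evaluate.load("wer")
cer = evaluate.load("cer")

predictions = ["<model transcription>"]     # hypothetical model outputs
references = ["<ground-truth transcript>"]  # hypothetical references

print("WER:", 100 * wer.compute(predictions=predictions, references=references))
print("CER:", 100 * cer.compute(predictions=predictions, references=references))
```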