imvladikon committed on
Commit
5685ae1
1 Parent(s): 88c729c

Update README.md

Files changed (1)
  1. README.md +76 -5
README.md CHANGED
@@ -7,19 +7,33 @@ tags:
  - he
  - generated_from_trainer
  model-index:
- - name: wav2vec2-xls-r-300m-a-hebrew
+ - name: wav2vec2-xls-r-300m-hebrew
   results: []
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- # wav2vec2-xls-r-300m-a-hebrew
+ # wav2vec2-xls-r-300m-hebrew
 
- This model is a fine-tuned version of [imvladikon/wav2vec2-xls-r-300m-a-hebrew](https://huggingface.co/imvladikon/wav2vec2-xls-r-300m-a-hebrew) on the None dataset.
- It achieves the following results on the evaluation set:
+ This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m), trained on private datasets in two stages. It was first fine-tuned on a small dataset of good samples:
+
+ | split | size (GB) | n_samples | duration (hrs) |
+ |---|---|---|---|
+ | train | 4.19 | 20306 | 28 |
+ | dev | 1.05 | 5076 | 7 |
+
+ On the evaluation set of this dataset it achieves:
+ - Loss: 0.5438
+ - WER: 0.1773
+
+ The resulting model was then fine-tuned on a larger dataset combining the small good dataset, various samples from other sources, and an unlabeled dataset that was weakly labeled with the model from the first stage.
+
+ On the small dataset from the previous step it achieves:
+ - WER: 0.1697
+
+ On the whole dataset:
  - Loss: 0.4502
- - Wer: 0.2318
+ - WER: 0.2318
 
  ## Model description
 
@@ -37,6 +51,63 @@ More information needed
 
  ### Training hyperparameters
 
+
+ #### First training
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0003
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 2
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 64
+ - total_eval_batch_size: 16
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 1000
+ - num_epochs: 100.0
+ - mixed_precision_training: Native AMP
+
+ Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Wer |
+ |:-------------:|:-----:|:-----:|:---------------:|:------:|
+ | No log | 3.15 | 1000 | 0.5203 | 0.4333 |
+ | 1.4284 | 6.31 | 2000 | 0.4816 | 0.3951 |
+ | 1.4284 | 9.46 | 3000 | 0.4315 | 0.3546 |
+ | 1.283 | 12.62 | 4000 | 0.4278 | 0.3404 |
+ | 1.283 | 15.77 | 5000 | 0.4090 | 0.3054 |
+ | 1.1777 | 18.93 | 6000 | 0.3893 | 0.3006 |
+ | 1.1777 | 22.08 | 7000 | 0.3968 | 0.2857 |
+ | 1.0994 | 25.24 | 8000 | 0.3892 | 0.2751 |
+ | 1.0994 | 28.39 | 9000 | 0.4061 | 0.2690 |
+ | 1.0323 | 31.54 | 10000 | 0.4114 | 0.2507 |
+ | 1.0323 | 34.7 | 11000 | 0.4021 | 0.2508 |
+ | 0.9623 | 37.85 | 12000 | 0.4032 | 0.2378 |
+ | 0.9623 | 41.01 | 13000 | 0.4148 | 0.2374 |
+ | 0.9077 | 44.16 | 14000 | 0.4350 | 0.2323 |
+ | 0.9077 | 47.32 | 15000 | 0.4515 | 0.2246 |
+ | 0.8573 | 50.47 | 16000 | 0.4474 | 0.2180 |
+ | 0.8573 | 53.63 | 17000 | 0.4649 | 0.2171 |
+ | 0.8083 | 56.78 | 18000 | 0.4455 | 0.2102 |
+ | 0.8083 | 59.94 | 19000 | 0.4587 | 0.2092 |
+ | 0.769 | 63.09 | 20000 | 0.4794 | 0.2012 |
+ | 0.769 | 66.25 | 21000 | 0.4845 | 0.2007 |
+ | 0.7308 | 69.4 | 22000 | 0.4937 | 0.2008 |
+ | 0.7308 | 72.55 | 23000 | 0.4920 | 0.1895 |
+ | 0.6927 | 75.71 | 24000 | 0.5179 | 0.1911 |
+ | 0.6927 | 78.86 | 25000 | 0.5202 | 0.1877 |
+ | 0.6622 | 82.02 | 26000 | 0.5266 | 0.1840 |
+ | 0.6622 | 85.17 | 27000 | 0.5351 | 0.1854 |
+ | 0.6315 | 88.33 | 28000 | 0.5373 | 0.1811 |
+ | 0.6315 | 91.48 | 29000 | 0.5331 | 0.1792 |
+ | 0.6075 | 94.64 | 30000 | 0.5390 | 0.1779 |
+ | 0.6075 | 97.79 | 31000 | 0.5459 | 0.1773 |
+
+ #### Second training
+
  The following hyperparameters were used during training:
  - learning_rate: 0.0003
  - train_batch_size: 8
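
As a reading aid for the hyperparameters listed in the diff above, here is a minimal sketch of how they could be expressed with `transformers.TrainingArguments`. The actual training script is not part of this commit, so the output directory and the selection of arguments below are illustrative assumptions only:

```python
# Illustrative only: a rough mapping of the first-stage hyperparameters above
# onto transformers.TrainingArguments. The dataset, processor, CTC data
# collator, model, and Trainer setup are all omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-300m-hebrew",  # hypothetical output path
    learning_rate=3e-4,
    per_device_train_batch_size=8,   # 2 GPUs x 8 x 4 grad-accum = 64 effective
    per_device_eval_batch_size=8,    # 2 GPUs x 8 = 16 effective
    gradient_accumulation_steps=4,
    num_train_epochs=100,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    seed=42,
    fp16=True,                       # "Native AMP" mixed precision
)
```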
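And a minimal usage sketch for the resulting checkpoint. The repository id below is inferred from the model-index name in this README rather than stated in the card, and the audio path is a placeholder for a 16 kHz Hebrew recording:

```python
# Minimal ASR inference sketch; the model id and the audio path are assumptions.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="imvladikon/wav2vec2-xls-r-300m-hebrew",  # assumed repo id
)

print(asr("hebrew_sample.wav")["text"])
```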