ales committed on
Commit 4aae45b
1 Parent(s): 52d55ef

update model card README.md

Files changed (3)
  1. README.md +81 -0
  2. train_20221217-004912.log +7 -0
  3. train_run_1.log +17 -0
README.md ADDED
@@ -0,0 +1,81 @@
+ ---
+ license: apache-2.0
+ tags:
+ - generated_from_trainer
+ datasets:
+ - common_voice_11_0
+ metrics:
+ - wer
+ model-index:
+ - name: whisper-base-belarusian
+   results:
+   - task:
+       name: Automatic Speech Recognition
+       type: automatic-speech-recognition
+     dataset:
+       name: common_voice_11_0
+       type: common_voice_11_0
+       config: be
+       split: validation
+       args: be
+     metrics:
+     - name: Wer
+       type: wer
+       value: 12.206885082321635
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ # whisper-base-belarusian
+
+ This model is a fine-tuned version of [openai/whisper-base](https://huggingface.co/openai/whisper-base) on the common_voice_11_0 dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.1080
+ - Wer: 12.2069
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0001
+ - train_batch_size: 64
+ - eval_batch_size: 32
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_steps: 500
+ - training_steps: 6000
+ - mixed_precision_training: Native AMP
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Wer     |
+ |:-------------:|:-----:|:----:|:---------------:|:-------:|
+ | 0.2445        | 0.17  | 1000 | 0.3059          | 32.4163 |
+ | 0.1823        | 0.33  | 2000 | 0.2004          | 22.1259 |
+ | 0.1412        | 0.5   | 3000 | 0.1752          | 20.0700 |
+ | 0.1093        | 0.67  | 4000 | 0.1413          | 16.0533 |
+ | 0.1137        | 0.83  | 5000 | 0.1155          | 13.3108 |
+ | 0.0585        | 1.1   | 6000 | 0.1080          | 12.2069 |
+
+
+ ### Framework versions
+
+ - Transformers 4.26.0.dev0
+ - Pytorch 1.13.0+cu117
+ - Datasets 2.7.1.dev0
+ - Tokenizers 0.13.2
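The card uses `lr_scheduler_type: linear` with 500 warmup steps, 6,000 training steps, and a peak learning rate of 1e-4. Below is a minimal sketch of that schedule. It assumes the usual shape behind the Transformers "linear" scheduler (`get_linear_schedule_with_warmup`): a linear ramp from 0 to the peak, then a linear decay back to 0. The function name `linear_warmup_decay` is hypothetical, and this is not the training script's own code.

```python
# Hyperparameters copied from the model card above.
PEAK_LR = 1e-4       # learning_rate
WARMUP_STEPS = 500   # lr_scheduler_warmup_steps
TOTAL_STEPS = 6000   # training_steps


def linear_warmup_decay(step: int, peak_lr: float = PEAK_LR,
                        warmup: int = WARMUP_STEPS,
                        total: int = TOTAL_STEPS) -> float:
    """Learning rate after `step` optimizer steps (hypothetical helper).

    Assumes the linear-warmup / linear-decay shape of the Transformers
    "linear" scheduler type.
    """
    if step < warmup:
        # Warmup: ramp linearly from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup)
    # Decay: fall linearly from peak_lr to 0 at the final step, never below 0.
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    for step in (0, 250, 500, 3250, 6000):
        print(f"step {step:>4}: lr = {linear_warmup_decay(step):.2e}")
```

Under these assumptions, the learning rate peaks at 1e-4 at step 500. It falls back to 5e-5 halfway through the decay (step 3,250) and reaches 0 at step 6,000.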
train_20221217-004912.log CHANGED
@@ -131,3 +131,10 @@ xpu_backend=None,
  12/17/2022 02:39:16 - WARNING - datasets.download.streaming_download_manager - Got disconnected from remote data host. Retrying in 5sec [1/20]
  12/17/2022 03:40:45 - WARNING - huggingface_hub.repository - Adding files tracked by Git LFS: ['src/__pycache__/preprocess.cpython-38.pyc', 'src/__pycache__/run_speech_recognition_seq2seq_streaming.cpython-38.pyc']. This may take a bit of time if the files are large.
  12/17/2022 16:02:39 - INFO - __main__ - ShuffleCallback. shuffling train dataset. seed: 42. dataset epoch: 1
+ 12/17/2022 17:58:40 - WARNING - huggingface_hub.repository - Several commits (2) will be pushed upstream.
+ 12/17/2022 17:58:40 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
+ 12/17/2022 17:58:44 - WARNING - huggingface_hub.repository - remote: Scanning LFS files for validity, may be slow...
+ remote: LFS file scan complete.
+ To https://huggingface.co/ales/whisper-base-belarusian
+ 4074dad..52d55ef main -> main
+
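This log records `ShuffleCallback. shuffling train dataset. seed: 42. dataset epoch: 1`. The training data is streamed, so it cannot be permuted in memory as a whole. The 🤗 Datasets streaming `IterableDataset.shuffle` instead fills a fixed-size buffer and samples from it at random, and advancing the epoch changes the effective seed. Below is a pure-Python sketch of that idea. The function name `buffered_shuffle`, the example buffer size, and the `seed + epoch` seeding rule are assumptions for illustration, not the library's exact implementation.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(stream: Iterable[T], buffer_size: int,
                     seed: int, epoch: int = 0) -> Iterator[T]:
    """Approximately shuffle a stream using a fixed-size buffer.

    Assumption for illustration: the effective seed is seed + epoch, so
    each epoch gets a different but reproducible order.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed + epoch)
    buffer: List[T] = []
    for item in stream:
        if len(buffer) < buffer_size:
            # Fill the buffer before emitting anything.
            buffer.append(item)
            continue
        # Emit a random buffered element and put the new item in its slot.
        i = rng.randrange(buffer_size)
        yield buffer[i]
        buffer[i] = item
    # The stream is exhausted: drain the remaining buffer in random order.
    rng.shuffle(buffer)
    yield from buffer


if __name__ == "__main__":
    print(list(buffered_shuffle(range(20), buffer_size=5, seed=42, epoch=1)))
```

A larger buffer gives an order closer to a full shuffle but uses more memory. With a buffer of 1, the stream comes out in its original order.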
train_run_1.log CHANGED
@@ -36372,3 +36372,20 @@ Training completed. Do not forget to share your model on huggingface.co/models =
  [INFO|tokenization_utils_base.py:2157] 2022-12-17 17:58:19,260 >> tokenizer config file saved in ./tokenizer_config.json
  [INFO|tokenization_utils_base.py:2164] 2022-12-17 17:58:19,260 >> Special tokens file saved in ./special_tokens_map.json
  [INFO|tokenization_utils_base.py:2210] 2022-12-17 17:58:19,260 >> added tokens file saved in ./added_tokens.json
+ Several commits (2) will be pushed upstream.
+ {'eval_loss': 0.10796044021844864, 'eval_wer': 12.206885082321635, 'eval_runtime': 2137.698, 'eval_samples_per_second': 7.425, 'eval_steps_per_second': 0.232, 'epoch': 1.1}
+ {'train_runtime': 61737.3829, 'train_samples_per_second': 6.22, 'train_steps_per_second': 0.097, 'train_loss': 0.17721048017342886, 'epoch': 1.1}
+ 12/17/2022 17:58:40 - WARNING - huggingface_hub.repository - Several commits (2) will be pushed upstream.
+ The progress bars may be unreliable.
+ 12/17/2022 17:58:40 - WARNING - huggingface_hub.repository - The progress bars may be unreliable.
+
+ remote: LFS file scan complete.
+ To https://huggingface.co/ales/whisper-base-belarusian
+ 4074dad..52d55ef main -> main
+
+ 12/17/2022 17:58:44 - WARNING - huggingface_hub.repository - remote: Scanning LFS files for validity, may be slow...
+ remote: LFS file scan complete.
+ To https://huggingface.co/ales/whisper-base-belarusian
+ 4074dad..52d55ef main -> main
+
+
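The two metric dictionaries in `train_run_1.log` tie directly to the numbers in the model card. The sketch below does three things:

- Parses both dictionaries with `ast.literal_eval`.
- Rounds the evaluation metrics to four decimals, which gives the card's `Loss: 0.1080` and `Wer: 12.2069`.
- Recomputes throughput as `training_steps × train_batch_size / train_runtime`.

The throughput step assumes a single device with no gradient accumulation, so 64 samples are consumed per step. The helper `summarize` is hypothetical.

```python
import ast

# Metric lines copied verbatim from train_run_1.log.
EVAL_LINE = "{'eval_loss': 0.10796044021844864, 'eval_wer': 12.206885082321635, 'eval_runtime': 2137.698, 'eval_samples_per_second': 7.425, 'eval_steps_per_second': 0.232, 'epoch': 1.1}"
TRAIN_LINE = "{'train_runtime': 61737.3829, 'train_samples_per_second': 6.22, 'train_steps_per_second': 0.097, 'train_loss': 0.17721048017342886, 'epoch': 1.1}"

# Values from the card's hyperparameter list.
TRAINING_STEPS = 6000
TRAIN_BATCH_SIZE = 64  # assumes one device, no gradient accumulation


def summarize(eval_line: str, train_line: str) -> dict:
    """Derive the card's headline numbers from the logged metric dicts."""
    # The lines are Python dict literals, so literal_eval parses them safely.
    ev = ast.literal_eval(eval_line)
    tr = ast.literal_eval(train_line)
    runtime = tr["train_runtime"]  # seconds
    return {
        # Rounded to 4 decimals, as in the card's "Loss" and "Wer" entries.
        "card_loss": round(ev["eval_loss"], 4),
        "card_wer": round(ev["eval_wer"], 4),
        # Throughput recomputed from steps, batch size and wall-clock time.
        "samples_per_second": round(TRAINING_STEPS * TRAIN_BATCH_SIZE / runtime, 2),
        "steps_per_second": round(TRAINING_STEPS / runtime, 3),
        "train_hours": round(runtime / 3600, 1),
    }


if __name__ == "__main__":
    print(summarize(EVAL_LINE, TRAIN_LINE))
```

With these inputs the sketch gives 6.22 samples/s and 0.097 steps/s, matching the logged `train_samples_per_second` and `train_steps_per_second`. It also gives about 17.1 hours of training. That is roughly consistent with the run starting around 00:49 (per the `train_20221217-004912.log` file name) and the final save at 17:58.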