pere committed
Commit 4dfe3f0
1 Parent(s): c0d914a

Update README.md

Files changed (1): README.md (+23, -9)
README.md CHANGED
---

# Norwegian Wav2Vec2 Model - 1B - Bokmål
The model achieves the following results on the test set with a 5-gram KenLM:
- WER: 0.0668
- CER: 0.0256

Without a language model, we get these results:
- WER: ???
- CER: ???

## Model description
This is one of several Wav2Vec2 models created during the 🤗-hosted [Robust Speech Event](https://discuss.huggingface.co/t/open-to-the-community-robust-speech-recognition-challenge/13614?s=09). In parallel with the event, the team also converted the [Norwegian Parliamentary Speech Corpus (NPSC)](https://huggingface.co/datasets/NbAiLab/NPSC) to the 🤗 Dataset format and used it as the main source for training.

We release all the code developed during the event so that the Norwegian NLP community can build on it to develop even better Norwegian ASR models. Fine-tuning these models is not very compute-demanding: after following the instructions here, you should be able to train your own automatic speech recognition system in less than a day on an average GPU.

## Team
The following people contributed to building this model: Rolv-Arild Braaten, Per Egil Kummervold, Andre Kåsen, Javier de la Rosa, Per Erik Solberg, and Freddy Wetjen.

## Training procedure
To reproduce these results, we strongly recommend that you follow the [instructions from HuggingFace](https://github.com/huggingface/transformers/tree/master/examples/research_projects/robust-speech-event#talks) to train a simple Swedish model first.

When you have verified that you are able to do this, create a new repo. You can then start by copying the files **run.sh** and **run_speech_recognition_ctc.py** from our repo. You should be able to reproduce our results by just running this script. With some tweaking, you will most likely be able to build an even better ASR model.
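
The sketch below shows a minimal version of that setup. It assumes a local clone of this repo sits next to your new model repo and that you are logged in to the Hub (needed for `--push_to_hub`); the paths are illustrative only.
```
# Sketch only: copy the training scripts from a local clone of this repo into
# your own model repo and launch fine-tuning. Paths are illustrative.
huggingface-cli login                                  # required later for --push_to_hub
cp ../<this-repo-clone>/run.sh .
cp ../<this-repo-clone>/run_speech_recognition_ctc.py .
bash run.sh                                            # calls run_speech_recognition_ctc.py with the parameters listed below
```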

### 5-gram Language Model
Adding a language model will improve the model's results. 🤗 has provided another [very nice blog post](https://huggingface.co/blog/wav2vec2-with-ngram) about how to add a 5-gram language model to improve the ASR model. You can build this from your own corpus, for instance by extracting some suitable text from the [Norwegian Colossal Corpus](https://huggingface.co/datasets/NbAiLab/NCC). You can also skip some of the steps in the guide and copy the [5-gram model from this repo](https://huggingface.co/NbAiLab/XLSR-300M-bokmaal/tree/main/language_model).
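
If you prefer to build the 5-gram yourself rather than copy it, the core step from the linked blog post looks roughly like the sketch below. It assumes KenLM has been compiled locally and that suitable Norwegian text (for instance from the NCC) has already been exported to a plain-text file; the file names are illustrative.
```
# Sketch of the KenLM step from the linked blog post. Assumes KenLM is built
# under kenlm/build and norwegian_corpus.txt holds one normalised sentence per line.
kenlm/build/bin/lmplz -o 5 < norwegian_corpus.txt > 5gram.arpa
# The remaining steps in the blog post wrap the .arpa file with pyctcdecode and
# Wav2Vec2ProcessorWithLM and save the result to a language_model/ folder in the repo.
```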

### Parameters

The following parameters were used during training:
```
--dataset_name="NbAiLab/NPSC"
--model_name_or_path="facebook/wav2vec2-xls-r-1b"
...
--mask_time_length="10"
--mask_feature_prob="0.25"
--mask_feature_length="64"
--gradient_checkpointing
--min_duration_in_seconds="0.5"
--max_duration_in_seconds="30.0"
--ctc_zero_infinity=True
...
--do_train --do_eval
--push_to_hub
--preprocessing_num_workers="16"
```

This training will take 3-4 days on an average GPU. You might get a decent model and faster results by changing these parameters:
```
--per_device_train_batch_size - Adjust this to the maximum your available memory allows. 16 or 24 might be good settings, depending on your system.
--gradient_accumulation_steps - Can be adjusted even further up to increase the effective batch size and speed up training without running into memory issues.
--learning_rate - Can be increased, maybe as high as 1e-4. Speeds up training but might add instability.
--epochs - Can be decreased significantly. This is a huge dataset, and you might get a decent result already after a couple of epochs.
```
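
As a concrete illustration of these tweaks, the corresponding lines in **run.sh** could look like the excerpt below. The values are examples only, and the epoch count is exposed by the 🤗 training script as `--num_train_epochs`:
```
# Example values only - all other parameters stay as in the list above
--per_device_train_batch_size="16"
--gradient_accumulation_steps="4"
--learning_rate="1e-4"
--num_train_epochs="5"
```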