iqbalc committed on
Commit
380d2be
1 Parent(s): 6d6269c
Files changed (1)
  1. README.md +16 -19
README.md CHANGED
@@ -58,13 +58,10 @@ asr_model = nemo_asr.models.ASRModel.from_pretrained("iqbalc/stt_de_conformer_tr
  ```
 
  ### Transcribing using Python
- First, let's get a sample
  ```
- wget https://dldata-public.s3.us-east-2.amazonaws.com/2086-149220-0033.wav
  ```
- Then simply do:
- ```
- asr_model.transcribe(['2086-149220-0033.wav'])
  ```
 
  ### Transcribing many audio files
@@ -83,32 +80,32 @@ This model provides transcribed speech as a string for a given audio sample.
 
  ## Model Architecture
 
- <ADD SOME INFORMATION ABOUT THE ARCHITECTURE>
 
  ## Training
 
- <ADD INFORMATION ABOUT HOW THE MODEL WAS TRAINED - HOW MANY EPOCHS, AMOUNT OF COMPUTE ETC>
 
  ### Datasets
 
- <LIST THE NAME AND SPLITS OF DATASETS USED TO TRAIN THIS MODEL (ALONG WITH LANGUAGE AND ANY ADDITIONAL INFORMATION)>
 
  ## Performance
 
- <LIST THE SCORES OF THE MODEL -
- OR
- USE THE Hugging Face Evaluate LIBRARY TO UPLOAD METRICS>
 
- ## Limitations
 
- <DECLARE ANY POTENTIAL LIMITATIONS OF THE MODEL>
 
- Eg:
- Since this model was trained on publicly available speech datasets, the performance of this model might degrade for speech which includes technical terms, or vernacular that the model has not been trained on. The model might also perform worse for accented speech.
 
 
  ## References
-
- <ADD ANY REFERENCES HERE AS NEEDED>
-
- [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
 
  ```
 
  ### Transcribing using Python
  ```
+ Simply do:
  ```
+ asr_model.transcribe(['filename.wav'])
  ```
 
  ### Transcribing many audio files
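The many-files workflow above can be sketched as follows. `chunk` and `transcribe_directory` are illustrative helpers, not part of NeMo; the only NeMo call assumed is `asr_model.transcribe(...)` taking a list of audio file paths, as shown in the snippet above.

```python
from pathlib import Path


def chunk(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]


def transcribe_directory(asr_model, audio_dir, batch_size=8):
    """Transcribe every .wav file under audio_dir and return
    a {path: transcript} mapping."""
    wav_paths = sorted(str(p) for p in Path(audio_dir).glob("*.wav"))
    transcripts = []
    for batch in chunk(wav_paths, batch_size):
        # asr_model.transcribe accepts a list of audio file paths
        transcripts.extend(asr_model.transcribe(batch))
    return dict(zip(wav_paths, transcripts))
```

With a model loaded via `from_pretrained` as above, `transcribe_directory(asr_model, "audio/")` would return one transcript per file.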
 
  ## Model Architecture
 
+ The Conformer-Transducer model is an autoregressive variant of the Conformer model for automatic speech recognition that uses the Transducer loss and decoding.
 
  ## Training
 
+ The NeMo toolkit was used to train the models. These models are fine-tuned with this example script and this base config.
+
+ The tokenizers for these models were built using the text transcripts of the train set with this script.
 
  ### Datasets
 
+ All the models in this collection are trained on a composite dataset comprising over two thousand hours of cleaned German speech:
+
+ 1. MCV7.0: 567 hours
+ 2. MLS: 1524 hours
+ 3. VoxPopuli: 214 hours
 
  ## Performance
 
+ Performance of the ASR models is reported in terms of Word Error Rate (WER%) with greedy decoding.
+
+ MCV7.0 test = 4.93
 
  ## Limitations
 
+ The model might perform worse for accented speech.
 
 
  ## References
+ [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
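For reference, the Word Error Rate figure reported in the Performance section is defined as the word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal sketch (not NeMo's implementation):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count,
    computed via Levenshtein distance over words. Assumes a non-empty reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)
```

A reported score of 4.93 WER% corresponds to this ratio times 100, averaged over the MCV7.0 German test set.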