dominguesm committed on
Commit
14fecbf
1 Parent(s): 31b09b5

Update README.md

Files changed (1)
  1. README.md +34 -3
README.md CHANGED
@@ -93,8 +93,6 @@ QuartzNet models take in audio segments and transcribe them to letter, byte pair
93
 
94
  All training scripts will be available at: [DominguesM/stt_pt_quartznet15x5_ctc_small](https://github.com/DominguesM/stt_pt_quartznet15x5_ctc_small)
95
 
96
- **Soon more information**
97
-
98
 
99
  ### Datasets
100
 
@@ -104,12 +102,45 @@ The model was trained with a part of the Common Voices 9.0 dataset in Portuguese
104
 
105
  ## Performance
106
 
107
- **Coming soon**
108
 
109
  ## Limitations
110
 
111
  Because this model was trained on publicly available speech datasets, its performance might degrade on speech that includes technical terms or vernacular absent from the training data. The model might also perform worse on accented speech.
112
 
113
 
114
  ## References
115
 
 
93
 
94
  All training scripts will be available at: [DominguesM/stt_pt_quartznet15x5_ctc_small](https://github.com/DominguesM/stt_pt_quartznet15x5_ctc_small)
95
 
 
 
96
 
97
  ### Datasets
98
 
 
102
 
103
  ## Performance
104
 
105
+ | Metric | Score |
106
+ | ------- | ----- |
107
+ | WER | 49% |
108
+ | CER | 18% |
109
+
110
+ The metrics were obtained with the following commands:
111
+
112
+ **Attention**: Run the steps below only after downloading the dataset (Mozilla Common Voice 9.0 PT) and completing the pre-processing of the audio data and `manifest` files described in [`notebooks/Finetuning CTC model Portuguese.ipynb`](https://github.com/DominguesM/stt_pt_quartznet15x5_ctc_small)
113
+
114
+ ```bash
115
+ $ wget -P scripts/ "https://raw.githubusercontent.com/NVIDIA/NeMo/v1.9.0/examples/asr/speech_to_text_eval.py"
116
+
117
+ $ wget -P scripts/ "https://raw.githubusercontent.com/NVIDIA/NeMo/v1.9.0/examples/asr/transcribe_speech.py"
118
+
119
+ $ python scripts/speech_to_text_eval.py \
120
+ pretrained_name="dominguesm/stt_pt_quartznet15x5_ctc_small" \
121
+ dataset_manifest="manifests/pt/commonvoice_test_manifest_processed.json" \
122
+ output_filename="./evaluation_transcripts.json" \
123
+ batch_size=32 \
124
+ amp=true \
125
+ use_cer=false
126
+ ```
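
For readers who want to sanity-check the reported numbers, here is a minimal sketch (not the NeMo implementation) of how WER and CER can be recomputed from the evaluation output. It assumes that `evaluation_transcripts.json` is a JSON-lines manifest whose entries hold the reference in `text` and the prediction in `pred_text`. The helper names `edit_distance`, `error_rate`, and `load_pairs` are hypothetical and exist only for this illustration.

```python
import json
from typing import Iterable, List, Sequence, Tuple


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: fewest substitutions, deletions and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion (reference token missing)
                            curr[j - 1] + 1,      # insertion (extra predicted token)
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(pairs: Iterable[Tuple[str, str]], use_cer: bool = False) -> float:
    """Corpus-level rate: total edits divided by total reference tokens."""
    edits = 0
    total = 0
    for ref, hyp in pairs:
        # Characters for CER, whitespace-separated words for WER.
        ref_tokens = list(ref) if use_cer else ref.split()
        hyp_tokens = list(hyp) if use_cer else hyp.split()
        edits += edit_distance(ref_tokens, hyp_tokens)
        total += len(ref_tokens)
    if total == 0:
        raise ValueError("no reference tokens to score against")
    return edits / total


def load_pairs(manifest_path: str) -> List[Tuple[str, str]]:
    """Read (reference, prediction) pairs from a JSON-lines manifest.

    Assumes each non-empty line is a JSON object with `text` and `pred_text`.
    """
    pairs = []
    with open(manifest_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                item = json.loads(line)
                pairs.append((item["text"], item["pred_text"]))
    return pairs


# Tiny in-memory example (made-up sentences, not taken from the test set).
example = [("o gato preto", "o gato"), ("bom dia", "bom dia")]
print(f"WER: {error_rate(example):.2%}")                # 1 edit / 5 words
print(f"CER: {error_rate(example, use_cer=True):.2%}")  # 6 edits / 19 characters
```

To score the real run, a call such as `error_rate(load_pairs("./evaluation_transcripts.json"))` should give a figure comparable to the table, although small differences are possible if NeMo normalizes text differently. Note that `use_cer=false` in the command above selects word-level scoring, so the CER row presumably comes from a second run with `use_cer=true`.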
127
 
128
  ## Limitations
129
 
130
  Because this model was trained on publicly available speech datasets, its performance might degrade on speech that includes technical terms or vernacular absent from the training data. The model might also perform worse on accented speech.
131
 
132
+ ## Citation
133
+
134
+ If you use our work, please cite:
135
+
136
+ ```bibtex
137
+ @misc{domingues2022quartznet15x15-small-portuguese,
138
+ title={Fine-tuned {QuartzNet}-15x5 CTC small model for speech recognition in {P}ortuguese},
139
+ author={Domingues, Maicon},
140
+ howpublished={\url{https://huggingface.co/dominguesm/stt_pt_quartznet15x5_ctc_small}},
141
+ year={2022}
142
+ }
143
+ ```
144
 
145
  ## References
146