transiteration committed on
Commit 10782b9
1 Parent(s): 5970086

Update README.md

Files changed (1):
  1. README.md +9 -101
README.md CHANGED
@@ -1,101 +1,9 @@
- ---
- language:
- - kk
- metrics:
- - wer
- library_name: nemo
- pipeline_tag: automatic-speech-recognition
- tags:
- - automatic-speech-recognition
- - speech
- - audio
- - pytorch
- - stt
- ---
-
-
- ## Model Overview
-
- To prepare and experiment with the model, install the [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1].
-
- This model has been trained on an NVIDIA GeForce RTX 2070 with:\
- Python 3.7.15\
- NumPy 1.21.6\
- PyTorch 1.21.1\
- NVIDIA NeMo 1.7.0
-
- ```
- pip3 install nemo_toolkit['all']
- ```
-
- ## Model Usage
-
- The model is available in the NeMo toolkit [1] and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
-
- #### How to Import
-
- ```
- import nemo.collections.asr as nemo_asr
- model = nemo_asr.models.EncDecCTCModel.restore_from(restore_path="stt_kz_quartznet15x5.nemo")
- ```
-
- #### How to Train
-
- ```
- python3 train.py --train_manifest path/to/manifest.json --val_manifest path/to/manifest.json --batch_size BATCH_SIZE --num_epochs NUM_EPOCHS --model_save_path path/to/save/model.nemo
- ```
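The `--train_manifest` and `--val_manifest` arguments point to NeMo-style manifest files: JSON-lines text where each line describes one utterance with `audio_filepath`, `duration`, and `text` fields. A minimal sketch of building such a manifest (the audio paths, durations, and transcripts below are illustrative placeholders):

```python
import json

# Each line of a NeMo manifest is one JSON object describing a single utterance.
# These sample entries are placeholders, not real files from the corpus.
samples = [
    {"audio_filepath": "audio/utt_001.wav", "duration": 3.2, "text": "salem alem"},
    {"audio_filepath": "audio/utt_002.wav", "duration": 2.7, "text": "qalaysyz"},
]

with open("manifest.json", "w", encoding="utf-8") as f:
    for sample in samples:
        # ensure_ascii=False keeps Kazakh (Cyrillic) characters readable in the file.
        f.write(json.dumps(sample, ensure_ascii=False) + "\n")
```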
-
- #### How to Evaluate
-
- ```
- python3 evaluate.py --model_path /path/to/stt_kz_quartznet15x5.nemo --test_manifest path/to/manifest.json
- ```
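ASR evaluation is typically reported as word error rate (WER): the word-level edit distance between reference and hypothesis, divided by the number of reference words. The evaluation script above reports this metric through NeMo; the following pure-Python sketch only illustrates the definition:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)
```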
-
- #### How to Transcribe an Audio File
-
- Sample audio to test the model:
- ```
- wget https://asr-kz-example.s3.us-west-2.amazonaws.com/sample_kz.wav
- ```
- To transcribe a single audio file:
- ```
- python3 transcibe.py --model_path /path/to/stt_kz_quartznet15x5.nemo --audio_file_path path/to/audio/file
- ```
-
- ## Input and Output
-
- This model takes mono-channel .WAV audio files with a sample rate of 16,000 Hz (16 kHz) as input.\
- It then outputs the spoken words as text for the given audio sample.
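Before transcription it can help to verify that an input file actually matches these requirements. A small sketch using Python's standard `wave` module (the function name is illustrative, not part of the model's tooling):

```python
import wave

def check_wav(path: str) -> None:
    """Raise ValueError unless the file is mono, 16 kHz PCM, as the model expects."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1:
            raise ValueError(f"expected mono audio, got {w.getnchannels()} channels")
        if w.getframerate() != 16000:
            raise ValueError(f"expected 16000 Hz sample rate, got {w.getframerate()}")
```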
-
- ## Model Architecture
-
- [QuartzNet 15x5](https://catalog.ngc.nvidia.com/orgs/nvidia/models/quartznet15x5) [2] is a Jasper-like network that uses separable convolutions and larger filter sizes. It achieves accuracy comparable to Jasper with far fewer parameters. This particular model has 15 blocks, each repeated 5 times.
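The parameter savings from separable convolutions are easy to see by counting weights: a standard 1D convolution needs `c_in * c_out * k` weights, while a depthwise-separable one needs only `c_in * k` (depthwise) plus `c_in * c_out` (pointwise). A sketch with illustrative channel and kernel sizes (not the exact QuartzNet 15x5 configuration):

```python
def conv1d_params(c_in: int, c_out: int, k: int) -> int:
    # Standard 1D convolution: one k-tap filter per (input, output) channel pair.
    return c_in * c_out * k

def separable_conv1d_params(c_in: int, c_out: int, k: int) -> int:
    # Depthwise k-tap filters (c_in * k) followed by a 1x1 pointwise conv (c_in * c_out).
    return c_in * k + c_in * c_out

# Illustrative sizes: 256 channels in/out, kernel width 33.
standard = conv1d_params(256, 256, 33)            # 2,162,688 weights
separable = separable_conv1d_params(256, 256, 33)  # 73,984 weights
```

At these sizes the separable layer uses roughly 3% of the weights of the standard one, which is where the "comparable accuracy, far fewer parameters" trade-off comes from.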
-
- ## Training and Dataset
-
- The model was fine-tuned for Kazakh speech, starting from the pre-trained English model, over several epochs.\
- [Kazakh Speech Corpus 2](https://issai.nu.edu.kz/kz-speech-corpus/?version=1.1) (KSC2) [3] is the first industrial-scale open-source Kazakh speech corpus.\
- In total, KSC2 contains around 1.2k hours of high-quality transcribed data comprising over 600k utterances.
-
- ## Performance
-
- The model achieved an average WER of 13.53% using **Greedy Decoding**.
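Greedy (best-path) CTC decoding takes the highest-scoring symbol at each frame, collapses consecutive repeats, and removes blanks. NeMo performs this internally; the following pure-Python sketch only illustrates the idea (the per-frame scores and vocabulary are made up):

```python
def ctc_greedy_decode(logits, blank_id=0, id2char=None):
    """Greedy CTC decoding: per-frame argmax, collapse repeats, drop blanks.

    `logits` is a list of per-frame score lists; `id2char` optionally maps
    token ids to characters for a readable transcript.
    """
    # Best symbol per frame (argmax over the vocabulary dimension).
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    out, prev = [], None
    for idx in best:
        # Keep a symbol only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank_id:
            out.append(idx)
        prev = idx
    return "".join(id2char[i] for i in out) if id2char else out
```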
-
- ## Limitations
-
- Because of limited GPU resources, a lightweight model architecture was used for fine-tuning.\
- In general, this makes inference faster but may reduce overall accuracy.\
- In addition, the model may perform worse on speech containing technical terms or dialect words it has not learned.
-
- ## Demonstration
-
- To run inference or download the model, see the Hugging Face Space: [NeMo_STT_KZ_Quartznet15x5](https://huggingface.co/spaces/transiteration/nemo_stt_kz_quartznet15x5)
-
- ## References
-
- [1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
-
- [2] [QuartzNet 15x5](https://catalog.ngc.nvidia.com/orgs/nvidia/models/quartznet15x5)
-
- [3] [Kazakh Speech Corpus 2](https://issai.nu.edu.kz/kz-speech-corpus/?version=1.1)
 
+ title: stt_kz_quartznet15xt
+ emoji: 🎤
+ colorFrom: green
+ colorTo: blue
+ sdk: gradio
+ sdk_version: 3.0.5
+ app_file: app.py
+ pinned: false
+ license: mit