Spaces: transiteration/nemo_stt_kz_quartznet15x5

transiteration committed on Jan 19

Commit 10782b9 • 1 Parent(s): 5970086

Update README.md

Files changed (1)

README.md +9 -101

@@ -1,101 +1,9 @@
----
-language:
-- kk
-metrics:
-- wer
-library_name: nemo
-pipeline_tag: automatic-speech-recognition
-tags:
-- automatic-speech-recognition
-- speech
-- audio
-- pytorch
-- stt
----
-## Model Overview
-In order to prepare and experiment with the model, it's necessary to install [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo) [1].\
-\
-This model have been trained on NVIDIA GeForce RTX 2070:\
-Python 3.7.15\
-NumPy 1.21.6\
-PyTorch 1.21.1\
-NVIDIA NeMo 1.7.0
-```
-pip3 install nemo_toolkit['all']
-```
-## Model Usage:
-The model is accessible within the NeMo toolkit [1] and can serve as a pre-trained checkpoint for either making inferences or for fine-tuning on a different dataset.
-#### How to Import
-```
-import nemo.collections.asr as nemo_asr
-model = nemo_asr.models.EncDecCTCModel.restore_from(restore_path="stt_kz_quartznet15x5.nemo")
-```
-#### How to Train
-```
-python3 train.py --train_manifest path/to/manifest.json --val_manifest path/to/manifest.json --batch_size BATCH_SIZE --num_epochs NUM_EPOCHS  --model_save_path path/to/save/model.nemo
-```
-#### How to Evaluate
-```
-python3 evaluate.py --model_path /path/to/stt_kz_quartznet15x5.nemo --test_manifest path/to/manifest.json"
-```
-#### How to Transcribe Audio File
-Sample audio to test the model:
-```
-wget https://asr-kz-example.s3.us-west-2.amazonaws.com/sample_kz.wav
-```
-This line is to transcribe the single audio:
-```
-python3 transcibe.py --model_path /path/to/stt_kz_quartznet15x5.nemo --audio_file_path path/to/audio/file
-```
-## Input and Output
-This model can take input from mono-channel audio .WAV files with a sample rate of 16,000 KHz.\
-Then, this model gives you the spoken words in a text format for a given audio sample.
-## Model Architecture
-[QuartzNet 15x5](https://catalog.ngc.nvidia.com/orgs/nvidia/models/quartznet15x5) [2] is a Jasper-like network that uses separable convolutions and larger filter sizes. It has comparable accuracy to Jasper while having much fewer parameters. This particular model has 15 blocks each repeated 5 times.
-## Training and Dataset
-The model was finetuned to Kazakh speech based on the pre-trained English Model for over several epochs.
-[Kazakh Speech Corpus 2](https://issai.nu.edu.kz/kz-speech-corpus/?version=1.1) (KSC2) [3] is the first industrial-scale open-source Kazakh speech corpus.\
-In total, KSC2 contains around 1.2k hours of high-quality transcribed data comprising over 600k utterances.
-## Performance
-The model achieved:\
-Average WER: 13.53%\
-through the applying of **Greedy Decoding**.
-## Limitations
-Because the GPU has limited power, lightweight model architecture was used for fine-tuning.\
-In general, this makes it faster for inference but might show less overall performance.\
-In addition, if the speech includes technical terms or dialect words the model hasn't learned, it may not work as well.
-## Demonstration
-For inference and downloading the model, check on Hugging Face Space: [NeMo_STT_KZ_Quartznet15x5](https://huggingface.co/spaces/transiteration/nemo_stt_kz_quartznet15x5)
-## References
-[1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
-[2] [QuartzNet 15x5](https://catalog.ngc.nvidia.com/orgs/nvidia/models/quartznet15x5)
-[3] [Kazakh Speech Corpus 2](https://issai.nu.edu.kz/kz-speech-corpus/?version=1.1)

+title: stt_kz_quartznet15xt
+emoji: 🎤
+colorFrom: green
+colorTo: blue
+sdk: gradio
+sdk_version: 3.0.5
+app_file: app.py
+pinned: false
+license: mit