---
language:
- ko
license: cc-by-4.0
library_name: nemo
datasets:
- RealCallData
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- Citrinet1024
- NeMo
- pytorch
model-index:
- name: stt_kr_citrinet1024_PublicCallCenter_1000H_0.26
  results: []
---

## Model Overview

This model transcribes Korean telephone speech into text. It is a Citrinet-1024 model trained with the NVIDIA NeMo toolkit on about 1,200 hours of private Korean call-center audio.

## NVIDIA NeMo: Training

To train, fine-tune or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after you have installed the latest PyTorch version.
```
pip install nemo_toolkit['all']
```
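
Beyond installation, the pre-trained checkpoint can be fine-tuned on new data through the NeMo Python API. The following is a minimal, illustrative sketch, assuming NeMo-style JSON-lines manifests; the manifest paths, batch sizes, device count, and epoch count are placeholder values, not the settings used to train this model.

```python
# A minimal fine-tuning sketch (illustrative values, not this model's training recipe).
# Assumes NeMo-style JSON-lines manifests: one {"audio_filepath", "duration", "text"}
# entry per line, with 16 kHz mono WAV audio to match this model's input.
import pytorch_lightning as pl
from omegaconf import OmegaConf
import nemo.collections.asr as nemo_asr

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    "ypluit/stt_kr_citrinet1024_PublicCallCenter_1000H_0.26"
)

# Point the data loaders at the new dataset.
asr_model.setup_training_data(OmegaConf.create({
    "manifest_filepath": "train_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": True,
}))
asr_model.setup_validation_data(OmegaConf.create({
    "manifest_filepath": "val_manifest.json",
    "sample_rate": 16000,
    "batch_size": 16,
    "shuffle": False,
}))

trainer = pl.Trainer(devices=2, accelerator="gpu", max_epochs=10)
trainer.fit(asr_model)
```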

## How to Use this Model

The model is available for use in the NeMo toolkit [1], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

### Automatically instantiate the model

```python
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("ypluit/stt_kr_citrinet1024_PublicCallCenter_1000H_0.26")
```

### Transcribing using Python

First, get a sample: any Korean telephone speech recording saved as a 16 kHz mono WAV file (for example, `sample-kr.wav`). Then simply do:
```python
asr_model.transcribe(['sample-kr.wav'])
```

### Transcribing many audio files

```shell
python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py pretrained_name="ypluit/stt_kr_citrinet1024_PublicCallCenter_1000H_0.26" audio_dir="<DIRECTORY CONTAINING AUDIO FILES>"
```
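
A list of files can also be transcribed directly from Python with the same `transcribe` call, reusing the `asr_model` instantiated above; in this sketch the directory path and `batch_size` value are illustrative.

```python
# Transcribe every WAV file in a directory (path and batch size are illustrative).
from glob import glob

wav_files = sorted(glob("call_center_audio/*.wav"))
transcriptions = asr_model.transcribe(wav_files, batch_size=8)
for path, text in zip(wav_files, transcriptions):
    print(path, "->", text)
```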

### Input

This model accepts 16,000 Hz mono-channel audio (WAV files) as input.
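
If a recording is not already 16 kHz mono, it can be converted first. The snippet below is a minimal sketch using torchaudio (part of the PyTorch ecosystem this model depends on); the file names are illustrative.

```python
# Convert an arbitrary recording to 16 kHz mono WAV (file names are illustrative).
import torchaudio

waveform, sample_rate = torchaudio.load("call_recording_original.wav")
waveform = waveform.mean(dim=0, keepdim=True)  # downmix to mono
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, orig_freq=sample_rate, new_freq=16000)
torchaudio.save("sample-kr.wav", waveform, 16000)
```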

### Output

This model provides transcribed speech as a string for a given audio sample.

## Model Architecture

This model uses the Citrinet-1024 architecture; see the NeMo toolkit and the reference papers for details.

## Training

The model was trained for about 20 days on 2 NVIDIA A6000 GPUs.

### Datasets

Private real-world call-center data (about 1,200 hours of Korean telephone speech).

## Performance

The model achieves a word error rate (WER) of 0.26.
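
For reference, WER counts word-level substitutions, deletions, and insertions against the number of reference words. A minimal pure-Python sketch of that computation (the example strings are illustrative, not taken from the evaluation data):

```python
# Word error rate: (substitutions + deletions + insertions) / reference word count,
# computed with a standard edit-distance dynamic program over words.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("여보세요 무엇을 도와드릴까요", "여보세요 무엇을 도와드릴까요"))  # 0.0
```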

## Limitations

This model was trained on 1,200 hours of Korean telephone speech from a customer-service call center. Performance may be poor on general-purpose dialogue and on specific accents.

## References

[1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)