---
language:
- kr
license: cc-by-4.0
library_name: nemo
datasets:
- RealCallData
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- Citrinet1024
- NeMo
- pytorch
model-index:
- name: stt_kr_citrinet1024_PublicCallCenter_1000H_0.26
  results: []
---

## Model Overview

Citrinet-1024 model for Korean automatic speech recognition, trained on roughly 1,200 hours of call-center telephone speech and distributed as an NVIDIA NeMo checkpoint.

## NVIDIA NeMo: Training

To train, fine-tune, or play with the model you will need to install [NVIDIA NeMo](https://github.com/NVIDIA/NeMo). We recommend you install it after installing the latest version of PyTorch.

```
pip install nemo_toolkit['all']
```

## How to Use this Model

The model is available for use in the NeMo toolkit [1], and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.

### Automatically instantiate the model

```python
import nemo.collections.asr as nemo_asr
asr_model = nemo_asr.models.ASRModel.from_pretrained("ypluit/stt_kr_citrinet1024_PublicCallCenter_1000H_0.26")
```

### Transcribing using Python

First, get a sample: any Korean telephone speech recording saved as a 16 kHz mono WAV file (for example, `sample-kr.wav`). Then simply do:

```python
asr_model.transcribe(['sample-kr.wav'])
```

### Transcribing many audio files

```shell
python [NEMO_GIT_FOLDER]/examples/asr/transcribe_speech.py \
  pretrained_name="ypluit/stt_kr_citrinet1024_PublicCallCenter_1000H_0.26" \
  audio_dir="<directory containing 16 kHz mono WAV files>"
```

### Input

This model accepts 16,000 Hz mono-channel audio (WAV files) as input.

### Output

This model provides transcribed speech as a string for a given audio sample.

## Model Architecture

This model uses the Citrinet-1024 architecture. See the NeMo toolkit documentation and the referenced papers for details.

## Training

Trained for about 20 days on 2 NVIDIA A6000 GPUs.

### Datasets

Private, real-world call-center data (approximately 1,200 hours of Korean telephone speech).

## Performance

0.26 WER (word error rate).

## Limitations

This model was trained on 1,200 hours of Korean telephone speech from customer-service calls in a call center. Performance may be poor on general-purpose dialogue and on specific accents.

## References

[1] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
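
## Example: Checking WER on Your Own Data

As a minimal sketch for sanity-checking the WER figure above, the snippet below transcribes a few of your own recordings and scores them with NeMo's `word_error_rate` helper. The audio file names and reference transcripts are placeholders; replace them with your own 16 kHz mono WAV files and matching reference text.

```python
# Hedged sketch: transcribe a few of your own 16 kHz mono WAV files and
# compare them against reference transcripts using NeMo's WER helper.
# "call1.wav", "call2.wav" and the reference strings are placeholders.
import nemo.collections.asr as nemo_asr
from nemo.collections.asr.metrics.wer import word_error_rate

asr_model = nemo_asr.models.ASRModel.from_pretrained(
    "ypluit/stt_kr_citrinet1024_PublicCallCenter_1000H_0.26"
)

audio_files = ["call1.wav", "call2.wav"]        # your 16 kHz mono WAVs
references = ["첫 번째 참조 전사", "두 번째 참조 전사"]  # matching reference transcripts

hypotheses = asr_model.transcribe(audio_files)
# Depending on the NeMo version, transcribe() may return plain strings or
# Hypothesis objects; normalize to strings before scoring.
hypotheses = [h.text if hasattr(h, "text") else h for h in hypotheses]

print("WER:", word_error_rate(hypotheses=hypotheses, references=references))
```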