thumbnail: https://github.com/rinnakk/japanese-pretrained-models/blob/master/rinna.png
language: ja
datasets:
  - reazon-research/reazonspeech
tags:
  - automatic-speech-recognition
  - speech
  - audio
  - hubert
  - gpt_neox
  - asr
  - nlp
license: apache-2.0

rinna/nue-asr


Overview

[Paper] [GitHub]

We propose a novel end-to-end speech recognition model, Nue ASR, which integrates pre-trained speech and language models.

The name Nue comes from the Japanese word (鵺/ぬえ/Nue), one of the Japanese legendary creatures (妖怪/ようかい/Yōkai).

This model performs highly accurate Japanese speech recognition. On a GPU, it can recognize speech faster than real time.
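"Faster than real time" is commonly quantified with the real-time factor (RTF): processing time divided by audio duration, so an RTF below 1 means recognition finished sooner than the audio would take to play. The term RTF and the helper names below are illustrative assumptions and are not taken from the Nue ASR code; this is only a minimal sketch of how one might check the speed on a particular machine.

```python
import time
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed_call(func, *args, **kwargs):
    """Run func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

Wrapping a transcription call in `timed_call` and dividing the elapsed time by `wav_duration_seconds(...)` gives the RTF for one file. The first call typically includes warm-up cost, so timing a second call is usually more representative.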

Benchmark scores, including those of our models, are available at https://rinnakk.github.io/research/benchmarks/asr/


How to use the model

First, install the inference code for this model.

pip install git+https://github.com/rinnakk/nue-asr.git

Both a command-line interface and a Python interface are available.

Command-line usage

The following command transcribes an audio file via the command-line interface. Audio files are automatically downsampled to 16 kHz.

nue-asr audio1.wav

You can specify multiple audio files.

nue-asr audio1.wav audio2.flac audio3.mp3

DeepSpeed-Inference can be used to accelerate inference in the GPT-NeoX module. To use it, you need to install DeepSpeed first.

pip install deepspeed

Then, you can use DeepSpeed-Inference as follows:

nue-asr --use-deepspeed audio1.wav

Run nue-asr --help for more information.
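If the command-line tool needs to be driven from a script, the invocations above can be assembled programmatically. The sketch below uses only the `nue-asr` command and the `--use-deepspeed` flag shown in this section; the helper names `build_nue_asr_command` and `run_nue_asr` are hypothetical, and because the output format of the tool is not described here, the standard output is returned as raw text.

```python
import shutil
import subprocess


def build_nue_asr_command(audio_files, use_deepspeed=False):
    """Build the argument list for the nue-asr command-line tool."""
    audio_files = [str(path) for path in audio_files]
    if not audio_files:
        raise ValueError("at least one audio file is required")
    command = ["nue-asr"]
    if use_deepspeed:
        command.append("--use-deepspeed")
    return command + audio_files


def run_nue_asr(audio_files, use_deepspeed=False):
    """Run nue-asr and return whatever it prints to standard output."""
    if shutil.which("nue-asr") is None:
        raise RuntimeError("nue-asr is not on PATH; install it with pip first")
    completed = subprocess.run(
        build_nue_asr_command(audio_files, use_deepspeed),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.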

Python usage

An example of the Python interface is as follows:

import nue_asr

model = nue_asr.load_model("rinna/nue-asr")
tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")

result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
print(result.text)

The nue_asr.transcribe function accepts audio data as either a numpy.array or a torch.Tensor, in addition to audio file paths.
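When passing in-memory audio, it is safest to prepare the waveform yourself. The sketch below assumes that an array should be a mono waveform sampled at 16 kHz; this README states the 16 kHz conversion only for audio files, so treat that as an assumption to verify against the nue-asr code. The helper names `to_mono_16k` and `transcribe_array` are hypothetical, and the resampling is plain linear interpolation for illustration rather than a band-limited resampler.

```python
import numpy as np

TARGET_SAMPLE_RATE = 16_000  # rate the README says audio files are downsampled to


def to_mono_16k(waveform, sample_rate):
    """Return a 1-D float32 waveform at 16 kHz.

    waveform: array of shape (samples,) or (samples, channels).
    """
    samples = np.asarray(waveform, dtype=np.float32)
    if samples.ndim == 2:
        samples = samples.mean(axis=1)  # average the channels to get mono
    elif samples.ndim != 1:
        raise ValueError("expected a 1-D or 2-D array")
    if sample_rate != TARGET_SAMPLE_RATE and samples.size > 0:
        duration = samples.size / sample_rate
        target_length = max(1, int(round(duration * TARGET_SAMPLE_RATE)))
        source_times = np.arange(samples.size) / sample_rate
        target_times = np.arange(target_length) / TARGET_SAMPLE_RATE
        # Linear interpolation only; no anti-aliasing filter is applied.
        samples = np.interp(target_times, source_times, samples)
    return samples.astype(np.float32)


def transcribe_array(waveform, sample_rate):
    """Transcribe an in-memory waveform (requires nue-asr to be installed)."""
    import nue_asr  # imported lazily so to_mono_16k works without it

    model = nue_asr.load_model("rinna/nue-asr")
    tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")
    audio = to_mono_16k(waveform, sample_rate)
    return nue_asr.transcribe(model, tokenizer, audio).text
```

For production use, a dedicated resampler is a better choice than linear interpolation, since downsampling without a low-pass filter can introduce aliasing.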

DeepSpeed-Inference acceleration is also available through the Python interface.

import nue_asr

model = nue_asr.load_model("rinna/nue-asr", use_deepspeed=True)
tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")

result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
print(result.text)
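Since DeepSpeed is an optional dependency, a script may want to enable it only when it is installed. The following is a minimal sketch under that assumption; `module_available` and `load_nue_asr` are hypothetical helper names, while `load_model`, `load_tokenizer`, and the `use_deepspeed` argument are the ones shown above.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def load_nue_asr(model_name="rinna/nue-asr"):
    """Load model and tokenizer, enabling DeepSpeed-Inference only if installed."""
    import nue_asr  # lazy import: requires the pip install shown earlier

    use_deepspeed = module_available("deepspeed")
    model = nue_asr.load_model(model_name, use_deepspeed=use_deepspeed)
    tokenizer = nue_asr.load_tokenizer(model_name)
    return model, tokenizer, use_deepspeed
```

The returned flag lets the caller log which code path was taken, which helps when comparing inference speed with and without DeepSpeed.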

Tokenization

The model uses the same sentencepiece-based tokenizer as japanese-gpt-neox-3.6b.


How to cite

@article{hono2023integration,
    title={An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition},
    author={Hono, Yukiya and Mitsuda, Koh and Zhao, Tianyu and Mitsui, Kentaro and Wakatsuki, Toshiaki and Sawada, Kei},
    journal={arXiv preprint arXiv:2312.03668},
    year={2023}
}

@misc{rinna-nue-asr,
    title={rinna/nue-asr},
    author={Hono, Yukiya and Mitsuda, Koh and Zhao, Tianyu and Mitsui, Kentaro and Wakatsuki, Toshiaki and Sawada, Kei},
    url={https://huggingface.co/rinna/nue-asr}
}

Citations

@article{hsu2021hubert,
    title={HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units},
    author={Hsu, Wei-Ning and Bolte, Benjamin and Tsai, Yao-Hung Hubert and Lakhotia, Kushal and Salakhutdinov, Ruslan and Mohamed, Abdelrahman},
    journal={IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    year={2021},
    volume={29},
    pages={3451-3460},
    doi={10.1109/TASLP.2021.3122291}
}

@software{andoniangpt2021gpt,
    title={{GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch}},
    author={Andonian, Alex and Anthony, Quentin and Biderman, Stella and Black, Sid and Gali, Preetham and Gao, Leo and Hallahan, Eric and Levy-Kramer, Josh and Leahy, Connor and Nestler, Lucas and Parker, Kip and Pieler, Michael and Purohit, Shivanshu and Songz, Tri and Phil, Wang and Weinbach, Samuel},
    url={https://www.github.com/eleutherai/gpt-neox},
    doi={10.5281/zenodo.5879544},
    month={8},
    year={2021},
    version={0.0.1},
}

License

The Apache 2.0 license