We propose a novel end-to-end speech recognition model, `Nue ASR`, which integrates pre-trained speech and language models.
The name `Nue` comes from the Japanese word ([`鵺/ぬえ/Nue`](https://en.wikipedia.org/wiki/Nue)), one of the Japanese legendary creatures ([`妖怪/ようかい/Yōkai`](https://en.wikipedia.org/wiki/Y%C5%8Dkai)).

This model provides end-to-end Japanese speech recognition with recognition accuracy comparable to recent ASR models. You can recognize speech faster than real time by using a GPU.

Benchmark scores, including our models, can be found at https://rinnakk.github.io/research/benchmarks/asr/
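To make the "faster than real time" claim concrete, a common measure is the real-time factor (RTF): processing time divided by audio duration. The numbers below are purely illustrative, not measured benchmarks of this model:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; RTF < 1.0 means faster than real time."""
    return processing_seconds / audio_seconds

# Hypothetical example: 60 seconds of audio transcribed in 12 seconds of wall-clock time.
print(real_time_factor(12.0, 60.0))  # 0.2
```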
* **Model architecture**

* **Training**

  The model was trained on approximately 19,000 hours of the following Japanese speech corpus. Note that speech samples longer than 16 seconds were excluded before training.
  - [ReazonSpeech](https://huggingface.co/datasets/reazon-research/reazonspeech)
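The 16-second exclusion mentioned above can be sketched as a simple duration filter. This is an illustrative example using the standard-library `wave` module, not the actual preprocessing code used for training:

```python
import wave

MAX_DURATION_SECONDS = 16.0

def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as f:
        return f.getnframes() / f.getframerate()

def filter_long_samples(paths):
    """Keep only WAV files no longer than 16 seconds."""
    return [p for p in paths if wav_duration_seconds(p) <= MAX_DURATION_SECONDS]
```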
# How to use the model

We tested our code using Python 3.8.10 and 3.10.12 with [PyTorch](https://pytorch.org/) 2.1.1 and [Transformers](https://huggingface.co/docs/transformers) 4.35.2. This codebase is expected to be compatible with Python 3.8 or later and recent PyTorch versions. The version of Transformers should be 4.33.0 or higher.

First, install the code for inference of this model.
```bash
pip install git+https://github.com/rinnakk/nue-asr.git
```
Command-line and Python interfaces are available.

## Command-line usage

The following command transcribes an audio file using the command-line interface. Audio files will be automatically downsampled to 16kHz.
```bash
nue-asr audio1.wav
```
You can specify multiple audio files.
```bash
nue-asr audio1.wav audio2.flac audio3.mp3
```
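For intuition about the automatic 16kHz downsampling, here is a minimal, hypothetical NumPy sketch using linear interpolation; the actual tool may use a higher-quality resampler:

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample a 1-D waveform via linear interpolation (illustrative only)."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.arange(len(audio)) / orig_sr
    new_t = np.arange(n_target) / target_sr
    return np.interp(new_t, old_t, audio).astype(audio.dtype)

# A 1-second 48 kHz waveform becomes 16,000 samples.
wave_48k = np.zeros(48000, dtype=np.float32)
print(len(resample_linear(wave_48k, 48000)))  # 16000
```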
We can use [DeepSpeed-Inference](https://www.deepspeed.ai/inference/) to accelerate inference of the GPT-NeoX module. If you use DeepSpeed-Inference, you need to install DeepSpeed.
```bash
pip install deepspeed
```
```bash
nue-asr --use-deepspeed audio1.wav
```
Run `nue-asr --help` for more information.
## Python usage

An example of the Python interface is as follows:
```python
import nue_asr

model = nue_asr.load_model("rinna/nue-asr")
tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")

result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
print(result.text)
```
The `nue_asr.transcribe` function can accept audio data as either a `numpy.array` or a `torch.Tensor`, in addition to audio file paths.
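Since `nue_asr.transcribe` accepts a `numpy.array`, you can also load audio yourself. A minimal sketch that reads mono 16-bit PCM WAV data into float32 with the standard-library `wave` module; the commented `transcribe` call assumes `model` and `tokenizer` are loaded as above:

```python
import wave

import numpy as np

def load_wav_as_float32(path: str) -> np.ndarray:
    """Read a mono 16-bit PCM WAV file into a float32 array in [-1, 1]."""
    with wave.open(path, "rb") as f:
        pcm = np.frombuffer(f.readframes(f.getnframes()), dtype=np.int16)
    return pcm.astype(np.float32) / 32768.0

# audio = load_wav_as_float32("path_to_audio.wav")
# result = nue_asr.transcribe(model, tokenizer, audio)
```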
Acceleration of inference speed using DeepSpeed-Inference is also available within the Python interface.
```python
import nue_asr

model = nue_asr.load_model("rinna/nue-asr", use_deepspeed=True)
tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")

result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
print(result.text)
```
The model uses the same sentencepiece-based tokenizer as [japanese-gpt-neox-3.6b](https://huggingface.co/rinna/japanese-gpt-neox-3.6b).

```
    year={2021},
    version={0.0.1},
}

@inproceedings{aminabadi2022deepspeed,
    title={{DeepSpeed-Inference}: enabling efficient inference of transformer models at unprecedented scale},
    author={Aminabadi, Reza Yazdani and Rajbhandari, Samyam and Awan, Ammar Ahmad and Li, Cheng and Li, Du and Zheng, Elton and Ruwase, Olatunji and Smith, Shaden and Zhang, Minjia and Rasley, Jeff and others},
    booktitle={SC22: International Conference for High Performance Computing, Networking, Storage and Analysis},
    pages={1--15},
    year={2022}
}
```
---