yky-h committed on
Commit 802fd42
1 Parent(s): d538c6e

update readme

Files changed (1)
  1. README.md +23 -10
README.md CHANGED
@@ -26,10 +26,10 @@ We propose a novel end-to-end speech recognition model, `Nue ASR`, which integra

  The name `Nue` comes from the Japanese word ([`鵺/ぬえ/Nue`](https://en.wikipedia.org/wiki/Nue)), one of the Japanese legendary creatures ([`妖怪/ようかい/Yōkai`](https://en.wikipedia.org/wiki/Y%C5%8Dkai)).

- This model is capable of performing highly accurate Japanese speech recognition.
- By utilizing a GPU, it can recognize speech at speeds exceeding real-time.

- Benchmark score including our models can be seen at https://rinnakk.github.io/research/benchmarks/asr/

  * **Model architecture**

@@ -40,7 +40,8 @@ Benchmark score including our models can be seen at https://rinnakk.github.io/re

  * **Training**

- The model was trained on approximately 19,000 hours of following Japanese speech corpus.
  - [ReazonSpeech](https://huggingface.co/datasets/reazon-research/reazonspeech)


@@ -57,7 +58,11 @@ Benchmark score including our models can be seen at https://rinnakk.github.io/re

  # How to use the model

- First, install the code for inference this model.

  ```bash
  pip install git+https://github.com/rinnakk/nue-asr.git
@@ -66,7 +71,7 @@ pip install git+https://github.com/rinnakk/nue-asr.git
  Command-line interface and python interface are available.

  ## Command-line usage
- The following command will transcribe the audio file via the command line interface.
  Audio files will be automatically downsampled to 16kHz.
  ```bash
  nue-asr audio1.wav
@@ -76,7 +81,7 @@ You can specify multiple audio files.
  nue-asr audio1.wav audio2.flac audio3.mp3
  ```

- We can use DeepSpeed-Inference to accelerate the inference speed of GPT-NeoX module.
  If you use DeepSpeed-Inference, you need to install DeepSpeed.
  ```bash
  pip install deepspeed
@@ -90,7 +95,7 @@ nue-asr --use-deepspeed audio1.wav
  Run `nue-asr --help` for more information.

  ## Python usage
- The example of python interface is as follows:
  ```python
  import nue_asr

@@ -100,9 +105,9 @@ tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")
  result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
  print(result.text)
  ```
- `nue_asr.transcribe` function can accept audio data as either a `numpy.array` or a `torch.Tensor`, in addition to traditional audio waveform file paths.

- Accelerating the inference speed of models using DeepSpeed-Inference is also available through the python interface.
  ```python
  import nue_asr

@@ -158,6 +163,14 @@ The model uses the same sentencepiece-based tokenizer as [japanese-gpt-neox-3.6b
  year={2021},
  version={0.0.1},
  }
  ```
  ---


  The name `Nue` comes from the Japanese word ([`鵺/ぬえ/Nue`](https://en.wikipedia.org/wiki/Nue)), one of the Japanese legendary creatures ([`妖怪/ようかい/Yōkai`](https://en.wikipedia.org/wiki/Y%C5%8Dkai)).

+ This model provides end-to-end Japanese speech recognition with accuracy comparable to that of recent ASR models.
+ You can recognize speech faster than real time by using a GPU.

+ Benchmark scores, including our models, can be found at https://rinnakk.github.io/research/benchmarks/asr/

  * **Model architecture**

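The updated README says a GPU lets the model recognize speech faster than real time. That claim is commonly quantified as a real-time factor (RTF): processing time divided by audio duration, where RTF < 1 means faster than real time. The README gives no RTF figures. The sketch below only shows how one might measure it. `fake_transcribe` is a hypothetical stand-in for whatever recognizer call you time, and `measure_rtf` / `real_time_factor` are illustrative helpers, not part of `nue-asr`.

```python
import time
from typing import Callable, Sequence

SAMPLE_RATE = 16_000  # the README says the CLI resamples input audio to 16 kHz


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration. RTF < 1 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe: Callable[[Sequence[float]], str],
                samples: Sequence[float],
                sample_rate: int = SAMPLE_RATE) -> float:
    """Time a single call to `transcribe` on `samples` and return its RTF."""
    audio_seconds = len(samples) / sample_rate
    start = time.perf_counter()
    transcribe(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


if __name__ == "__main__":
    # Hypothetical stand-in for a real recognizer call; it only sleeps.
    def fake_transcribe(samples: Sequence[float]) -> str:
        time.sleep(0.05)
        return ""

    ten_seconds = [0.0] * (SAMPLE_RATE * 10)
    rtf = measure_rtf(fake_transcribe, ten_seconds)
    print(f"RTF = {rtf:.3f} ({'faster' if rtf < 1 else 'slower'} than real time)")
```

For example, 2 s of processing for 10 s of audio gives an RTF of 0.2. A single timed call is noisy, so a fairer measurement would warm up the GPU first and average several runs.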

  * **Training**

+ The model was trained on approximately 19,000 hours of Japanese speech from the following corpus:
+ Note that speech samples longer than 16 seconds were excluded before training.
  - [ReazonSpeech](https://huggingface.co/datasets/reazon-research/reazonspeech)


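Excluding samples longer than 16 seconds is a simple duration filter, but the README does not say how it was implemented. The sketch below assumes a hypothetical record layout (`Utterance` with an ID, a sample count, and a sample rate) and keeps only utterances of at most 16 seconds. Whether a sample of exactly 16.0 s was kept is not stated, so the inclusive bound here is an assumption.

```python
from typing import Iterable, List, NamedTuple

MAX_SECONDS = 16.0  # threshold stated in the README


class Utterance(NamedTuple):
    # Hypothetical record layout; the README does not describe one.
    utt_id: str
    num_samples: int
    sample_rate: int

    @property
    def seconds(self) -> float:
        return self.num_samples / self.sample_rate


def filter_by_duration(utts: Iterable[Utterance],
                       max_seconds: float = MAX_SECONDS) -> List[Utterance]:
    """Drop utterances longer than `max_seconds` (inclusive upper bound assumed)."""
    return [u for u in utts if u.seconds <= max_seconds]


if __name__ == "__main__":
    corpus = [
        Utterance("a", 16_000 * 5, 16_000),   # 5 s: kept
        Utterance("b", 16_000 * 16, 16_000),  # exactly 16 s: kept under this assumption
        Utterance("c", 44_100 * 20, 44_100),  # 20 s: dropped
    ]
    kept = filter_by_duration(corpus)
    print([u.utt_id for u in kept])  # ['a', 'b']
```

Computing duration as samples divided by rate keeps the filter independent of each file's sample rate.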

  # How to use the model

+ We tested our code using Python 3.8.10 and 3.10.12 with [PyTorch](https://pytorch.org/) 2.1.1 and [Transformers](https://huggingface.co/docs/transformers) 4.35.2.
+ This codebase is expected to be compatible with Python 3.8 or later and recent PyTorch versions.
+ The version of Transformers should be 4.33.0 or higher.
+
+ First, install the inference code for this model.

  ```bash
  pip install git+https://github.com/rinnakk/nue-asr.git

  Command-line interface and python interface are available.

  ## Command-line usage
+ The following command transcribes an audio file using the command-line interface.
  Audio files will be automatically downsampled to 16kHz.
  ```bash
  nue-asr audio1.wav

  nue-asr audio1.wav audio2.flac audio3.mp3
  ```

+ We can use [DeepSpeed-Inference](https://www.deepspeed.ai/inference/) to accelerate inference of the GPT-NeoX module.
  If you use DeepSpeed-Inference, you need to install DeepSpeed.
  ```bash
  pip install deepspeed

  Run `nue-asr --help` for more information.

  ## Python usage
+ An example of the Python interface is as follows:
  ```python
  import nue_asr


  result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
  print(result.text)
  ```
+ The `nue_asr.transcribe` function accepts audio data as either a `numpy.ndarray` or a `torch.Tensor`, in addition to audio file paths.

+ Inference can also be accelerated with DeepSpeed-Inference through the Python interface.
  ```python
  import nue_asr


  year={2021},
  version={0.0.1},
  }
+
+ @inproceedings{aminabadi2022deepspeed,
+ title={{DeepSpeed-Inference}: enabling efficient inference of transformer models at unprecedented scale},
+ author={Aminabadi, Reza Yazdani and Rajbhandari, Samyam and Awan, Ammar Ahmad and Li, Cheng and Li, Du and Zheng, Elton and Ruwase, Olatunji and Smith, Shaden and Zhang, Minjia and Rasley, Jeff and others},
+ booktitle={SC22: International Conference for High Performance Computing, Networking, Storage and Analysis},
+ pages={1--15},
+ year={2022}
+ }
  ```
  ---

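This commit adds environment notes under "How to use the model": Python 3.8 or later, and Transformers 4.33.0 or higher. These can be checked before installing. The sketch below is an assumption about how one might do that and is not part of `nue-asr`. It reads the installed version with the standard-library `importlib.metadata`, which itself needs Python 3.8+, so on older interpreters the import fails first. It compares only the leading numeric release components, so a suffix such as `.dev0` or `rc1` is ignored rather than handled by full PEP 440 rules. The helper names `release_tuple`, `at_least`, and `check_environment` are hypothetical.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
from typing import Tuple

# Minimums taken from the README text.
MIN_PYTHON = (3, 8)
MIN_TRANSFORMERS = (4, 33, 0)


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Return the leading numeric release components of a version string.

    '4.35.2' -> (4, 35, 2); '4.36.0.dev0' -> (4, 36, 0); '4.33.0rc1' -> (4, 33, 0).
    Pre-release/dev suffixes are ignored, so a release candidate compares
    equal to its final release (a simplification of PEP 440 ordering).
    """
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # e.g. 'dev0': no leading digits, stop here
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # e.g. '0rc1': keep the 0, drop the suffix and anything after
    return tuple(parts)


def _pad(t: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    """Right-pad a release tuple with zeros so (4, 33) compares like (4, 33, 0)."""
    return t + (0,) * (width - len(t))


def at_least(ver: str, minimum: Tuple[int, ...]) -> bool:
    """True if `ver` is >= `minimum`, comparing zero-padded release tuples."""
    current = release_tuple(ver)
    width = max(len(current), len(minimum))
    return _pad(current, width) >= _pad(minimum, width)


def check_environment() -> None:
    """Print whether the interpreter and installed Transformers meet the minimums."""
    py = sys.version_info[:2]
    print(f"Python {py[0]}.{py[1]}: {'OK' if py >= MIN_PYTHON else 'older than 3.8'}")
    try:
        tf_version = version("transformers")
    except PackageNotFoundError:
        print("transformers: not installed")
        return
    ok = at_least(tf_version, MIN_TRANSFORMERS)
    print(f"transformers {tf_version}: {'OK' if ok else 'older than 4.33.0'}")


if __name__ == "__main__":
    check_environment()
```

Parsing only the release part keeps the sketch dependency-free. If `packaging` is available, `packaging.version.Version` gives exact PEP 440 ordering instead.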