langgz committed on
Commit
158b0f3
·
1 Parent(s): d4c46ef

Update README.md

  1. README.md +61 -1
README.md CHANGED
@@ -4,4 +4,64 @@ pipeline_tag: voice-activity-detection
4
  tags:
5
  - FunASR
6
  - FSMN-VAD
7
- ---
+ ---
8
+
9
+ ## Introduction
10
+
11
+
12
+ Voice activity detection (VAD) plays an important role in speech recognition systems by detecting where effective speech begins and ends. FunASR provides an efficient VAD model based on the [FSMN structure](https://arxiv.org/abs/1803.05030). To improve the model's discrimination, we use monophones as modeling units, which makes use of the relatively rich information in speech. During inference, the VAD system applies post-processing, such as threshold settings and sliding windows, to improve robustness.
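
The post-processing mentioned above (a threshold combined with a sliding window) can be illustrated with a minimal sketch. This is not the FSMN-VAD implementation: the function name `smooth_vad`, the 10 ms frame length, and the default threshold and window size are assumptions chosen only for illustration.

```python
from typing import List, Tuple


def smooth_vad(
    probs: List[float],
    threshold: float = 0.5,  # assumed decision threshold
    window: int = 5,         # assumed sliding-window size, in frames
    frame_ms: int = 10,      # assumed frame shift, in milliseconds
) -> List[Tuple[int, int]]:
    """Turn frame-level speech probabilities into (start_ms, end_ms) segments."""
    n = len(probs)
    half = window // 2

    # Sliding window: average each frame with its neighbours, then threshold.
    decisions = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        avg = sum(probs[lo:hi]) / (hi - lo)
        decisions.append(avg >= threshold)

    # Merge consecutive speech frames into segments.
    segments = []
    start = None
    for i, is_speech in enumerate(decisions):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start * frame_ms, i * frame_ms))
            start = None
    if start is not None:
        segments.append((start * frame_ms, n * frame_ms))
    return segments
```

Averaging over a window before thresholding suppresses isolated spikes: a single high-probability frame surrounded by silence no longer opens a segment, which is the kind of robustness the post-processing step is meant to provide.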
13
+
14
+ This repository demonstrates how to leverage FSMN-VAD in conjunction with the funasr_onnx runtime. The underlying model is derived from [FunASR](https://github.com/alibaba-damo-academy/FunASR), which was trained on a massive 60,000-hour Mandarin dataset. Notably, Paraformer's performance secured the top spot on the [SpeechIO leaderboard](https://github.com/SpeechColab/Leaderboard), highlighting its exceptional capabilities in speech recognition.
15
+
16
+ We have released numerous industrial-grade models, including speech recognition, voice activity detection, punctuation restoration, speaker verification, speaker diarization, and timestamp prediction (forced alignment). To learn more about these models, refer to the FunASR [documentation](https://alibaba-damo-academy.github.io/FunASR/en/index.html). If you are interested in using advanced AI technology for your speech-related projects, we invite you to explore [FunASR](https://github.com/alibaba-damo-academy/FunASR).
17
+
18
+ ## Install funasr_onnx
19
+
20
+ ```shell
21
+ pip install -U funasr_onnx
22
+ # Users in China can install from a mirror with:
23
+ # pip install -U funasr_onnx -i https://mirror.sjtu.edu.cn/pypi/web/simple
24
+ ```
25
+
26
+ ## Download the model
27
+
28
+ ```shell
29
+ git clone https://huggingface.co/funasr/paraformer-large
30
+ ```
31
+
32
+ ## Inference with runtime
33
+
34
+ ### Voice Activity Detection
35
+ #### FSMN-VAD
36
+ ```python
37
+ from funasr_onnx import Fsmn_vad
38
+
39
+ model_dir = "./FSMN-VAD"
40
+ model = Fsmn_vad(model_dir, quantize=True)
41
+
42
+ wav_path = "./FSMN-VAD/asr_example.wav"
43
+
44
+ result = model(wav_path)
45
+ print(result)
46
+ ```
47
+ - `model_dir`: the model path, which contains `model.onnx`, `config.yaml`, `am.mvn`
48
+ - `batch_size`: `1` (Default), the batch size during inference
49
+ - `device_id`: `-1` (Default), infer on CPU. To infer on GPU, set it to the GPU ID (make sure that `onnxruntime-gpu` is installed)
50
+ - `quantize`: `False` (Default), load `model.onnx` from `model_dir`. If set to `True`, load `model_quant.onnx` from `model_dir`
51
+ - `intra_op_num_threads`: `4` (Default), the number of threads used for intra-op parallelism on CPU
52
+
53
+ Input: a WAV file; supported input types: `str, np.ndarray, List[str]`
54
+
55
+ Output: `List[str]`: the VAD result (detected speech segments)
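
Once speech segments are available, a common next step is to cut the waveform into speech-only chunks. The sketch below assumes the segments have been converted to `[start_ms, end_ms]` pairs and that the audio is sampled at 16 kHz. Neither is stated in this README, so check the actual output of your installed `funasr_onnx` version first. The helper name `cut_segments` is hypothetical.

```python
from typing import List, Sequence


def cut_segments(
    samples: Sequence,
    segments_ms: List[List[int]],  # assumed format: [[start_ms, end_ms], ...]
    sample_rate: int = 16000,      # assumed sampling rate in Hz
) -> list:
    """Slice a waveform into one chunk per detected speech segment."""
    chunks = []
    for start_ms, end_ms in segments_ms:
        # Convert milliseconds to sample indices and clamp to the waveform length.
        start = start_ms * sample_rate // 1000
        end = min(len(samples), end_ms * sample_rate // 1000)
        if end > start:
            chunks.append(samples[start:end])
    return chunks
```

The function relies only on slicing and `len`, so it works with a plain Python list as well as an `np.ndarray` of samples.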
56
+
57
+
58
+ ## Citations
59
+
60
+ ```bibtex
61
+ @inproceedings{gao2022paraformer,
62
+ title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
63
+ author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
64
+ booktitle={INTERSPEECH},
65
+ year={2022}
66
+ }
67
+ ```