lucy1118 committed on
Commit
0515f46
·
verified ·
1 Parent(s): 959f9c0

Delete hub

hub/snakers4_silero-vad_master/.github/ISSUE_TEMPLATE/bug_report.md DELETED
@@ -1,52 +0,0 @@
1
- ---
2
- name: Bug report
3
- about: Create a report to help us improve
4
- title: Bug report - [X]
5
- labels: bug
6
- assignees: snakers4
7
-
8
- ---
9
-
10
- ## 🐛 Bug
11
-
12
- <!-- A clear and concise description of what the bug is. -->
13
-
14
- ## To Reproduce
15
-
16
- Steps to reproduce the behavior:
17
-
18
- 1.
19
- 2.
20
- 3.
21
-
22
- <!-- If you have a code sample, error messages, stack traces, please provide it here as well -->
23
-
24
- ## Expected behavior
25
-
26
- <!-- A clear and concise description of what you expected to happen. -->
27
-
28
- ## Environment
29
-
30
- Please copy and paste the output from this
31
- [environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
32
- (or fill out the checklist below manually).
33
-
34
- You can get the script and run it with:
35
- ```
36
- wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
37
- # For security purposes, please check the contents of collect_env.py before running it.
38
- python collect_env.py
39
- ```
40
-
41
- - PyTorch Version (e.g., 1.0):
42
- - OS (e.g., Linux):
43
- - How you installed PyTorch (`conda`, `pip`, source):
44
- - Build command you used (if compiling from source):
45
- - Python version:
46
- - CUDA/cuDNN version:
47
- - GPU models and configuration:
48
- - Any other relevant information:
49
-
50
- ## Additional context
51
-
52
- <!-- Add any other context about the problem here. -->
hub/snakers4_silero-vad_master/.github/ISSUE_TEMPLATE/feature_request.md DELETED
@@ -1,27 +0,0 @@
1
- ---
2
- name: Feature request
3
- about: Suggest an idea for this project
4
- title: Feature request - [X]
5
- labels: enhancement
6
- assignees: snakers4
7
-
8
- ---
9
-
10
- ## 🚀 Feature
11
- <!-- A clear and concise description of the feature proposal -->
12
-
13
- ## Motivation
14
-
15
- <!-- Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too -->
16
-
17
- ## Pitch
18
-
19
- <!-- A clear and concise description of what you want to happen. -->
20
-
21
- ## Alternatives
22
-
23
- <!-- A clear and concise description of any alternative solutions or features you've considered, if any. -->
24
-
25
- ## Additional context
26
-
27
- <!-- Add any other context or screenshots about the feature request here. -->
hub/snakers4_silero-vad_master/.github/ISSUE_TEMPLATE/questions---help---support.md DELETED
@@ -1,12 +0,0 @@
1
- ---
2
- name: Questions / Help / Support
3
- about: Ask for help, support or ask a question
4
- title: "❓ Questions / Help / Support"
5
- labels: help wanted
6
- assignees: snakers4
7
-
8
- ---
9
-
10
- ## ❓ Questions and Help
11
-
12
- We have a [wiki](https://github.com/snakers4/silero-models/wiki) available for our users. Please make sure you have checked it out first.
hub/snakers4_silero-vad_master/CODE_OF_CONDUCT.md DELETED
@@ -1,76 +0,0 @@
1
- # Contributor Covenant Code of Conduct
2
-
3
- ## Our Pledge
4
-
5
- In the interest of fostering an open and welcoming environment, we as
6
- contributors and maintainers pledge to making participation in our project and
7
- our community a harassment-free experience for everyone, regardless of age, body
8
- size, disability, ethnicity, sex characteristics, gender identity and expression,
9
- level of experience, education, socio-economic status, nationality, personal
10
- appearance, race, religion, or sexual identity and orientation.
11
-
12
- ## Our Standards
13
-
14
- Examples of behavior that contributes to creating a positive environment
15
- include:
16
-
17
- * Using welcoming and inclusive language
18
- * Being respectful of differing viewpoints and experiences
19
- * Gracefully accepting constructive criticism
20
- * Focusing on what is best for the community
21
- * Showing empathy towards other community members
22
-
23
- Examples of unacceptable behavior by participants include:
24
-
25
- * The use of sexualized language or imagery and unwelcome sexual attention or
26
- advances
27
- * Trolling, insulting/derogatory comments, and personal or political attacks
28
- * Public or private harassment
29
- * Publishing others' private information, such as a physical or electronic
30
- address, without explicit permission
31
- * Other conduct which could reasonably be considered inappropriate in a
32
- professional setting
33
-
34
- ## Our Responsibilities
35
-
36
- Project maintainers are responsible for clarifying the standards of acceptable
37
- behavior and are expected to take appropriate and fair corrective action in
38
- response to any instances of unacceptable behavior.
39
-
40
- Project maintainers have the right and responsibility to remove, edit, or
41
- reject comments, commits, code, wiki edits, issues, and other contributions
42
- that are not aligned to this Code of Conduct, or to ban temporarily or
43
- permanently any contributor for other behaviors that they deem inappropriate,
44
- threatening, offensive, or harmful.
45
-
46
- ## Scope
47
-
48
- This Code of Conduct applies both within project spaces and in public spaces
49
- when an individual is representing the project or its community. Examples of
50
- representing a project or community include using an official project e-mail
51
- address, posting via an official social media account, or acting as an appointed
52
- representative at an online or offline event. Representation of a project may be
53
- further defined and clarified by project maintainers.
54
-
55
- ## Enforcement
56
-
57
- Instances of abusive, harassing, or otherwise unacceptable behavior may be
58
- reported by contacting the project team at aveysov@gmail.com. All
59
- complaints will be reviewed and investigated and will result in a response that
60
- is deemed necessary and appropriate to the circumstances. The project team is
61
- obligated to maintain confidentiality with regard to the reporter of an incident.
62
- Further details of specific enforcement policies may be posted separately.
63
-
64
- Project maintainers who do not follow or enforce the Code of Conduct in good
65
- faith may face temporary or permanent repercussions as determined by other
66
- members of the project's leadership.
67
-
68
- ## Attribution
69
-
70
- This Code of Conduct is adapted from the [Contributor Covenant][homepage], version 1.4,
71
- available at https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
72
-
73
- [homepage]: https://www.contributor-covenant.org
74
-
75
- For answers to common questions about this code of conduct, see
76
- https://www.contributor-covenant.org/faq
hub/snakers4_silero-vad_master/LICENSE DELETED
@@ -1,21 +0,0 @@
1
- MIT License
2
-
3
- Copyright (c) 2020-present Silero Team
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining a copy
6
- of this software and associated documentation files (the "Software"), to deal
7
- in the Software without restriction, including without limitation the rights
8
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
- copies of the Software, and to permit persons to whom the Software is
10
- furnished to do so, subject to the following conditions:
11
-
12
- The above copyright notice and this permission notice shall be included in all
13
- copies or substantial portions of the Software.
14
-
15
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
- SOFTWARE.
hub/snakers4_silero-vad_master/README.md DELETED
@@ -1,113 +0,0 @@
1
- [![Email](http://img.shields.io/badge/Email-gray.svg?style=for-the-badge&logo=gmail)](mailto:hello@silero.ai) [![Telegram](http://img.shields.io/badge/Telegram-blue.svg?style=for-the-badge&logo=telegram)](https://t.me/silero_speech) [![License: MIT](https://img.shields.io/badge/License-MIT-lightgrey.svg?style=for-the-badge)](https://github.com/snakers4/silero-vad/blob/master/LICENSE)
2
-
3
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/snakers4/silero-vad/blob/master/silero-vad.ipynb)
4
-
5
- ![header](https://user-images.githubusercontent.com/12515440/89997349-b3523080-dc94-11ea-9906-ca2e8bc50535.png)
6
-
7
- <br/>
8
- <h1 align="center">Silero VAD</h1>
9
- <br/>
10
-
11
- **Silero VAD** - pre-trained enterprise-grade [Voice Activity Detector](https://en.wikipedia.org/wiki/Voice_activity_detection) (also see our [STT models](https://github.com/snakers4/silero-models)).
12
-
13
- This repository also includes Number Detector and Language classifier [models](https://github.com/snakers4/silero-vad/wiki/Other-Models).
14
-
15
- <br/>
16
-
17
- <p align="center">
18
- <img src="https://user-images.githubusercontent.com/36505480/198026365-8da383e0-5398-4a12-b7f8-22c2c0059512.png" />
19
- </p>
20
-
21
- <details>
22
- <summary>Real Time Example</summary>
23
-
24
- https://user-images.githubusercontent.com/36505480/144874384-95f80f6d-a4f1-42cc-9be7-004c891dd481.mp4
25
-
26
- </details>
27
-
28
- <br/>
29
- <h2 align="center">Key Features</h2>
30
- <br/>
31
-
32
- - **Stellar accuracy**
33
-
34
- Silero VAD has [excellent results](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics#vs-other-available-solutions) on speech detection tasks.
35
-
36
- - **Fast**
37
-
38
- One audio chunk (30+ ms) [takes](https://github.com/snakers4/silero-vad/wiki/Performance-Metrics#silero-vad-performance-metrics) less than **1ms** to be processed on a single CPU thread. Using batching or GPU can also improve performance considerably. Under certain conditions ONNX may even run up to 4-5x faster.
39
-
40
- - **Lightweight**
41
-
42
- JIT model is around one megabyte in size.
43
-
44
- - **General**
45
-
46
- Silero VAD was trained on huge corpora that include over **100** languages and it performs well on audio from different domains with various levels of background noise and recording quality.
47
-
48
- - **Flexible sampling rate**
49
-
50
- Silero VAD [supports](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics#sample-rate-comparison) **8000 Hz** and **16000 Hz** [sampling rates](https://en.wikipedia.org/wiki/Sampling_(signal_processing)#Sampling_rate).
51
-
52
- - **Flexible chunk size**
53
-
54
- The model was trained on **30 ms** chunks. Longer chunks are supported directly; other sizes may work as well.
55
-
56
- - **Highly Portable**
57
-
58
- Silero VAD benefits from the rich ecosystems built around **PyTorch** and **ONNX** and runs everywhere these runtimes are available.
59
-
60
- - **No Strings Attached**
61
-
62
- Published under a permissive license (MIT), Silero VAD has zero strings attached - no telemetry, no keys, no registration, no built-in expiration, no vendor lock-in.
63
-
64
- <br/>
65
- <h2 align="center">Typical Use Cases</h2>
66
- <br/>
67
-
68
- - Voice activity detection for IoT / edge / mobile use cases
69
- - Data cleaning and preparation, voice detection in general
70
- - Telephony and call-center automation, voice bots
71
- - Voice interfaces
72
-
73
- <br/>
74
- <h2 align="center">Links</h2>
75
- <br/>
76
-
77
-
78
- - [Examples and Dependencies](https://github.com/snakers4/silero-vad/wiki/Examples-and-Dependencies#dependencies)
79
- - [Quality Metrics](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics)
80
- - [Performance Metrics](https://github.com/snakers4/silero-vad/wiki/Performance-Metrics)
81
- - [Number Detector and Language classifier models](https://github.com/snakers4/silero-vad/wiki/Other-Models)
82
- - [Versions and Available Models](https://github.com/snakers4/silero-vad/wiki/Version-history-and-Available-Models)
83
- - [Further reading](https://github.com/snakers4/silero-models#further-reading)
84
- - [FAQ](https://github.com/snakers4/silero-vad/wiki/FAQ)
85
-
86
- <br/>
87
- <h2 align="center">Get In Touch</h2>
88
- <br/>
89
-
90
- Try our models, create an [issue](https://github.com/snakers4/silero-vad/issues/new), start a [discussion](https://github.com/snakers4/silero-vad/discussions/new), join our telegram [chat](https://t.me/silero_speech), [email](mailto:hello@silero.ai) us, read our [news](https://t.me/silero_news).
91
-
92
- Please see our [wiki](https://github.com/snakers4/silero-models/wiki) and [tiers](https://github.com/snakers4/silero-models/wiki/Licensing-and-Tiers) for relevant information and [email](mailto:hello@silero.ai) us directly.
93
-
94
- **Citations**
95
-
96
- ```
97
- @misc{Silero_VAD,
98
- author = {Silero Team},
99
- title = {Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier},
100
- year = {2021},
101
- publisher = {GitHub},
102
- journal = {GitHub repository},
103
- howpublished = {\url{https://github.com/snakers4/silero-vad}},
104
- commit = {insert_some_commit_here},
105
- email = {hello@silero.ai}
106
- }
107
- ```
108
-
109
- <br/>
110
- <h2 align="center">VAD-based Community Apps</h2>
111
- <br/>
112
-
113
- - Voice activity detection for the [browser](https://github.com/ricky0123/vad) using ONNX Runtime Web
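For reference, a minimal usage sketch of the workflow this README describes, mirroring the API used in `silero-vad.ipynb` further down in this commit (the audio file name is a placeholder):

```python
import torch

SAMPLING_RATE = 16000  # Silero VAD supports 8000 Hz and 16000 Hz

# load the model plus helper utilities from the torch.hub entry point
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad',
                              force_reload=True)
(get_speech_timestamps, save_audio, read_audio,
 VADIterator, collect_chunks) = utils

wav = read_audio('my_audio.wav', sampling_rate=SAMPLING_RATE)  # placeholder path
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)
print(speech_timestamps)  # list of {'start': ..., 'end': ...} dicts in samples
```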
hub/snakers4_silero-vad_master/examples/colab_record_example.ipynb DELETED
@@ -1,241 +0,0 @@
1
- {
2
- "cells": [
3
- {
4
- "cell_type": "markdown",
5
- "metadata": {
6
- "id": "bccAucKjnPHm"
7
- },
8
- "source": [
9
- "### Dependencies and inputs"
10
- ]
11
- },
12
- {
13
- "cell_type": "code",
14
- "execution_count": null,
15
- "metadata": {
16
- "id": "cSih95WFmwgi"
17
- },
18
- "outputs": [],
19
- "source": [
20
- "!pip -q install pydub\n",
21
- "from google.colab import output\n",
22
- "from base64 import b64decode, b64encode\n",
23
- "from io import BytesIO\n",
24
- "import numpy as np\n",
25
- "from pydub import AudioSegment\n",
26
- "from IPython.display import HTML, display\n",
27
- "import torch\n",
28
- "import matplotlib.pyplot as plt\n",
29
- "import moviepy.editor as mpe\n",
30
- "from matplotlib.animation import FuncAnimation, FFMpegWriter\n",
31
- "import matplotlib\n",
32
- "matplotlib.use('Agg')\n",
33
- "\n",
34
- "torch.set_num_threads(1)\n",
35
- "\n",
36
- "model, _ = torch.hub.load(repo_or_dir='snakers4/silero-vad',\n",
37
- " model='silero_vad',\n",
38
- " force_reload=True)\n",
39
- "\n",
40
- "def int2float(sound):\n",
41
- " abs_max = np.abs(sound).max()\n",
42
- " sound = sound.astype('float32')\n",
43
- " if abs_max > 0:\n",
44
- " sound *= 1/abs_max\n",
45
- " sound = sound.squeeze()\n",
46
- " return sound\n",
47
- "\n",
48
- "AUDIO_HTML = \"\"\"\n",
49
- "<script>\n",
50
- "var my_div = document.createElement(\"DIV\");\n",
51
- "var my_p = document.createElement(\"P\");\n",
52
- "var my_btn = document.createElement(\"BUTTON\");\n",
53
- "var t = document.createTextNode(\"Press to start recording\");\n",
54
- "\n",
55
- "my_btn.appendChild(t);\n",
56
- "//my_p.appendChild(my_btn);\n",
57
- "my_div.appendChild(my_btn);\n",
58
- "document.body.appendChild(my_div);\n",
59
- "\n",
60
- "var base64data = 0;\n",
61
- "var reader;\n",
62
- "var recorder, gumStream;\n",
63
- "var recordButton = my_btn;\n",
64
- "\n",
65
- "var handleSuccess = function(stream) {\n",
66
- " gumStream = stream;\n",
67
- " var options = {\n",
68
- " //bitsPerSecond: 8000, //chrome seems to ignore, always 48k\n",
69
- " mimeType : 'audio/webm;codecs=opus'\n",
70
- " //mimeType : 'audio/webm;codecs=pcm'\n",
71
- " }; \n",
72
- " //recorder = new MediaRecorder(stream, options);\n",
73
- " recorder = new MediaRecorder(stream);\n",
74
- " recorder.ondataavailable = function(e) { \n",
75
- " var url = URL.createObjectURL(e.data);\n",
76
- " // var preview = document.createElement('audio');\n",
77
- " // preview.controls = true;\n",
78
- " // preview.src = url;\n",
79
- " // document.body.appendChild(preview);\n",
80
- "\n",
81
- " reader = new FileReader();\n",
82
- " reader.readAsDataURL(e.data); \n",
83
- " reader.onloadend = function() {\n",
84
- " base64data = reader.result;\n",
85
- " //console.log(\"Inside FileReader:\" + base64data);\n",
86
- " }\n",
87
- " };\n",
88
- " recorder.start();\n",
89
- " };\n",
90
- "\n",
91
- "recordButton.innerText = \"Recording... press to stop\";\n",
92
- "\n",
93
- "navigator.mediaDevices.getUserMedia({audio: true}).then(handleSuccess);\n",
94
- "\n",
95
- "\n",
96
- "function toggleRecording() {\n",
97
- " if (recorder && recorder.state == \"recording\") {\n",
98
- " recorder.stop();\n",
99
- " gumStream.getAudioTracks()[0].stop();\n",
100
- " recordButton.innerText = \"Saving recording...\"\n",
101
- " }\n",
102
- "}\n",
103
- "\n",
104
- "// https://stackoverflow.com/a/951057\n",
105
- "function sleep(ms) {\n",
106
- " return new Promise(resolve => setTimeout(resolve, ms));\n",
107
- "}\n",
108
- "\n",
109
- "var data = new Promise(resolve=>{\n",
110
- "//recordButton.addEventListener(\"click\", toggleRecording);\n",
111
- "recordButton.onclick = ()=>{\n",
112
- "toggleRecording()\n",
113
- "\n",
114
- "sleep(2000).then(() => {\n",
115
- " // wait 2000ms for the data to be available...\n",
116
- " // ideally this should use something like await...\n",
117
- " //console.log(\"Inside data:\" + base64data)\n",
118
- " resolve(base64data.toString())\n",
119
- "\n",
120
- "});\n",
121
- "\n",
122
- "}\n",
123
- "});\n",
124
- " \n",
125
- "</script>\n",
126
- "\"\"\"\n",
127
- "\n",
128
- "def record(sec=10):\n",
129
- " display(HTML(AUDIO_HTML))\n",
130
- " s = output.eval_js(\"data\")\n",
131
- " b = b64decode(s.split(',')[1])\n",
132
- " audio = AudioSegment.from_file(BytesIO(b))\n",
133
- " audio.export('test.mp3', format='mp3')\n",
134
- " audio = audio.set_channels(1)\n",
135
- " audio = audio.set_frame_rate(16000)\n",
136
- " audio_float = int2float(np.array(audio.get_array_of_samples()))\n",
137
- " audio_tens = torch.tensor(audio_float )\n",
138
- " return audio_tens\n",
139
- "\n",
140
- "def make_animation(probs, audio_duration, interval=40):\n",
141
- " fig = plt.figure(figsize=(16, 9))\n",
142
- " ax = plt.axes(xlim=(0, audio_duration), ylim=(0, 1.02))\n",
143
- " line, = ax.plot([], [], lw=2)\n",
144
- " x = [i / 16000 * 512 for i in range(len(probs))]\n",
145
- " plt.xlabel('Time, seconds', fontsize=16)\n",
146
- " plt.ylabel('Speech Probability', fontsize=16)\n",
147
- "\n",
148
- " def init():\n",
149
- " plt.fill_between(x, probs, color='#064273')\n",
150
- " line.set_data([], [])\n",
151
- " line.set_color('#990000')\n",
152
- " return line,\n",
153
- "\n",
154
- " def animate(i):\n",
155
- " x = i * interval / 1000 - 0.04\n",
156
- " y = np.linspace(0, 1.02, 2)\n",
157
- " \n",
158
- " line.set_data(x, y)\n",
159
- " line.set_color('#990000')\n",
160
- " return line,\n",
161
- "\n",
162
- " anim = FuncAnimation(fig, animate, init_func=init, interval=interval, save_count=audio_duration / (interval / 1000))\n",
163
- "\n",
164
- " f = r\"animation.mp4\" \n",
165
- " writervideo = FFMpegWriter(fps=1000/interval) \n",
166
- " anim.save(f, writer=writervideo)\n",
167
- " plt.close('all')\n",
168
- "\n",
169
- "def combine_audio(vidname, audname, outname, fps=25): \n",
170
- " my_clip = mpe.VideoFileClip(vidname, verbose=False)\n",
171
- " audio_background = mpe.AudioFileClip(audname)\n",
172
- " final_clip = my_clip.set_audio(audio_background)\n",
173
- " final_clip.write_videofile(outname,fps=fps,verbose=False)\n",
174
- "\n",
175
- "def record_make_animation():\n",
176
- " tensor = record()\n",
177
- "\n",
178
- " print('Calculating probabilities...')\n",
179
- " speech_probs = []\n",
180
- " window_size_samples = 512\n",
181
- " for i in range(0, len(tensor), window_size_samples):\n",
182
- " if len(tensor[i: i+ window_size_samples]) < window_size_samples:\n",
183
- " break\n",
184
- " speech_prob = model(tensor[i: i+ window_size_samples], 16000).item()\n",
185
- " speech_probs.append(speech_prob)\n",
186
- " model.reset_states()\n",
187
- " print('Making animation...')\n",
188
- " make_animation(speech_probs, len(tensor) / 16000)\n",
189
- "\n",
190
- " print('Merging your voice with animation...')\n",
191
- " combine_audio('animation.mp4', 'test.mp3', 'merged.mp4')\n",
192
- " print('Done!')\n",
193
- " mp4 = open('merged.mp4','rb').read()\n",
194
- " data_url = \"data:video/mp4;base64,\" + b64encode(mp4).decode()\n",
195
- " display(HTML(\"\"\"\n",
196
- " <video width=800 controls>\n",
197
- " <source src=\"%s\" type=\"video/mp4\">\n",
198
- " </video>\n",
199
- " \"\"\" % data_url))"
200
- ]
201
- },
202
- {
203
- "cell_type": "markdown",
204
- "metadata": {
205
- "id": "IFVs3GvTnpB1"
206
- },
207
- "source": [
208
- "## Record example"
209
- ]
210
- },
211
- {
212
- "cell_type": "code",
213
- "execution_count": null,
214
- "metadata": {
215
- "id": "5EBjrTwiqAaQ"
216
- },
217
- "outputs": [],
218
- "source": [
219
- "record_make_animation()"
220
- ]
221
- }
222
- ],
223
- "metadata": {
224
- "colab": {
225
- "collapsed_sections": [
226
- "bccAucKjnPHm"
227
- ],
228
- "name": "Untitled2.ipynb",
229
- "provenance": []
230
- },
231
- "kernelspec": {
232
- "display_name": "Python 3",
233
- "name": "python3"
234
- },
235
- "language_info": {
236
- "name": "python"
237
- }
238
- },
239
- "nbformat": 4,
240
- "nbformat_minor": 0
241
- }
hub/snakers4_silero-vad_master/examples/microphone_and_webRTC_integration/README.md DELETED
@@ -1,28 +0,0 @@
1
-
2
- This example integrates the microphone with the webRTC VAD. I used [this](https://github.com/mozilla/DeepSpeech-examples/tree/r0.8/mic_vad_streaming) as a starting point.
3
- Here is a short video presenting the results:
4
-
5
- https://user-images.githubusercontent.com/28188499/116685087-182ff100-a9b2-11eb-927d-ed9f621226ee.mp4
6
-
7
- # Requirements:
8
- The libraries used for the following example are:
9
- ```
10
- Python == 3.6.9
11
- webrtcvad >= 2.0.10
12
- torchaudio >= 0.8.1
13
- torch >= 1.8.1
14
- halo >= 0.0.31
15
- Soundfile >= 0.13.3
16
- ```
17
- Using pip3:
18
- ```
19
- pip3 install webrtcvad
20
- pip3 install torchaudio
21
- pip3 install torch
22
- pip3 install halo
23
- pip3 install soundfile
24
- ```
25
- To keep the code simple, the default sample_rate is 16 kHz and no resampling is performed.
26
-
27
- This example has been tested on ```Ubuntu 18.04.3 LTS```.
28
-
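For reference, a condensed, hedged sketch of the hand-off the script below implements: webrtcvad gates cheap fixed-size frames, and Silero VAD then confirms speech on the accumulated buffer (the frame size and aggressiveness shown are illustrative, not recommendations):

```python
import numpy as np
import torch
import webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 20                                      # webrtcvad accepts 10/20/30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2   # int16 mono -> 640 bytes per frame

vad = webrtcvad.Vad(3)                             # aggressiveness 0..3
model, utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')
get_speech_ts = utils[0]                           # speech timestamp helper

def frame_is_speech(frame: bytes) -> bool:
    """Cheap per-frame gate using webrtcvad."""
    return len(frame) == FRAME_BYTES and vad.is_speech(frame, SAMPLE_RATE)

def confirm_with_silero(raw_int16: bytes) -> bool:
    """Run Silero VAD over audio that webrtcvad already flagged as speech."""
    sound = np.frombuffer(raw_int16, np.int16).astype('float32')
    peak = np.abs(sound).max()
    if peak > 0:
        sound /= peak                              # normalise to [-1, 1]
    return len(get_speech_ts(torch.from_numpy(sound), model)) > 0
```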
hub/snakers4_silero-vad_master/examples/microphone_and_webRTC_integration/microphone_and_webRTC_integration.py DELETED
@@ -1,201 +0,0 @@
1
- import collections, queue
2
- import numpy as np
3
- import pyaudio
4
- import webrtcvad
5
- from halo import Halo
6
- import torch
7
- import torchaudio
8
-
9
- class Audio(object):
10
- """Streams raw audio from microphone. Data is received in a separate thread, and stored in a buffer, to be read from."""
11
-
12
- FORMAT = pyaudio.paInt16
13
- # Network/VAD rate-space
14
- RATE_PROCESS = 16000
15
- CHANNELS = 1
16
- BLOCKS_PER_SECOND = 50
17
-
18
- def __init__(self, callback=None, device=None, input_rate=RATE_PROCESS):
19
- def proxy_callback(in_data, frame_count, time_info, status):
20
- #pylint: disable=unused-argument
21
- callback(in_data)
22
- return (None, pyaudio.paContinue)
23
- if callback is None: callback = lambda in_data: self.buffer_queue.put(in_data)
24
- self.buffer_queue = queue.Queue()
25
- self.device = device
26
- self.input_rate = input_rate
27
- self.sample_rate = self.RATE_PROCESS
28
- self.block_size = int(self.RATE_PROCESS / float(self.BLOCKS_PER_SECOND))
29
- self.block_size_input = int(self.input_rate / float(self.BLOCKS_PER_SECOND))
30
- self.pa = pyaudio.PyAudio()
31
-
32
- kwargs = {
33
- 'format': self.FORMAT,
34
- 'channels': self.CHANNELS,
35
- 'rate': self.input_rate,
36
- 'input': True,
37
- 'frames_per_buffer': self.block_size_input,
38
- 'stream_callback': proxy_callback,
39
- }
40
-
41
- self.chunk = None
42
- # if not default device
43
- if self.device:
44
- kwargs['input_device_index'] = self.device
45
-
46
- self.stream = self.pa.open(**kwargs)
47
- self.stream.start_stream()
48
-
49
- def read(self):
50
- """Return a block of audio data, blocking if necessary."""
51
- return self.buffer_queue.get()
52
-
53
- def destroy(self):
54
- self.stream.stop_stream()
55
- self.stream.close()
56
- self.pa.terminate()
57
-
58
- frame_duration_ms = property(lambda self: 1000 * self.block_size // self.sample_rate)
59
-
60
-
61
- class VADAudio(Audio):
62
- """Filter & segment audio with voice activity detection."""
63
-
64
- def __init__(self, aggressiveness=3, device=None, input_rate=None):
65
- super().__init__(device=device, input_rate=input_rate)
66
- self.vad = webrtcvad.Vad(aggressiveness)
67
-
68
- def frame_generator(self):
69
- """Generator that yields all audio frames from microphone."""
70
- if self.input_rate == self.RATE_PROCESS:
71
- while True:
72
- yield self.read()
73
- else:
74
- raise Exception("Resampling required")
75
-
76
- def vad_collector(self, padding_ms=300, ratio=0.75, frames=None):
77
- """Generator that yields series of consecutive audio frames comprising each utterence, separated by yielding a single None.
78
- Determines voice activity by ratio of frames in padding_ms. Uses a buffer to include padding_ms prior to being triggered.
79
- Example: (frame, ..., frame, None, frame, ..., frame, None, ...)
80
- |---utterence---| |---utterence---|
81
- """
82
- if frames is None: frames = self.frame_generator()
83
- num_padding_frames = padding_ms // self.frame_duration_ms
84
- ring_buffer = collections.deque(maxlen=num_padding_frames)
85
- triggered = False
86
-
87
- for frame in frames:
88
- if len(frame) < 640:
89
- return
90
-
91
- is_speech = self.vad.is_speech(frame, self.sample_rate)
92
-
93
- if not triggered:
94
- ring_buffer.append((frame, is_speech))
95
- num_voiced = len([f for f, speech in ring_buffer if speech])
96
- if num_voiced > ratio * ring_buffer.maxlen:
97
- triggered = True
98
- for f, s in ring_buffer:
99
- yield f
100
- ring_buffer.clear()
101
-
102
- else:
103
- yield frame
104
- ring_buffer.append((frame, is_speech))
105
- num_unvoiced = len([f for f, speech in ring_buffer if not speech])
106
- if num_unvoiced > ratio * ring_buffer.maxlen:
107
- triggered = False
108
- yield None
109
- ring_buffer.clear()
110
-
111
- def main(ARGS):
112
- # Start audio with VAD
113
- vad_audio = VADAudio(aggressiveness=ARGS.webRTC_aggressiveness,
114
- device=ARGS.device,
115
- input_rate=ARGS.rate)
116
-
117
- print("Listening (ctrl-C to exit)...")
118
- frames = vad_audio.vad_collector()
119
-
120
- # load silero VAD
121
- torchaudio.set_audio_backend("soundfile")
122
- model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
123
- model=ARGS.silaro_model_name,
124
- force_reload= ARGS.reload)
125
- (get_speech_ts,_,_, _,_, _, _) = utils
126
-
127
-
128
- # Stream from microphone to DeepSpeech using VAD
129
- spinner = None
130
- if not ARGS.nospinner:
131
- spinner = Halo(spinner='line')
132
- wav_data = bytearray()
133
- for frame in frames:
134
- if frame is not None:
135
- if spinner: spinner.start()
136
-
137
- wav_data.extend(frame)
138
- else:
139
- if spinner: spinner.stop()
140
- print("webRTC has detected a possible speech")
141
-
142
- newsound= np.frombuffer(wav_data,np.int16)
143
- audio_float32=Int2Float(newsound)
144
- time_stamps =get_speech_ts(audio_float32, model,num_steps=ARGS.num_steps,trig_sum=ARGS.trig_sum,neg_trig_sum=ARGS.neg_trig_sum,
145
- num_samples_per_window=ARGS.num_samples_per_window,min_speech_samples=ARGS.min_speech_samples,
146
- min_silence_samples=ARGS.min_silence_samples)
147
-
148
- if(len(time_stamps)>0):
149
- print("silero VAD has detected a possible speech")
150
- else:
151
- print("silero VAD has detected a noise")
152
- print()
153
- wav_data = bytearray()
154
-
155
-
156
- def Int2Float(sound):
157
- _sound = np.copy(sound) #
158
- abs_max = np.abs(_sound).max()
159
- _sound = _sound.astype('float32')
160
- if abs_max > 0:
161
- _sound *= 1/abs_max
162
- audio_float32 = torch.from_numpy(_sound.squeeze())
163
- return audio_float32
164
-
165
- if __name__ == '__main__':
166
- DEFAULT_SAMPLE_RATE = 16000
167
-
168
- import argparse
169
- parser = argparse.ArgumentParser(description="Stream from microphone to webRTC and silero VAD")
170
-
171
- parser.add_argument('-v', '--webRTC_aggressiveness', type=int, default=3,
172
- help="Set aggressiveness of webRTC: an integer between 0 and 3, 0 being the least aggressive about filtering out non-speech, 3 the most aggressive. Default: 3")
173
- parser.add_argument('--nospinner', action='store_true',
174
- help="Disable spinner")
175
- parser.add_argument('-d', '--device', type=int, default=None,
176
- help="Device input index (Int) as listed by pyaudio.PyAudio.get_device_info_by_index(). If not provided, falls back to PyAudio.get_default_device().")
177
-
178
- parser.add_argument('-name', '--silaro_model_name', type=str, default="silero_vad",
179
- help="select the name of the model. You can select between 'silero_vad',''silero_vad_micro','silero_vad_micro_8k','silero_vad_mini','silero_vad_mini_8k'")
180
- parser.add_argument('--reload', action='store_true',help="download the last version of the silero vad")
181
-
182
- parser.add_argument('-ts', '--trig_sum', type=float, default=0.25,
183
- help="overlapping windows are used for each audio chunk, trig sum defines average probability among those windows for switching into triggered state (speech state)")
184
-
185
- parser.add_argument('-nts', '--neg_trig_sum', type=float, default=0.07,
186
- help="same as trig_sum, but for switching from triggered to non-triggered state (non-speech)")
187
-
188
- parser.add_argument('-N', '--num_steps', type=int, default=8,
189
- help="nubmer of overlapping windows to split audio chunk into (we recommend 4 or 8)")
190
-
191
- parser.add_argument('-nspw', '--num_samples_per_window', type=int, default=4000,
192
- help="number of samples in each window, our models were trained using 4000 samples (250 ms) per window, so this is preferable value (lesser values reduce quality)")
193
-
194
- parser.add_argument('-msps', '--min_speech_samples', type=int, default=10000,
195
- help="minimum speech chunk duration in samples")
196
-
197
- parser.add_argument('-msis', '--min_silence_samples', type=int, default=500,
198
- help=" minimum silence duration in samples between to separate speech chunks")
199
- ARGS = parser.parse_args()
200
- ARGS.rate=DEFAULT_SAMPLE_RATE
201
- main(ARGS)
hub/snakers4_silero-vad_master/examples/pyaudio-streaming/README.md DELETED
@@ -1,20 +0,0 @@
1
- # Pyaudio Streaming Example
2
-
3
- This example notebook shows how microphone audio fetched by pyaudio can be processed with Silero-VAD.
4
-
5
- It is designed as a low-level example of binary real-time streaming: it uses only the model's predictions, processes the raw binary data, and plots the speech probabilities at the end for visualization.
6
-
7
- Currently, the notebook consists of two examples:
8
- - One that records audio of a predefined length from the microphone, processes it with Silero-VAD, and plots the result afterwards.
9
- - The other plots the speech probabilities in real time (using jupyterplot) and records audio until you press Enter.
10
-
11
- ## Example Video for the Real-Time Visualization
12
-
13
-
14
- https://user-images.githubusercontent.com/8079748/117580455-4622dd00-b0f8-11eb-858d-e6368ed4eada.mp4
15
-
16
-
17
-
18
-
19
-
20
-
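For reference, the per-chunk step that both notebook examples share, condensed into a hedged sketch (the chunk size of 1536 samples follows the notebook below):

```python
import numpy as np
import pyaudio
import torch

SAMPLE_RATE = 16000
NUM_SAMPLES = 1536   # ~96 ms per chunk at 16 kHz, as in the notebook

model, _utils = torch.hub.load('snakers4/silero-vad', 'silero_vad')

audio = pyaudio.PyAudio()
stream = audio.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                    input=True, frames_per_buffer=NUM_SAMPLES)

# read one chunk, normalise int16 -> float32 in [-1, 1], query the model
chunk = np.frombuffer(stream.read(NUM_SAMPLES), np.int16).astype('float32')
peak = np.abs(chunk).max()
if peak > 0:
    chunk /= peak
speech_prob = model(torch.from_numpy(chunk), SAMPLE_RATE).item()
print(f'speech probability: {speech_prob:.3f}')
```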
hub/snakers4_silero-vad_master/examples/pyaudio-streaming/pyaudio-streaming-examples.ipynb DELETED
@@ -1,331 +0,0 @@
1
- {
2
- "cells": [
3
- {
4
- "cell_type": "markdown",
5
- "id": "62a0cccb",
6
- "metadata": {},
7
- "source": [
8
- "# Pyaudio Microphone Streaming Examples\n",
9
- "\n",
10
- "A simple notebook that uses pyaudio to get the microphone audio and feeds this audio then to Silero VAD.\n",
11
- "\n",
12
- "I created it as an example on how binary data from a stream could be feed into Silero VAD.\n",
13
- "\n",
14
- "\n",
15
- "Has been tested on Ubuntu 21.04 (x86). After you installed the dependencies below, no additional setup is required."
16
- ]
17
- },
18
- {
19
- "cell_type": "markdown",
20
- "id": "64cbe1eb",
21
- "metadata": {},
22
- "source": [
23
- "## Dependencies\n",
24
- "The cell below lists all used dependencies and the used versions. Uncomment to install them from within the notebook."
25
- ]
26
- },
27
- {
28
- "cell_type": "code",
29
- "execution_count": null,
30
- "id": "57bc2aac",
31
- "metadata": {},
32
- "outputs": [],
33
- "source": [
34
- "#!pip install numpy==1.20.2\n",
35
- "#!pip install torch==1.9.0\n",
36
- "#!pip install matplotlib==3.4.2\n",
37
- "#!pip install torchaudio==0.9.0\n",
38
- "#!pip install soundfile==0.10.3.post1\n",
39
- "#!pip install pyaudio==0.2.11"
40
- ]
41
- },
42
- {
43
- "cell_type": "markdown",
44
- "id": "110de761",
45
- "metadata": {},
46
- "source": [
47
- "## Imports"
48
- ]
49
- },
50
- {
51
- "cell_type": "code",
52
- "execution_count": null,
53
- "id": "5a647d8d",
54
- "metadata": {},
55
- "outputs": [],
56
- "source": [
57
- "import io\n",
58
- "import numpy as np\n",
59
- "import torch\n",
60
- "torch.set_num_threads(1)\n",
61
- "import torchaudio\n",
62
- "import matplotlib\n",
63
- "import matplotlib.pylab as plt\n",
64
- "torchaudio.set_audio_backend(\"soundfile\")\n",
65
- "import pyaudio"
66
- ]
67
- },
68
- {
69
- "cell_type": "code",
70
- "execution_count": null,
71
- "id": "725d7066",
72
- "metadata": {},
73
- "outputs": [],
74
- "source": [
75
- "model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',\n",
76
- " model='silero_vad',\n",
77
- " force_reload=True)"
78
- ]
79
- },
80
- {
81
- "cell_type": "code",
82
- "execution_count": null,
83
- "id": "1c0b2ea7",
84
- "metadata": {},
85
- "outputs": [],
86
- "source": [
87
- "(get_speech_timestamps,\n",
88
- " save_audio,\n",
89
- " read_audio,\n",
90
- " VADIterator,\n",
91
- " collect_chunks) = utils"
92
- ]
93
- },
94
- {
95
- "cell_type": "markdown",
96
- "id": "f9112603",
97
- "metadata": {},
98
- "source": [
99
- "### Helper Methods"
100
- ]
101
- },
102
- {
103
- "cell_type": "code",
104
- "execution_count": null,
105
- "id": "5abc6330",
106
- "metadata": {},
107
- "outputs": [],
108
- "source": [
109
- "# Taken from utils_vad.py\n",
110
- "def validate(model,\n",
111
- " inputs: torch.Tensor):\n",
112
- " with torch.no_grad():\n",
113
- " outs = model(inputs)\n",
114
- " return outs\n",
115
- "\n",
116
- "# Provided by Alexander Veysov\n",
117
- "def int2float(sound):\n",
118
- " abs_max = np.abs(sound).max()\n",
119
- " sound = sound.astype('float32')\n",
120
- " if abs_max > 0:\n",
121
- " sound *= 1/abs_max\n",
122
- " sound = sound.squeeze() # depends on the use case\n",
123
- " return sound"
124
- ]
125
- },
126
- {
127
- "cell_type": "markdown",
128
- "id": "5124095e",
129
- "metadata": {},
130
- "source": [
131
- "## Pyaudio Set-up"
132
- ]
133
- },
134
- {
135
- "cell_type": "code",
136
- "execution_count": null,
137
- "id": "a845356e",
138
- "metadata": {},
139
- "outputs": [],
140
- "source": [
141
- "FORMAT = pyaudio.paInt16\n",
142
- "CHANNELS = 1\n",
143
- "SAMPLE_RATE = 16000\n",
144
- "CHUNK = int(SAMPLE_RATE / 10)\n",
145
- "\n",
146
- "audio = pyaudio.PyAudio()"
147
- ]
148
- },
149
- {
150
- "cell_type": "markdown",
151
- "id": "0b910c99",
152
- "metadata": {},
153
- "source": [
154
- "## Simple Example\n",
155
- "The following example reads the audio as 250ms chunks from the microphone, converts them to a Pytorch Tensor, and gets the probabilities/confidences if the model thinks the frame is voiced."
156
- ]
157
- },
158
- {
159
- "cell_type": "code",
160
- "execution_count": null,
161
- "id": "9d3d2c10",
162
- "metadata": {},
163
- "outputs": [],
164
- "source": [
165
- "num_samples = 1536"
166
- ]
167
- },
168
- {
169
- "cell_type": "code",
170
- "execution_count": null,
171
- "id": "3cb44a4a",
172
- "metadata": {},
173
- "outputs": [],
174
- "source": [
175
- "stream = audio.open(format=FORMAT,\n",
176
- " channels=CHANNELS,\n",
177
- " rate=SAMPLE_RATE,\n",
178
- " input=True,\n",
179
- " frames_per_buffer=CHUNK)\n",
180
- "data = []\n",
181
- "voiced_confidences = []\n",
182
- "\n",
183
- "print(\"Started Recording\")\n",
184
- "for i in range(0, frames_to_record):\n",
185
- " \n",
186
- " audio_chunk = stream.read(num_samples)\n",
187
- " \n",
188
- " # in case you want to save the audio later\n",
189
- " data.append(audio_chunk)\n",
190
- " \n",
191
- " audio_int16 = np.frombuffer(audio_chunk, np.int16);\n",
192
- "\n",
193
- " audio_float32 = int2float(audio_int16)\n",
194
- " \n",
195
- " # get the confidences and add them to the list to plot them later\n",
196
- " new_confidence = model(torch.from_numpy(audio_float32), 16000).item()\n",
197
- " voiced_confidences.append(new_confidence)\n",
198
- " \n",
199
- "print(\"Stopped the recording\")\n",
200
- "\n",
201
- "# plot the confidences for the speech\n",
202
- "plt.figure(figsize=(20,6))\n",
203
- "plt.plot(voiced_confidences)\n",
204
- "plt.show()"
205
- ]
206
- },
207
- {
208
- "cell_type": "markdown",
209
- "id": "a3dda982",
210
- "metadata": {},
211
- "source": [
212
- "## Real Time Visualization\n",
213
- "\n",
214
- "As an enhancement to plot the speech probabilities in real time I added the implementation below.\n",
215
- "In contrast to the simeple one, it records the audio until to stop the recording by pressing enter.\n",
216
- "While looking into good ways to update matplotlib plots in real-time, I found a simple libarary that does the job. https://github.com/lvwerra/jupyterplot It has some limitations, but works for this use case really well.\n"
217
- ]
218
- },
219
- {
220
- "cell_type": "code",
221
- "execution_count": null,
222
- "id": "05ef4100",
223
- "metadata": {},
224
- "outputs": [],
225
- "source": [
226
- "#!pip install jupyterplot==0.0.3"
227
- ]
228
- },
229
- {
230
- "cell_type": "code",
231
- "execution_count": null,
232
- "id": "d1d4cdd6",
233
- "metadata": {},
234
- "outputs": [],
235
- "source": [
236
- "from jupyterplot import ProgressPlot\n",
237
- "import threading\n",
238
- "\n",
239
- "continue_recording = True\n",
240
- "\n",
241
- "def stop():\n",
242
- " input(\"Press Enter to stop the recording:\")\n",
243
- " global continue_recording\n",
244
- " continue_recording = False\n",
245
- "\n",
246
- "def start_recording():\n",
247
- " \n",
248
- " stream = audio.open(format=FORMAT,\n",
249
- " channels=CHANNELS,\n",
250
- " rate=SAMPLE_RATE,\n",
251
- " input=True,\n",
252
- " frames_per_buffer=CHUNK)\n",
253
- "\n",
254
- " data = []\n",
255
- " voiced_confidences = []\n",
256
- " \n",
257
- " global continue_recording\n",
258
- " continue_recording = True\n",
259
- " \n",
260
- " pp = ProgressPlot(plot_names=[\"Silero VAD\"],line_names=[\"speech probabilities\"], x_label=\"audio chunks\")\n",
261
- " \n",
262
- " stop_listener = threading.Thread(target=stop)\n",
263
- " stop_listener.start()\n",
264
- "\n",
265
- " while continue_recording:\n",
266
- " \n",
267
- " audio_chunk = stream.read(num_samples)\n",
268
- " \n",
269
- " # in case you want to save the audio later\n",
270
- " data.append(audio_chunk)\n",
271
- " \n",
272
- " audio_int16 = np.frombuffer(audio_chunk, np.int16);\n",
273
- "\n",
274
- " audio_float32 = int2float(audio_int16)\n",
275
- " \n",
276
- " # get the confidences and add them to the list to plot them later\n",
277
- " new_confidence = model(torch.from_numpy(audio_float32), 16000).item()\n",
278
- " voiced_confidences.append(new_confidence)\n",
279
- " \n",
280
- " pp.update(new_confidence)\n",
281
- "\n",
282
- "\n",
283
- " pp.finalize()"
284
- ]
285
- },
286
- {
287
- "cell_type": "code",
288
- "execution_count": null,
289
- "id": "1e398009",
290
- "metadata": {},
291
- "outputs": [],
292
- "source": [
293
- "start_recording()"
294
- ]
295
- }
296
- ],
297
- "metadata": {
298
- "kernelspec": {
299
- "display_name": "Python 3",
300
- "language": "python",
301
- "name": "python3"
302
- },
303
- "language_info": {
304
- "codemirror_mode": {
305
- "name": "ipython",
306
- "version": 3
307
- },
308
- "file_extension": ".py",
309
- "mimetype": "text/x-python",
310
- "name": "python",
311
- "nbconvert_exporter": "python",
312
- "pygments_lexer": "ipython3",
313
- "version": "3.7.10"
314
- },
315
- "toc": {
316
- "base_numbering": 1,
317
- "nav_menu": {},
318
- "number_sections": true,
319
- "sideBar": true,
320
- "skip_h1_title": false,
321
- "title_cell": "Table of Contents",
322
- "title_sidebar": "Contents",
323
- "toc_cell": false,
324
- "toc_position": {},
325
- "toc_section_display": true,
326
- "toc_window_display": false
327
- }
328
- },
329
- "nbformat": 4,
330
- "nbformat_minor": 5
331
- }
hub/snakers4_silero-vad_master/files/lang_dict_95.json DELETED
@@ -1 +0,0 @@
1
- {"59": "mg, Malagasy", "76": "tk, Turkmen", "20": "lb, Luxembourgish, Letzeburgesch", "62": "or, Oriya", "30": "en, English", "26": "oc, Occitan", "69": "no, Norwegian", "77": "sr, Serbian", "90": "bs, Bosnian", "71": "el, Greek, Modern (1453\u2013)", "15": "az, Azerbaijani", "12": "lo, Lao", "85": "zh-HK, Chinese", "79": "cs, Czech", "43": "sv, Swedish", "37": "mn, Mongolian", "32": "fi, Finnish", "51": "tg, Tajik", "46": "am, Amharic", "17": "nn, Norwegian Nynorsk", "40": "ja, Japanese", "8": "it, Italian", "21": "ha, Hausa", "11": "as, Assamese", "29": "fa, Persian", "82": "bn, Bengali", "54": "mk, Macedonian", "31": "sw, Swahili", "45": "vi, Vietnamese", "41": "ur, Urdu", "74": "bo, Tibetan", "4": "hi, Hindi", "86": "mr, Marathi", "3": "fy-NL, Western Frisian", "65": "sk, Slovak", "2": "ln, Lingala", "92": "gl, Galician", "53": "sn, Shona", "87": "su, Sundanese", "35": "tt, Tatar", "93": "kn, Kannada", "6": "yo, Yoruba", "27": "ps, Pashto, Pushto", "34": "hy, Armenian", "25": "pa-IN, Punjabi, Panjabi", "23": "nl, Dutch, Flemish", "48": "th, Thai", "73": "mt, Maltese", "55": "ar, Arabic", "89": "ba, Bashkir", "78": "bg, Bulgarian", "42": "yi, Yiddish", "5": "ru, Russian", "84": "sv-SE, Swedish", "80": "tr, Turkish", "33": "sq, Albanian", "38": "kk, Kazakh", "50": "pl, Polish", "9": "hr, Croatian", "66": "ky, Kirghiz, Kyrgyz", "49": "hu, Hungarian", "10": "si, Sinhala, Sinhalese", "56": "la, Latin", "75": "de, German", "14": "ko, Korean", "22": "id, Indonesian", "47": "sl, Slovenian", "57": "be, Belarusian", "36": "ta, Tamil", "7": "da, Danish", "91": "sd, Sindhi", "28": "et, Estonian", "63": "pt, Portuguese", "60": "ne, Nepali", "94": "zh-TW, Chinese", "18": "zh-CN, Chinese", "88": "rw, Kinyarwanda", "19": "es, Spanish, Castilian", "39": "ht, Haitian, Haitian Creole", "64": "tl, Tagalog", "83": "ms, Malay", "70": "ro, Romanian, Moldavian, Moldovan", "68": "pa, Punjabi, Panjabi", "52": "uz, Uzbek", "58": "km, Central Khmer", "67": "my, Burmese", "0": "fr, French", "24": "af, Afrikaans", "16": "gu, Gujarati", "81": "so, Somali", "13": "uk, Ukrainian", "44": "ca, Catalan, Valencian", "72": "ml, Malayalam", "61": "te, Telugu", "1": "zh, Chinese"}
hub/snakers4_silero-vad_master/files/lang_group_dict_95.json DELETED
@@ -1 +0,0 @@
1
- {"0": ["Afrikaans", "Dutch, Flemish", "Western Frisian"], "1": ["Turkish", "Azerbaijani"], "2": ["Russian", "Slovak", "Ukrainian", "Czech", "Polish", "Belarusian"], "3": ["Bulgarian", "Macedonian", "Serbian", "Croatian", "Bosnian", "Slovenian"], "4": ["Norwegian Nynorsk", "Swedish", "Danish", "Norwegian"], "5": ["English"], "6": ["Finnish", "Estonian"], "7": ["Yiddish", "Luxembourgish, Letzeburgesch", "German"], "8": ["Spanish", "Occitan", "Portuguese", "Catalan, Valencian", "Galician", "Spanish, Castilian", "Italian"], "9": ["Maltese", "Arabic"], "10": ["Marathi"], "11": ["Hindi", "Urdu"], "12": ["Lao", "Thai"], "13": ["Malay", "Indonesian"], "14": ["Romanian, Moldavian, Moldovan"], "15": ["Tagalog"], "16": ["Tajik", "Persian"], "17": ["Kazakh", "Uzbek", "Kirghiz, Kyrgyz"], "18": ["Kinyarwanda"], "19": ["Tatar", "Bashkir"], "20": ["French"], "21": ["Chinese"], "22": ["Lingala"], "23": ["Yoruba"], "24": ["Sinhala, Sinhalese"], "25": ["Assamese"], "26": ["Korean"], "27": ["Gujarati"], "28": ["Hausa"], "29": ["Punjabi, Panjabi"], "30": ["Pashto, Pushto"], "31": ["Swahili"], "32": ["Albanian"], "33": ["Armenian"], "34": ["Mongolian"], "35": ["Tamil"], "36": ["Haitian, Haitian Creole"], "37": ["Japanese"], "38": ["Vietnamese"], "39": ["Amharic"], "40": ["Hungarian"], "41": ["Shona"], "42": ["Latin"], "43": ["Central Khmer"], "44": ["Malagasy"], "45": ["Nepali"], "46": ["Telugu"], "47": ["Oriya"], "48": ["Burmese"], "49": ["Greek, Modern (1453\u2013)"], "50": ["Malayalam"], "51": ["Tibetan"], "52": ["Turkmen"], "53": ["Somali"], "54": ["Bengali"], "55": ["Sundanese"], "56": ["Sindhi"], "57": ["Kannada"]}
hub/snakers4_silero-vad_master/files/silero_logo.jpg DELETED
Binary file (23.9 kB)
hub/snakers4_silero-vad_master/files/silero_vad.jit DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:082e21870cf7722b0c7fa5228eaed579efb6870df81192b79bed3f7bac2f738a
3
- size 1439299
hub/snakers4_silero-vad_master/files/silero_vad.onnx DELETED
@@ -1,3 +0,0 @@
1
- version https://git-lfs.github.com/spec/v1
2
- oid sha256:a35ebf52fd3ce5f1469b2a36158dba761bc47b973ea3382b3186ca15b1f5af28
3
- size 1807522
hub/snakers4_silero-vad_master/hubconf.py DELETED
@@ -1,105 +0,0 @@
1
- dependencies = ['torch', 'torchaudio']
2
- import torch
3
- import os
4
- import json
5
- from utils_vad import (init_jit_model,
6
- get_speech_timestamps,
7
- get_number_ts,
8
- get_language,
9
- get_language_and_group,
10
- save_audio,
11
- read_audio,
12
- VADIterator,
13
- collect_chunks,
14
- drop_chunks,
15
- Validator,
16
- OnnxWrapper)
17
-
18
-
19
- def versiontuple(v):
20
- return tuple(map(int, (v.split('+')[0].split("."))))
21
-
22
-
23
- def silero_vad(onnx=False, force_onnx_cpu=False):
24
- """Silero Voice Activity Detector
25
- Returns a model with a set of utils
26
- Please see https://github.com/snakers4/silero-vad for usage examples
27
- """
28
-
29
- if not onnx:
30
- installed_version = torch.__version__
31
- supported_version = '1.12.0'
32
- if versiontuple(installed_version) < versiontuple(supported_version):
33
- raise Exception(f'Please install torch {supported_version} or greater ({installed_version} installed)')
34
-
35
- model_dir = os.path.join(os.path.dirname(__file__), 'files')
36
- if onnx:
37
- model = OnnxWrapper(os.path.join(model_dir, 'silero_vad.onnx'))
38
- else:
39
- model = init_jit_model(os.path.join(model_dir, 'silero_vad.jit'))
40
- utils = (get_speech_timestamps,
41
- save_audio,
42
- read_audio,
43
- VADIterator,
44
- collect_chunks)
45
-
46
- return model, utils
47
-
48
-
49
- def silero_number_detector(onnx=False, force_onnx_cpu=False):
50
- """Silero Number Detector
51
- Returns a model with a set of utils
52
- Please see https://github.com/snakers4/silero-vad for usage examples
53
- """
54
- if onnx:
55
- url = 'https://models.silero.ai/vad_models/number_detector.onnx'
56
- else:
57
- url = 'https://models.silero.ai/vad_models/number_detector.jit'
58
- model = Validator(url, force_onnx_cpu)
59
- utils = (get_number_ts,
60
- save_audio,
61
- read_audio,
62
- collect_chunks,
63
- drop_chunks)
64
-
65
- return model, utils
66
-
67
-
68
- def silero_lang_detector(onnx=False, force_onnx_cpu=False):
69
- """Silero Language Classifier
70
- Returns a model with a set of utils
71
- Please see https://github.com/snakers4/silero-vad for usage examples
72
- """
73
- if onnx:
74
- url = 'https://models.silero.ai/vad_models/number_detector.onnx'
75
- else:
76
- url = 'https://models.silero.ai/vad_models/number_detector.jit'
77
- model = Validator(url, force_onnx_cpu)
78
- utils = (get_language,
79
- read_audio)
80
-
81
- return model, utils
82
-
83
-
84
- def silero_lang_detector_95(onnx=False, force_onnx_cpu=False):
85
- """Silero Language Classifier (95 languages)
86
- Returns a model with a set of utils
87
- Please see https://github.com/snakers4/silero-vad for usage examples
88
- """
89
-
90
- if onnx:
91
- url = 'https://models.silero.ai/vad_models/lang_classifier_95.onnx'
92
- else:
93
- url = 'https://models.silero.ai/vad_models/lang_classifier_95.jit'
94
- model = Validator(url, force_onnx_cpu)
95
-
96
- model_dir = os.path.join(os.path.dirname(__file__), 'files')
97
- with open(os.path.join(model_dir, 'lang_dict_95.json'), 'r') as f:
98
- lang_dict = json.load(f)
99
-
100
- with open(os.path.join(model_dir, 'lang_group_dict_95.json'), 'r') as f:
101
- lang_group_dict = json.load(f)
102
-
103
- utils = (get_language_and_group, read_audio)
104
-
105
- return model, lang_dict, lang_group_dict, utils
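For reference, a hedged sketch of how these hub entry points are typically consumed (it mirrors the notebooks elsewhere in this commit; keyword arguments are forwarded to the entry-point functions above):

```python
import torch

# Voice Activity Detector: JIT by default, ONNX when onnx=True
vad_model, vad_utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                                      model='silero_vad',
                                      onnx=False)
(get_speech_timestamps, save_audio, read_audio,
 VADIterator, collect_chunks) = vad_utils

# Number Detector and Language Classifier follow the same pattern
num_model, num_utils = torch.hub.load('snakers4/silero-vad', 'silero_number_detector')
lang_model, lang_utils = torch.hub.load('snakers4/silero-vad', 'silero_lang_detector')
```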
hub/snakers4_silero-vad_master/silero-vad.ipynb DELETED
@@ -1,445 +0,0 @@
1
- {
2
- "cells": [
3
- {
4
- "cell_type": "markdown",
5
- "metadata": {
6
- "id": "FpMplOCA2Fwp"
7
- },
8
- "source": [
9
- "#VAD"
10
- ]
11
- },
12
- {
13
- "cell_type": "markdown",
14
- "metadata": {
15
- "heading_collapsed": true,
16
- "id": "62A6F_072Fwq"
17
- },
18
- "source": [
19
- "## Install Dependencies"
20
- ]
21
- },
22
- {
23
- "cell_type": "code",
24
- "execution_count": null,
25
- "metadata": {
26
- "hidden": true,
27
- "id": "5w5AkskZ2Fwr"
28
- },
29
- "outputs": [],
30
- "source": [
31
- "#@title Install and Import Dependencies\n",
32
- "\n",
33
- "# this assumes that you have a relevant version of PyTorch installed\n",
34
- "!pip install -q torchaudio\n",
35
- "\n",
36
- "SAMPLING_RATE = 16000\n",
37
- "\n",
38
- "import torch\n",
39
- "torch.set_num_threads(1)\n",
40
- "\n",
41
- "from IPython.display import Audio\n",
42
- "from pprint import pprint\n",
43
- "# download example\n",
44
- "torch.hub.download_url_to_file('https://models.silero.ai/vad_models/en.wav', 'en_example.wav')"
45
- ]
46
- },
47
- {
48
- "cell_type": "code",
49
- "execution_count": null,
50
- "metadata": {
51
- "id": "pSifus5IilRp"
52
- },
53
- "outputs": [],
54
- "source": [
55
- "USE_ONNX = False # change this to True if you want to test onnx model\n",
56
- "if USE_ONNX:\n",
57
- " !pip install -q onnxruntime\n",
58
- " \n",
59
- "model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',\n",
60
- " model='silero_vad',\n",
61
- " force_reload=True,\n",
62
- " onnx=USE_ONNX)\n",
63
- "\n",
64
- "(get_speech_timestamps,\n",
65
- " save_audio,\n",
66
- " read_audio,\n",
67
- " VADIterator,\n",
68
- " collect_chunks) = utils"
69
- ]
70
- },
71
- {
72
- "cell_type": "markdown",
73
- "metadata": {
74
- "id": "fXbbaUO3jsrw"
75
- },
76
- "source": [
77
- "## Full Audio"
78
- ]
79
- },
80
- {
81
- "cell_type": "markdown",
82
- "metadata": {
83
- "id": "RAfJPb_a-Auj"
84
- },
85
- "source": [
86
- "**Speech timestapms from full audio**"
87
- ]
88
- },
89
- {
90
- "cell_type": "code",
91
- "execution_count": null,
92
- "metadata": {
93
- "id": "aI_eydBPjsrx"
94
- },
95
- "outputs": [],
96
- "source": [
97
- "wav = read_audio('en_example.wav', sampling_rate=SAMPLING_RATE)\n",
98
- "# get speech timestamps from full audio file\n",
99
- "speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)\n",
100
- "pprint(speech_timestamps)"
101
- ]
102
- },
103
- {
104
- "cell_type": "code",
105
- "execution_count": null,
106
- "metadata": {
107
- "id": "OuEobLchjsry"
108
- },
109
- "outputs": [],
110
- "source": [
111
- "# merge all speech chunks to one audio\n",
112
- "save_audio('only_speech.wav',\n",
113
- " collect_chunks(speech_timestamps, wav), sampling_rate=SAMPLING_RATE) \n",
114
- "Audio('only_speech.wav')"
115
- ]
116
- },
117
- {
118
- "cell_type": "markdown",
119
- "metadata": {
120
- "id": "iDKQbVr8jsry"
121
- },
122
- "source": [
123
- "## Stream imitation example"
124
- ]
125
- },
126
- {
127
- "cell_type": "code",
128
- "execution_count": null,
129
- "metadata": {
130
- "id": "q-lql_2Wjsry"
131
- },
132
- "outputs": [],
133
- "source": [
134
- "## using VADIterator class\n",
135
- "\n",
136
- "vad_iterator = VADIterator(model)\n",
137
- "wav = read_audio(f'en_example.wav', sampling_rate=SAMPLING_RATE)\n",
138
- "\n",
139
- "window_size_samples = 1536 # number of samples in a single audio chunk\n",
140
- "for i in range(0, len(wav), window_size_samples):\n",
141
- " chunk = wav[i: i+ window_size_samples]\n",
142
- " if len(chunk) < window_size_samples:\n",
143
- " break\n",
144
- " speech_dict = vad_iterator(chunk, return_seconds=True)\n",
145
- " if speech_dict:\n",
146
- " print(speech_dict, end=' ')\n",
147
- "vad_iterator.reset_states() # reset model states after each audio"
148
- ]
149
- },
150
- {
151
- "cell_type": "code",
152
- "execution_count": null,
153
- "metadata": {
154
- "id": "BX3UgwwB2Fwv"
155
- },
156
- "outputs": [],
157
- "source": [
158
- "## just probabilities\n",
159
- "\n",
160
- "wav = read_audio('en_example.wav', sampling_rate=SAMPLING_RATE)\n",
161
- "speech_probs = []\n",
162
- "window_size_samples = 1536\n",
163
- "for i in range(0, len(wav), window_size_samples):\n",
164
- " chunk = wav[i: i+ window_size_samples]\n",
165
- " if len(chunk) < window_size_samples:\n",
166
- " break\n",
167
- " speech_prob = model(chunk, SAMPLING_RATE).item()\n",
168
- " speech_probs.append(speech_prob)\n",
169
- "vad_iterator.reset_states() # reset model states after each audio\n",
170
- "\n",
171
- "print(speech_probs[:10]) # first 10 chunks predicts"
172
- ]
173
- },
174
- {
175
- "cell_type": "markdown",
176
- "metadata": {
177
- "heading_collapsed": true,
178
- "id": "36jY0niD2Fww"
179
- },
180
- "source": [
181
- "# Number detector"
182
- ]
183
- },
184
- {
185
- "cell_type": "markdown",
186
- "metadata": {
187
- "heading_collapsed": true,
188
- "hidden": true,
189
- "id": "scd1DlS42Fwx"
190
- },
191
- "source": [
192
- "## Install Dependencies"
193
- ]
194
- },
195
- {
196
- "cell_type": "code",
197
- "execution_count": null,
198
- "metadata": {
199
- "hidden": true,
200
- "id": "Kq5gQuYq2Fwx"
201
- },
202
- "outputs": [],
203
- "source": [
204
- "#@title Install and Import Dependencies\n",
205
- "\n",
206
- "# this assumes that you have a relevant version of PyTorch installed\n",
207
- "!pip install -q torchaudio\n",
208
- "\n",
209
- "SAMPLING_RATE = 16000\n",
210
- "\n",
211
- "import torch\n",
212
- "torch.set_num_threads(1)\n",
213
- "\n",
214
- "from IPython.display import Audio\n",
215
- "from pprint import pprint\n",
216
- "# download example\n",
217
- "torch.hub.download_url_to_file('https://models.silero.ai/vad_models/en_num.wav', 'en_number_example.wav')"
218
- ]
219
- },
220
- {
221
- "cell_type": "code",
222
- "execution_count": null,
223
- "metadata": {
224
- "id": "dPwCFHmFycUF"
225
- },
226
- "outputs": [],
227
- "source": [
228
- "USE_ONNX = False # change this to True if you want to test onnx model\n",
229
- "if USE_ONNX:\n",
230
- " !pip install -q onnxruntime\n",
231
- " \n",
232
- "model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',\n",
233
- " model='silero_number_detector',\n",
234
- " force_reload=True,\n",
235
- " onnx=USE_ONNX)\n",
236
- "\n",
237
- "(get_number_ts,\n",
238
- " save_audio,\n",
239
- " read_audio,\n",
240
- " collect_chunks,\n",
241
- " drop_chunks) = utils\n"
242
- ]
243
- },
244
- {
245
- "cell_type": "markdown",
246
- "metadata": {
247
- "heading_collapsed": true,
248
- "hidden": true,
249
- "id": "qhPa30ij2Fwy"
250
- },
251
- "source": [
252
- "## Full audio"
253
- ]
254
- },
255
- {
256
- "cell_type": "code",
257
- "execution_count": null,
258
- "metadata": {
259
- "hidden": true,
260
- "id": "EXpau6xq2Fwy"
261
- },
262
- "outputs": [],
263
- "source": [
264
- "wav = read_audio('en_number_example.wav', sampling_rate=SAMPLING_RATE)\n",
265
- "# get number timestamps from full audio file\n",
266
- "number_timestamps = get_number_ts(wav, model)\n",
267
- "pprint(number_timestamps)"
268
- ]
269
- },
270
- {
271
- "cell_type": "code",
272
- "execution_count": null,
273
- "metadata": {
274
- "hidden": true,
275
- "id": "u-KfXRhZ2Fwy"
276
- },
277
- "outputs": [],
278
- "source": [
279
- "# convert ms in timestamps to samples\n",
280
- "for timestamp in number_timestamps:\n",
281
- " timestamp['start'] = int(timestamp['start'] * SAMPLING_RATE / 1000)\n",
282
- " timestamp['end'] = int(timestamp['end'] * SAMPLING_RATE / 1000)"
283
- ]
284
- },
285
- {
286
- "cell_type": "code",
287
- "execution_count": null,
288
- "metadata": {
289
- "hidden": true,
290
- "id": "iwYEC4aZ2Fwy"
291
- },
292
- "outputs": [],
293
- "source": [
294
- "# merge all number chunks to one audio\n",
295
- "save_audio('only_numbers.wav',\n",
296
- " collect_chunks(number_timestamps, wav), SAMPLING_RATE) \n",
297
- "Audio('only_numbers.wav')"
298
- ]
299
- },
300
- {
301
- "cell_type": "code",
302
- "execution_count": null,
303
- "metadata": {
304
- "hidden": true,
305
- "id": "fHaYejX12Fwy"
306
- },
307
- "outputs": [],
308
- "source": [
309
- "# drop all number chunks from audio\n",
310
- "save_audio('no_numbers.wav',\n",
311
- " drop_chunks(number_timestamps, wav), SAMPLING_RATE) \n",
312
- "Audio('no_numbers.wav')"
313
- ]
314
- },
315
- {
316
- "cell_type": "markdown",
317
- "metadata": {
318
- "heading_collapsed": true,
319
- "id": "PnKtJKbq2Fwz"
320
- },
321
- "source": [
322
- "# Language detector"
323
- ]
324
- },
325
- {
326
- "cell_type": "markdown",
327
- "metadata": {
328
- "heading_collapsed": true,
329
- "hidden": true,
330
- "id": "F5cAmMbP2Fwz"
331
- },
332
- "source": [
333
- "## Install Dependencies"
334
- ]
335
- },
336
- {
337
- "cell_type": "code",
338
- "execution_count": null,
339
- "metadata": {
340
- "hidden": true,
341
- "id": "Zu9D0t6n2Fwz"
342
- },
343
- "outputs": [],
344
- "source": [
345
- "#@title Install and Import Dependencies\n",
346
- "\n",
347
- "# this assumes that you have a relevant version of PyTorch installed\n",
348
- "!pip install -q torchaudio\n",
349
- "\n",
350
- "SAMPLING_RATE = 16000\n",
351
- "\n",
352
- "import torch\n",
353
- "torch.set_num_threads(1)\n",
354
- "\n",
355
- "from IPython.display import Audio\n",
356
- "from pprint import pprint\n",
357
- "# download example\n",
358
- "torch.hub.download_url_to_file('https://models.silero.ai/vad_models/en.wav', 'en_example.wav')"
359
- ]
360
- },
361
- {
362
- "cell_type": "code",
363
- "execution_count": null,
364
- "metadata": {
365
- "id": "JfRKDZiRztFe"
366
- },
367
- "outputs": [],
368
- "source": [
369
- "USE_ONNX = False # change this to True if you want to test onnx model\n",
370
- "if USE_ONNX:\n",
371
- " !pip install -q onnxruntime\n",
372
- " \n",
373
- "model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',\n",
374
- " model='silero_lang_detector',\n",
375
- " force_reload=True,\n",
376
- " onnx=USE_ONNX)\n",
377
- "\n",
378
- "get_language, read_audio = utils"
379
- ]
380
- },
381
- {
382
- "cell_type": "markdown",
383
- "metadata": {
384
- "heading_collapsed": true,
385
- "hidden": true,
386
- "id": "iC696eMX2Fwz"
387
- },
388
- "source": [
389
- "## Full audio"
390
- ]
391
- },
392
- {
393
- "cell_type": "code",
394
- "execution_count": null,
395
- "metadata": {
396
- "hidden": true,
397
- "id": "c8UYnYBF2Fw0"
398
- },
399
- "outputs": [],
400
- "source": [
401
- "wav = read_audio('en_example.wav', sampling_rate=SAMPLING_RATE)\n",
402
- "lang = get_language(wav, model)\n",
403
- "print(lang)"
404
- ]
405
- }
406
- ],
407
- "metadata": {
408
- "colab": {
409
- "name": "silero-vad.ipynb",
410
- "provenance": []
411
- },
412
- "kernelspec": {
413
- "display_name": "Python 3",
414
- "language": "python",
415
- "name": "python3"
416
- },
417
- "language_info": {
418
- "codemirror_mode": {
419
- "name": "ipython",
420
- "version": 3
421
- },
422
- "file_extension": ".py",
423
- "mimetype": "text/x-python",
424
- "name": "python",
425
- "nbconvert_exporter": "python",
426
- "pygments_lexer": "ipython3",
427
- "version": "3.8.8"
428
- },
429
- "toc": {
430
- "base_numbering": 1,
431
- "nav_menu": {},
432
- "number_sections": true,
433
- "sideBar": true,
434
- "skip_h1_title": false,
435
- "title_cell": "Table of Contents",
436
- "title_sidebar": "Contents",
437
- "toc_cell": false,
438
- "toc_position": {},
439
- "toc_section_display": true,
440
- "toc_window_display": false
441
- }
442
- },
443
- "nbformat": 4,
444
- "nbformat_minor": 0
445
- }
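For reference, the offline VAD flow that the deleted silero-vad.ipynb above walks through can be reproduced as a plain script. The following is a minimal sketch, assuming torch and torchaudio are installed, `en_example.wav` is present locally, and the `snakers4/silero-vad` hub entry point and its utils tuple order are unchanged from the time of this commit:

```python
# Minimal sketch of the offline flow from the deleted notebook (assumptions:
# torch + torchaudio installed, 'en_example.wav' present, hub API unchanged).
import torch
from pprint import pprint

SAMPLING_RATE = 16000
torch.set_num_threads(1)

model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad')
(get_speech_timestamps, save_audio, read_audio,
 VADIterator, collect_chunks) = utils

wav = read_audio('en_example.wav', sampling_rate=SAMPLING_RATE)
# speech timestamps (in samples) for the whole file
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)
pprint(speech_timestamps)

# concatenate the detected speech chunks into one file
save_audio('only_speech.wav',
           collect_chunks(speech_timestamps, wav),
           sampling_rate=SAMPLING_RATE)
```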
hub/snakers4_silero-vad_master/utils_vad.py DELETED
@@ -1,488 +0,0 @@
1
- import torch
2
- import torchaudio
3
- from typing import List
4
- import torch.nn.functional as F
5
- import warnings
6
-
7
- languages = ['ru', 'en', 'de', 'es']
8
-
9
-
10
- class OnnxWrapper():
11
-
12
- def __init__(self, path, force_onnx_cpu=False):
13
- import numpy as np
14
- global np
15
- import onnxruntime
16
- if force_onnx_cpu and 'CPUExecutionProvider' in onnxruntime.get_available_providers():
17
- self.session = onnxruntime.InferenceSession(path, providers=['CPUExecutionProvider'])
18
- else:
19
- self.session = onnxruntime.InferenceSession(path)
20
- self.session.intra_op_num_threads = 1
21
- self.session.inter_op_num_threads = 1
22
-
23
- self.reset_states()
24
- self.sample_rates = [8000, 16000]
25
-
26
- def _validate_input(self, x, sr: int):
27
- if x.dim() == 1:
28
- x = x.unsqueeze(0)
29
- if x.dim() > 2:
30
- raise ValueError(f"Too many dimensions for input audio chunk {x.dim()}")
31
-
32
- if sr != 16000 and (sr % 16000 == 0):
33
- step = sr // 16000
34
- x = x[:, ::step]  # decimate along the time axis (x is already 2D here)
35
- sr = 16000
36
-
37
- if sr not in self.sample_rates:
38
- raise ValueError(f"Supported sampling rates: {self.sample_rates} (or multiply of 16000)")
39
-
40
- if sr / x.shape[1] > 31.25:
41
- raise ValueError("Input audio chunk is too short")
42
-
43
- return x, sr
44
-
45
- def reset_states(self, batch_size=1):
46
- self._h = np.zeros((2, batch_size, 64)).astype('float32')
47
- self._c = np.zeros((2, batch_size, 64)).astype('float32')
48
- self._last_sr = 0
49
- self._last_batch_size = 0
50
-
51
- def __call__(self, x, sr: int):
52
-
53
- x, sr = self._validate_input(x, sr)
54
- batch_size = x.shape[0]
55
-
56
- if not self._last_batch_size:
57
- self.reset_states(batch_size)
58
- if (self._last_sr) and (self._last_sr != sr):
59
- self.reset_states(batch_size)
60
- if (self._last_batch_size) and (self._last_batch_size != batch_size):
61
- self.reset_states(batch_size)
62
-
63
- if sr in [8000, 16000]:
64
- ort_inputs = {'input': x.numpy(), 'h': self._h, 'c': self._c, 'sr': np.array(sr)}
65
- ort_outs = self.session.run(None, ort_inputs)
66
- out, self._h, self._c = ort_outs
67
- else:
68
- raise ValueError()
69
-
70
- self._last_sr = sr
71
- self._last_batch_size = batch_size
72
-
73
- out = torch.tensor(out)
74
- return out
75
-
76
- def audio_forward(self, x, sr: int, num_samples: int = 512):
77
- outs = []
78
- x, sr = self._validate_input(x, sr)
79
-
80
- if x.shape[1] % num_samples:
81
- pad_num = num_samples - (x.shape[1] % num_samples)
82
- x = torch.nn.functional.pad(x, (0, pad_num), 'constant', value=0.0)
83
-
84
- self.reset_states(x.shape[0])
85
- for i in range(0, x.shape[1], num_samples):
86
- wavs_batch = x[:, i:i+num_samples]
87
- out_chunk = self.__call__(wavs_batch, sr)
88
- outs.append(out_chunk)
89
-
90
- stacked = torch.cat(outs, dim=1)
91
- return stacked.cpu()
92
-
93
-
94
- class Validator():
95
- def __init__(self, url, force_onnx_cpu):
96
- self.onnx = True if url.endswith('.onnx') else False
97
- torch.hub.download_url_to_file(url, 'inf.model')
98
- if self.onnx:
99
- import onnxruntime
100
- if force_onnx_cpu and 'CPUExecutionProvider' in onnxruntime.get_available_providers():
101
- self.model = onnxruntime.InferenceSession('inf.model', providers=['CPUExecutionProvider'])
102
- else:
103
- self.model = onnxruntime.InferenceSession('inf.model')
104
- else:
105
- self.model = init_jit_model(model_path='inf.model')
106
-
107
- def __call__(self, inputs: torch.Tensor):
108
- with torch.no_grad():
109
- if self.onnx:
110
- ort_inputs = {'input': inputs.cpu().numpy()}
111
- outs = self.model.run(None, ort_inputs)
112
- outs = [torch.Tensor(x) for x in outs]
113
- else:
114
- outs = self.model(inputs)
115
-
116
- return outs
117
-
118
-
119
- def read_audio(path: str,
120
- sampling_rate: int = 16000):
121
-
122
- wav, sr = torchaudio.load(path)
123
-
124
- if wav.size(0) > 1:
125
- wav = wav.mean(dim=0, keepdim=True)
126
-
127
- if sr != sampling_rate:
128
- transform = torchaudio.transforms.Resample(orig_freq=sr,
129
- new_freq=sampling_rate)
130
- wav = transform(wav)
131
- sr = sampling_rate
132
-
133
- assert sr == sampling_rate
134
- return wav.squeeze(0)
135
-
136
-
137
- def save_audio(path: str,
138
- tensor: torch.Tensor,
139
- sampling_rate: int = 16000):
140
- torchaudio.save(path, tensor.unsqueeze(0), sampling_rate)
141
-
142
-
143
- def init_jit_model(model_path: str,
144
- device=torch.device('cpu')):
145
- torch.set_grad_enabled(False)
146
- model = torch.jit.load(model_path, map_location=device)
147
- model.eval()
148
- return model
149
-
150
-
151
- def make_visualization(probs, step):
152
- import pandas as pd
153
- pd.DataFrame({'probs': probs},
154
- index=[x * step for x in range(len(probs))]).plot(figsize=(16, 8),
155
- kind='area', ylim=[0, 1.05], xlim=[0, len(probs) * step],
156
- xlabel='seconds',
157
- ylabel='speech probability',
158
- colormap='tab20')
159
-
160
-
161
- def get_speech_timestamps(audio: torch.Tensor,
162
- model,
163
- threshold: float = 0.5,
164
- sampling_rate: int = 16000,
165
- min_speech_duration_ms: int = 250,
166
- min_silence_duration_ms: int = 100,
167
- window_size_samples: int = 512,
168
- speech_pad_ms: int = 30,
169
- return_seconds: bool = False,
170
- visualize_probs: bool = False):
171
-
172
- """
173
- This method is used for splitting long audios into speech chunks using silero VAD
174
-
175
- Parameters
176
- ----------
177
- audio: torch.Tensor, one dimensional
178
- One dimensional float torch.Tensor; other types are cast to torch.Tensor if possible
179
-
180
- model: preloaded .jit silero VAD model
181
-
182
- threshold: float (default - 0.5)
183
- Speech threshold. Silero VAD outputs speech probabilities for each audio chunk, probabilities ABOVE this value are considered as SPEECH.
184
- It is better to tune this parameter for each dataset separately, but "lazy" 0.5 is pretty good for most datasets.
185
-
186
- sampling_rate: int (default - 16000)
187
- Currently silero VAD models support 8000 and 16000 sample rates
188
-
189
- min_speech_duration_ms: int (default - 250 milliseconds)
190
- Final speech chunks shorter than min_speech_duration_ms are thrown out
191
-
192
- min_silence_duration_ms: int (default - 100 milliseconds)
193
- At the end of each speech chunk, wait for min_silence_duration_ms before separating it
194
-
195
- window_size_samples: int (default - 512 samples)
196
- Audio chunks of window_size_samples size are fed to the silero VAD model.
197
- WARNING! Silero VAD models were trained using 512, 1024, 1536 samples for 16000 sample rate and 256, 512, 768 samples for 8000 sample rate.
198
- Values other than these may affect model performance!
199
-
200
- speech_pad_ms: int (default - 30 milliseconds)
201
- Final speech chunks are padded by speech_pad_ms on each side
202
-
203
- return_seconds: bool (default - False)
204
- whether to return timestamps in seconds (default - samples)
205
-
206
- visualize_probs: bool (default - False)
207
- whether to draw the speech probability plot or not
208
-
209
- Returns
210
- ----------
211
- speeches: list of dicts
212
- list containing beginnings and ends of speech chunks (samples or seconds, depending on return_seconds)
213
- """
214
-
215
- if not torch.is_tensor(audio):
216
- try:
217
- audio = torch.Tensor(audio)
218
- except:
219
- raise TypeError("Audio cannot be casted to tensor. Cast it manually")
220
-
221
- if len(audio.shape) > 1:
222
- for i in range(len(audio.shape)): # trying to squeeze empty dimensions
223
- audio = audio.squeeze(0)
224
- if len(audio.shape) > 1:
225
- raise ValueError("More than one dimension in audio. Are you trying to process audio with 2 channels?")
226
-
227
- if sampling_rate > 16000 and (sampling_rate % 16000 == 0):
228
- step = sampling_rate // 16000
229
- sampling_rate = 16000
230
- audio = audio[::step]
231
- warnings.warn('Sampling rate is a multiple of 16000, casting to 16000 manually!')
232
- else:
233
- step = 1
234
-
235
- if sampling_rate == 8000 and window_size_samples > 768:
236
- warnings.warn('window_size_samples is too big for 8000 sampling_rate! Consider setting window_size_samples to 256, 512 or 768 for the 8000 sample rate!')
237
- if window_size_samples not in [256, 512, 768, 1024, 1536]:
238
- warnings.warn('Unusual window_size_samples! Supported window_size_samples:\n - [512, 1024, 1536] for 16000 sampling_rate\n - [256, 512, 768] for 8000 sampling_rate')
239
-
240
- model.reset_states()
241
- min_speech_samples = sampling_rate * min_speech_duration_ms / 1000
242
- min_silence_samples = sampling_rate * min_silence_duration_ms / 1000
243
- speech_pad_samples = sampling_rate * speech_pad_ms / 1000
244
-
245
- audio_length_samples = len(audio)
246
-
247
- speech_probs = []
248
- for current_start_sample in range(0, audio_length_samples, window_size_samples):
249
- chunk = audio[current_start_sample: current_start_sample + window_size_samples]
250
- if len(chunk) < window_size_samples:
251
- chunk = torch.nn.functional.pad(chunk, (0, int(window_size_samples - len(chunk))))
252
- speech_prob = model(chunk, sampling_rate).item()
253
- speech_probs.append(speech_prob)
254
-
255
- triggered = False
256
- speeches = []
257
- current_speech = {}
258
- neg_threshold = threshold - 0.15
259
- temp_end = 0
260
-
261
- for i, speech_prob in enumerate(speech_probs):
262
- if (speech_prob >= threshold) and temp_end:
263
- temp_end = 0
264
-
265
- if (speech_prob >= threshold) and not triggered:
266
- triggered = True
267
- current_speech['start'] = window_size_samples * i
268
- continue
269
-
270
- if (speech_prob < neg_threshold) and triggered:
271
- if not temp_end:
272
- temp_end = window_size_samples * i
273
- if (window_size_samples * i) - temp_end < min_silence_samples:
274
- continue
275
- else:
276
- current_speech['end'] = temp_end
277
- if (current_speech['end'] - current_speech['start']) > min_speech_samples:
278
- speeches.append(current_speech)
279
- temp_end = 0
280
- current_speech = {}
281
- triggered = False
282
- continue
283
-
284
- if current_speech and (audio_length_samples - current_speech['start']) > min_speech_samples:
285
- current_speech['end'] = audio_length_samples
286
- speeches.append(current_speech)
287
-
288
- for i, speech in enumerate(speeches):
289
- if i == 0:
290
- speech['start'] = int(max(0, speech['start'] - speech_pad_samples))
291
- if i != len(speeches) - 1:
292
- silence_duration = speeches[i+1]['start'] - speech['end']
293
- if silence_duration < 2 * speech_pad_samples:
294
- speech['end'] += int(silence_duration // 2)
295
- speeches[i+1]['start'] = int(max(0, speeches[i+1]['start'] - silence_duration // 2))
296
- else:
297
- speech['end'] = int(min(audio_length_samples, speech['end'] + speech_pad_samples))
298
- speeches[i+1]['start'] = int(max(0, speeches[i+1]['start'] - speech_pad_samples))
299
- else:
300
- speech['end'] = int(min(audio_length_samples, speech['end'] + speech_pad_samples))
301
-
302
- if return_seconds:
303
- for speech_dict in speeches:
304
- speech_dict['start'] = round(speech_dict['start'] / sampling_rate, 1)
305
- speech_dict['end'] = round(speech_dict['end'] / sampling_rate, 1)
306
- elif step > 1:
307
- for speech_dict in speeches:
308
- speech_dict['start'] *= step
309
- speech_dict['end'] *= step
310
-
311
- if visualize_probs:
312
- make_visualization(speech_probs, window_size_samples / sampling_rate)
313
-
314
- return speeches
315
-
316
-
317
- def get_number_ts(wav: torch.Tensor,
318
- model,
319
- model_stride=8,
320
- hop_length=160,
321
- sample_rate=16000):
322
- wav = torch.unsqueeze(wav, dim=0)
323
- perframe_logits = model(wav)[0]
324
- perframe_preds = torch.argmax(torch.softmax(perframe_logits, dim=1), dim=1).squeeze() # (1, num_frames_strided)
325
- extended_preds = []
326
- for i in perframe_preds:
327
- extended_preds.extend([i.item()] * model_stride)
328
- # len(extended_preds) is *num_frames_real*; for each frame of audio we know if it has a number in it.
329
- triggered = False
330
- timings = []
331
- cur_timing = {}
332
- for i, pred in enumerate(extended_preds):
333
- if pred == 1:
334
- if not triggered:
335
- cur_timing['start'] = int((i * hop_length) / (sample_rate / 1000))
336
- triggered = True
337
- elif pred == 0:
338
- if triggered:
339
- cur_timing['end'] = int((i * hop_length) / (sample_rate / 1000))
340
- timings.append(cur_timing)
341
- cur_timing = {}
342
- triggered = False
343
- if cur_timing:
344
- cur_timing['end'] = int(wav.shape[1] / (sample_rate / 1000))  # wav was unsqueezed to (1, num_samples)
345
- timings.append(cur_timing)
346
- return timings
347
-
348
-
349
- def get_language(wav: torch.Tensor,
350
- model):
351
- wav = torch.unsqueeze(wav, dim=0)
352
- lang_logits = model(wav)[2]
353
- lang_pred = torch.argmax(torch.softmax(lang_logits, dim=1), dim=1).item() # from 0 to len(languages) - 1
354
- assert lang_pred < len(languages)
355
- return languages[lang_pred]
356
-
357
-
358
- def get_language_and_group(wav: torch.Tensor,
359
- model,
360
- lang_dict: dict,
361
- lang_group_dict: dict,
362
- top_n=1):
363
- wav = torch.unsqueeze(wav, dim=0)
364
- lang_logits, lang_group_logits = model(wav)
365
-
366
- softm = torch.softmax(lang_logits, dim=1).squeeze()
367
- softm_group = torch.softmax(lang_group_logits, dim=1).squeeze()
368
-
369
- srtd = torch.argsort(softm, descending=True)
370
- srtd_group = torch.argsort(softm_group, descending=True)
371
-
372
- outs = []
373
- outs_group = []
374
- for i in range(top_n):
375
- prob = round(softm[srtd[i]].item(), 2)
376
- prob_group = round(softm_group[srtd_group[i]].item(), 2)
377
- outs.append((lang_dict[str(srtd[i].item())], prob))
378
- outs_group.append((lang_group_dict[str(srtd_group[i].item())], prob_group))
379
-
380
- return outs, outs_group
381
-
382
-
383
- class VADIterator:
384
- def __init__(self,
385
- model,
386
- threshold: float = 0.5,
387
- sampling_rate: int = 16000,
388
- min_silence_duration_ms: int = 100,
389
- speech_pad_ms: int = 30
390
- ):
391
-
392
- """
393
- Class for stream imitation
394
-
395
- Parameters
396
- ----------
397
- model: preloaded .jit silero VAD model
398
-
399
- threshold: float (default - 0.5)
400
- Speech threshold. Silero VAD outputs speech probabilities for each audio chunk, probabilities ABOVE this value are considered as SPEECH.
401
- It is better to tune this parameter for each dataset separately, but "lazy" 0.5 is pretty good for most datasets.
402
-
403
- sampling_rate: int (default - 16000)
404
- Currently silero VAD models support 8000 and 16000 sample rates
405
-
406
- min_silence_duration_ms: int (default - 100 milliseconds)
407
- At the end of each speech chunk, wait for min_silence_duration_ms before separating it
408
-
409
- speech_pad_ms: int (default - 30 milliseconds)
410
- Final speech chunks are padded by speech_pad_ms on each side
411
- """
412
-
413
- self.model = model
414
- self.threshold = threshold
415
- self.sampling_rate = sampling_rate
416
-
417
- if sampling_rate not in [8000, 16000]:
418
- raise ValueError('VADIterator does not support sampling rates other than [8000, 16000]')
419
-
420
- self.min_silence_samples = sampling_rate * min_silence_duration_ms / 1000
421
- self.speech_pad_samples = sampling_rate * speech_pad_ms / 1000
422
- self.reset_states()
423
-
424
- def reset_states(self):
425
-
426
- self.model.reset_states()
427
- self.triggered = False
428
- self.temp_end = 0
429
- self.current_sample = 0
430
-
431
- def __call__(self, x, return_seconds=False):
432
- """
433
- x: torch.Tensor
434
- audio chunk (see examples in repo)
435
-
436
- return_seconds: bool (default - False)
437
- whether return timestamps in seconds (default - samples)
438
- """
439
-
440
- if not torch.is_tensor(x):
441
- try:
442
- x = torch.Tensor(x)
443
- except:
444
- raise TypeError("Audio cannot be casted to tensor. Cast it manually")
445
-
446
- window_size_samples = len(x[0]) if x.dim() == 2 else len(x)
447
- self.current_sample += window_size_samples
448
-
449
- speech_prob = self.model(x, self.sampling_rate).item()
450
-
451
- if (speech_prob >= self.threshold) and self.temp_end:
452
- self.temp_end = 0
453
-
454
- if (speech_prob >= self.threshold) and not self.triggered:
455
- self.triggered = True
456
- speech_start = self.current_sample - self.speech_pad_samples
457
- return {'start': int(speech_start) if not return_seconds else round(speech_start / self.sampling_rate, 1)}
458
-
459
- if (speech_prob < self.threshold - 0.15) and self.triggered:
460
- if not self.temp_end:
461
- self.temp_end = self.current_sample
462
- if self.current_sample - self.temp_end < self.min_silence_samples:
463
- return None
464
- else:
465
- speech_end = self.temp_end + self.speech_pad_samples
466
- self.temp_end = 0
467
- self.triggered = False
468
- return {'end': int(speech_end) if not return_seconds else round(speech_end / self.sampling_rate, 1)}
469
-
470
- return None
471
-
472
-
473
- def collect_chunks(tss: List[dict],
474
- wav: torch.Tensor):
475
- chunks = []
476
- for i in tss:
477
- chunks.append(wav[i['start']: i['end']])
478
- return torch.cat(chunks)
479
-
480
-
481
- def drop_chunks(tss: List[dict],
482
- wav: torch.Tensor):
483
- chunks = []
484
- cur_start = 0
485
- for i in tss:
486
- chunks.append((wav[cur_start: i['start']]))
487
- cur_start = i['end']
488
- return torch.cat(chunks)
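The removed utils_vad.py above pairs collect_chunks with drop_chunks, which the notebook only exercises in its number-detector section. Below is a minimal sketch of splitting one recording into speech-only and background-only files with these helpers; it assumes a local copy of the removed utils_vad.py is importable, torch and torchaudio are installed, `en_example.wav` exists, and 'silero_vad.jit' is a placeholder path to a downloaded VAD checkpoint:

```python
# Minimal sketch using the helpers defined in the removed utils_vad.py.
# Assumptions: a local copy of utils_vad.py is on the path, torch/torchaudio
# are installed, 'en_example.wav' exists, and 'silero_vad.jit' is a
# placeholder path to a downloaded Silero VAD TorchScript checkpoint.
from utils_vad import (get_speech_timestamps, read_audio, save_audio,
                       collect_chunks, drop_chunks, init_jit_model)

SAMPLING_RATE = 16000
model = init_jit_model('silero_vad.jit')  # placeholder checkpoint path

wav = read_audio('en_example.wav', sampling_rate=SAMPLING_RATE)
speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=SAMPLING_RATE)

speech_only = collect_chunks(speech_timestamps, wav)   # concatenate speech segments
background_only = drop_chunks(speech_timestamps, wav)  # concatenate the gaps between them

save_audio('speech_only.wav', speech_only, sampling_rate=SAMPLING_RATE)
save_audio('background_only.wav', background_only, sampling_rate=SAMPLING_RATE)
```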
hub/trusted_list DELETED
File without changes