ybbwcwaps committed
Commit • d036110
1 Parent(s): c578da5
some torchvgg
- FakeVD/Models/torchvggish/LICENSE +250 -0
- FakeVD/Models/torchvggish/README.md +33 -0
- FakeVD/Models/torchvggish/docs/_example_download_weights.ipynb +251 -0
- FakeVD/Models/torchvggish/hubconf.py +15 -0
- FakeVD/Models/torchvggish/torchvggish/mel_features.py +223 -0
- FakeVD/Models/torchvggish/torchvggish/vggish.py +189 -0
- FakeVD/Models/torchvggish/torchvggish/vggish_input.py +98 -0
- FakeVD/Models/torchvggish/torchvggish/vggish_params.py +53 -0
FakeVD/Models/torchvggish/LICENSE
ADDED
@@ -0,0 +1,250 @@
Copyright 2020 Harri Taylor. All rights reserved.

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

From PyTorch:

Copyright (c) 2016- Facebook, Inc (Adam Paszke)
Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)

From Caffe2:

Copyright (c) 2016-present, Facebook Inc. All rights reserved.

All contributions by Facebook:
Copyright (c) 2016 Facebook Inc.

All contributions by Google:
Copyright (c) 2015 Google Inc.
All rights reserved.

All contributions by Yangqing Jia:
Copyright (c) 2015 Yangqing Jia
All rights reserved.

All contributions from Caffe:
Copyright(c) 2013, 2014, 2015, the respective contributors
All rights reserved.

From Tensorflow:
Copyright(c) 2019 The TensorFlow Authors. All rights reserved.

All other contributions:
Copyright(c) 2015, 2016 the respective contributors
All rights reserved.

Caffe2 uses a copyright model similar to Caffe: each contributor holds
copyright over their contributions to Caffe2. The project versioning records
all such contribution and copyright details. If a contributor wants to further
mark their specific copyright on a particular contribution, they should
indicate their copyright solely in the commit message of the change when it is
committed.

All rights reserved.
FakeVD/Models/torchvggish/README.md
ADDED
@@ -0,0 +1,33 @@
**Looking for maintainers** - I no longer have the capacity to maintain this project. If you would like to take over maintenance, please get in touch. I will either forward to your fork, or add you as a maintainer for the project. Thanks.

---

# VGGish
A `torch`-compatible port of [VGGish](https://github.com/tensorflow/models/tree/master/research/audioset)<sup>[1]</sup>,
a feature embedding frontend for audio classification models. The weights are ported directly from the TensorFlow model, so embeddings created using `torchvggish` will be identical.

## Usage

```python
import urllib.request

import torch

model = torch.hub.load('harritaylor/torchvggish', 'vggish')
model.eval()

# Download an example audio file
url, filename = ("http://soundbible.com/grab.php?id=1698&type=wav", "bus_chatter.wav")
urllib.request.urlretrieve(url, filename)

model.forward(filename)
```

<hr>

[1] S. Hershey et al., ‘CNN Architectures for Large-Scale Audio Classification’,\
in International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017.\
Available: https://arxiv.org/abs/1609.09430, https://ai.google/research/pubs/pub45611
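The usage snippet in the README returns one embedding per fixed-length patch of audio. As a rough sketch of the input framing arithmetic, assuming the standard VGGish configuration that `vggish_params.py` usually ships with (16 kHz audio, 64 mel bands, a 10 ms STFT hop, 0.96 s examples); these values are not verified against this particular port:

```python
# Sketch of VGGish input framing under the assumed standard parameters
# (16 kHz sample rate, 10 ms STFT hop, 0.96 s examples, 64 mel bands).
SAMPLE_RATE = 16000
STFT_HOP_SECONDS = 0.010
EXAMPLE_WINDOW_SECONDS = 0.96
NUM_MEL_BANDS = 64

# Each example spans 0.96 s of audio sampled at a 10 ms hop,
# so every input patch to the network is 96 frames x 64 mel bands.
frames_per_example = int(round(EXAMPLE_WINDOW_SECONDS / STFT_HOP_SECONDS))
print(frames_per_example, NUM_MEL_BANDS)  # 96 64

def num_examples(duration_seconds: float) -> int:
    """How many non-overlapping 0.96 s examples fit in a clip."""
    return int(duration_seconds // EXAMPLE_WINDOW_SECONDS)

print(num_examples(10.0))  # a 10 s clip yields 10 embeddings
```

Under these assumptions each returned embedding corresponds to one 96 × 64 log-mel patch, which also means a clip shorter than 0.96 s produces no output at all.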
FakeVD/Models/torchvggish/docs/_example_download_weights.ipynb
ADDED
@@ -0,0 +1,251 @@
1 |
+
{
|
2 |
+
"cells": [
|
3 |
+
{
|
4 |
+
"cell_type": "code",
|
5 |
+
"execution_count": 1,
|
6 |
+
"metadata": {
|
7 |
+
"pycharm": {
|
8 |
+
"is_executing": false
|
9 |
+
}
|
10 |
+
},
|
11 |
+
"outputs": [
|
12 |
+
{
|
13 |
+
"name": "stdout",
|
14 |
+
"text": [
|
15 |
+
"A audioset/README.md\r\nA audioset/mel_features.py\r\nA audioset/vggish_inference_demo.py\r\nA audioset/vggish_input.py\r\nA audioset/vggish_params.py\r\nA audioset/vggish_postprocess.py\r\nA audioset/vggish_slim.py\r\nA audioset/vggish_smoke_test.py\r\n",
|
16 |
+
"A audioset/vggish_train_demo.py\r\n",
|
17 |
+
"Checked out revision 9495.\r\n",
|
18 |
+
"Requirement already satisfied: numpy in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (1.16.3)\r\nRequirement already satisfied: scipy in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (1.2.1)\r\n",
|
19 |
+
"Requirement already satisfied: resampy in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (0.2.1)\r\nRequirement already satisfied: tensorflow in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (1.13.1)\r\nRequirement already satisfied: six in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (1.12.0)\r\nRequirement already satisfied: soundfile in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (0.10.2)\r\nRequirement already satisfied: numpy>=1.10 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from resampy) (1.16.3)\r\nRequirement already satisfied: scipy>=0.13 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from resampy) (1.2.1)\r\n",
|
20 |
+
"Requirement already satisfied: numba>=0.32 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from resampy) (0.43.1)\r\nRequirement already satisfied: gast>=0.2.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (0.2.2)\r\nRequirement already satisfied: termcolor>=1.1.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (1.1.0)\r\nRequirement already satisfied: tensorflow-estimator<1.14.0rc0,>=1.13.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (1.13.0)\r\nRequirement already satisfied: wheel>=0.26 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (0.33.1)\r\nRequirement already satisfied: grpcio>=1.8.6 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (1.20.1)\r\nRequirement already satisfied: astor>=0.6.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (0.7.1)\r\nRequirement already satisfied: absl-py>=0.1.6 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (0.7.1)\r\nRequirement already satisfied: protobuf>=3.6.1 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (3.7.1)\r\nRequirement already satisfied: keras-applications>=1.0.6 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (1.0.7)\r\nRequirement already satisfied: keras-preprocessing>=1.0.5 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (1.0.9)\r\nRequirement already satisfied: tensorboard<1.14.0,>=1.13.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow) (1.13.1)\r\n",
|
21 |
+
"Requirement already satisfied: cffi>=1.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from soundfile) (1.12.3)\r\nRequirement already satisfied: llvmlite>=0.28.0dev0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from numba>=0.32->resampy) (0.28.0)\r\nRequirement already satisfied: mock>=2.0.0 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorflow-estimator<1.14.0rc0,>=1.13.0->tensorflow) (3.0.5)\r\nRequirement already satisfied: setuptools in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from protobuf>=3.6.1->tensorflow) (41.0.1)\r\nRequirement already satisfied: h5py in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from keras-applications>=1.0.6->tensorflow) (2.9.0)\r\nRequirement already satisfied: werkzeug>=0.11.15 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow) (0.15.2)\r\nRequirement already satisfied: markdown>=2.6.8 in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from tensorboard<1.14.0,>=1.13.0->tensorflow) (3.1)\r\nRequirement already satisfied: pycparser in /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages (from cffi>=1.0->soundfile) (2.19)\r\n",
|
22 |
+
" % Total % Received % Xferd Average Speed Time Time Time Current\r\n Dload Upload Total Spent Left Speed\r\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0",
|
23 |
+
"\r 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0",
|
24 |
+
"\r 0 277M 0 3870 0 0 2314 0 34:56:43 0:00:01 34:56:42 2313",
|
25 |
+
"\r 1 277M 1 4096k 0 0 1413k 0 0:03:21 0:00:02 0:03:19 1413k",
|
26 |
+
"\r 4 277M 4 13.1M 0 0 4110k 0 0:01:09 0:00:03 0:01:06 4110k",
|
27 |
+
"\r 13 277M 13 37.2M 0 0 8940k 0 0:00:31 0:00:04 0:00:27 8939k",
|
28 |
+
"\r 25 277M 25 71.5M 0 0 13.5M 0 0:00:20 0:00:05 0:00:15 17.7M",
|
29 |
+
"\r 34 277M 34 96.8M 0 0 15.4M 0 0:00:17 0:00:06 0:00:11 21.0M",
|
30 |
+
"\r 43 277M 43 120M 0 0 15.8M 0 0:00:17 0:00:07 0:00:10 24.6M",
|
31 |
+
"\r 48 277M 48 136M 0 0 16.4M 0 0:00:16 0:00:08 0:00:08 24.4M",
|
32 |
+
"\r 60 277M 60 166M 0 0 17.9M 0 0:00:15 0:00:09 0:00:06 25.8M",
|
33 |
+
"\r 71 277M 71 197M 0 0 19.2M 0 0:00:14 0:00:10 0:00:04 25.1M",
|
34 |
+
"\r 76 277M 76 212M 0 0 18.8M 0 0:00:14 0:00:11 0:00:03 23.0M",
|
35 |
+
"\r 83 277M 83 232M 0 0 17.8M 0 0:00:15 0:00:12 0:00:03 20.8M",
|
36 |
+
"\r 86 277M 86 ",
|
37 |
+
" 240M 0 0 18.0M 0 0:00:15 0:00:13 0:00:02 20.8M",
|
38 |
+
"\r 95 277M 95 264M 0 0 18.2M 0 0:00:15 0:00:14 0:00:01 18.6M",
|
39 |
+
"\r100 277M 100 277M 0 0 18.6M 0 0:00:14 0:00:14 --:--:-- 17.3M\r\n",
|
40 |
+
" % Total % Received % Xferd Average Speed Time Time Time Current\r\n Dload Upload Total Spent Left Speed\r\n\r 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0",
|
41 |
+
"\r 1 73020 1 1284 0 0 3139 0 0:00:23 --:--:-- 0:00:23 3139\r100 73020 100 73020 0 0 163k 0 --:--:-- --:--:-- --:--:-- 163k\r\n"
|
42 |
+
],
|
43 |
+
"output_type": "stream"
|
44 |
+
}
|
45 |
+
],
|
46 |
+
"source": [
|
47 |
+
"\"\"\"\n",
|
48 |
+
"This notebook demonstrates how to replicate converting tensorflow\n",
|
49 |
+
"weights from tensorflow's vggish to torchvggish\n",
|
50 |
+
"\"\"\" \n",
|
51 |
+
"\n",
|
52 |
+
"# Download the audioset directory using subversion\n",
|
53 |
+
"# !apt-get -qq install subversion # uncomment if on linux\n",
|
54 |
+
"!svn checkout https://github.com/tensorflow/models/trunk/research/audioset\n",
|
55 |
+
"\n",
|
56 |
+
"# Download audioset requirements\n",
|
57 |
+
"!pip install numpy scipy\n",
|
58 |
+
"!pip install resampy tensorflow six soundfile\n",
|
59 |
+
"\n",
|
60 |
+
"# grab the VGGish model checkpoints & PCA params\n",
|
61 |
+
"!curl -O https://storage.googleapis.com/audioset/vggish_model.ckpt\n",
|
62 |
+
"!curl -O https://storage.googleapis.com/audioset/vggish_pca_params.npz"
|
63 |
+
]
|
64 |
+
},
|
65 |
+
{
|
66 |
+
"cell_type": "code",
|
67 |
+
"execution_count": 2,
|
68 |
+
"metadata": {
|
69 |
+
"pycharm": {
|
70 |
+
"is_executing": false
|
71 |
+
}
|
72 |
+
},
|
73 |
+
"outputs": [
|
74 |
+
{
|
75 |
+
"name": "stdout",
|
76 |
+
"text": [
|
77 |
+
"\nWARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.\nFor more information, please see:\n * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md\n * https://github.com/tensorflow/addons\nIf you depend on functionality not listed there, please file an issue.\n\n\nTesting your install of VGGish\n\n",
|
78 |
+
"Log Mel Spectrogram example: [[-4.47297436 -4.29457354 -4.14940631 ... -3.9747003 -3.94774997\n -3.78687669]\n [-4.48589533 -4.28825497 -4.139964 ... -3.98368686 -3.94976505\n -3.7951698 ]\n [-4.46158065 -4.29329706 -4.14905953 ... -3.96442484 -3.94895483\n -3.78619839]\n ...\n [-4.46152626 -4.29365061 -4.14848608 ... -3.96638113 -3.95057575\n -3.78538167]\n [-4.46152595 -4.2936572 -4.14848104 ... -3.96640507 -3.95059567\n -3.78537143]\n [-4.46152565 -4.29366386 -4.14847603 ... -3.96642906 -3.95061564\n -3.78536116]]\nWARNING:tensorflow:From /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.\nInstructions for updating:\nColocations handled automatically by placer.\n",
|
79 |
+
"WARNING:tensorflow:From /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1624: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.\nInstructions for updating:\nUse keras.layers.flatten instead.\n",
|
80 |
+
"WARNING:tensorflow:From /Users/harrisontaylor/.conda/envs/audioset-experiments/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.\nInstructions for updating:\nUse standard file APIs to check for files with this prefix.\n",
|
81 |
+
"INFO:tensorflow:Restoring parameters from vggish_model.ckpt\n",
|
82 |
+
"VGGish embedding: [0. 0. 0. 0. 0. 0.\n 0. 0.16137293 0. 0. 0. 0.\n 0. 0. 0. 0. 0. 0.80695796\n 0. 0. 0. 0. 0. 0.\n 0. 0.36792755 0.03582409 0. 0. 0.\n 0. 0.38027024 0.1375593 0.9174708 0.8065634 0.\n 0. 0. 0. 0.04036281 0.7076243 0.\n 0.497839 0.24081808 0.21565434 0.88492286 1.19568 0.6706197\n 0.20779458 0.01639861 0.17471863 0. 0. 0.25100806\n 0. 0. 0.14607918 0. 0.39887053 0.30542105\n 0.12896761 0. 0. 0. 0. 0.\n 0.5385133 0. 0. 0.04941072 0.42527416 0.18537284\n 0. 0. 0.14753515 0. 0. 0.69933873\n 0.45541188 0.05174822 0. 0.01992539 0. 0.\n 0.5181578 0.565576 0.6587975 0. 0. 0.41056332\n 0. 0. 0. 0.25765193 0.23232114 0.24026448\n 0. 0. 0. 0. 0. 0.26523757\n 0. 0.48460823 0. 0. 0.19325787 0.\n 0.20123348 0. 0.03368621 0. 0. 0.\n 0. 0.17836356 0.024749 0.06889972 0. 0.\n 0. 0.08246281 0. 0. 0. 0.\n 0. 0. ]\nPostprocessed VGGish embedding: [169 10 154 127 191 66 124 69 157 232 142 21 128 131 43 3 33 111\n 198 153 76 255 194 60 71 179 146 131 167 60 79 76 192 84 102 160\n 23 91 173 13 149 186 115 202 252 163 84 145 107 255 5 198 81 0\n 203 110 35 104 101 131 255 0 0 158 136 74 115 152 77 154 54 151\n 82 243 57 116 165 153 85 181 152 0 255 122 29 255 46 105 110 43\n 0 90 58 13 255 108 96 255 84 121 255 75 176 111 176 64 83 231\n 255 82 255 94 81 144 99 173 255 0 0 158 31 230 112 255 0 255\n 20 255]\n\nLooks Good To Me!\n\n"
     ],
     "output_type": "stream"
    }
   ],
   "source": [
    "# Test install\n",
    "!mv audioset/* .\n",
    "from vggish_smoke_test import *"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "pycharm": {
     "is_executing": false
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "text": [
      "INFO:tensorflow:Restoring parameters from vggish_model.ckpt\n",
      "vggish/conv1/weights:0\n\t(3, 3, 1, 64)\nvggish/conv1/biases:0\n\t(64,)\nvggish/conv2/weights:0\n\t(3, 3, 64, 128)\nvggish/conv2/biases:0\n\t(128,)\nvggish/conv3/conv3_1/weights:0\n\t(3, 3, 128, 256)\nvggish/conv3/conv3_1/biases:0\n\t(256,)\nvggish/conv3/conv3_2/weights:0\n\t(3, 3, 256, 256)\nvggish/conv3/conv3_2/biases:0\n\t(256,)\nvggish/conv4/conv4_1/weights:0\n\t(3, 3, 256, 512)\nvggish/conv4/conv4_1/biases:0\n\t(512,)\nvggish/conv4/conv4_2/weights:0\n\t(3, 3, 512, 512)\nvggish/conv4/conv4_2/biases:0\n\t(512,)\nvggish/fc1/fc1_1/weights:0\n\t(12288, 4096)\nvggish/fc1/fc1_1/biases:0\n\t(4096,)\nvggish/fc1/fc1_2/weights:0\n\t(4096, 4096)\nvggish/fc1/fc1_2/biases:0\n\t(4096,)\nvggish/fc2/weights:0\n\t(4096, 128)\nvggish/fc2/biases:0\n\t(128,)\nvalues written to vggish_dict\n"
     ],
     "output_type": "stream"
    }
   ],
   "source": [
    "import tensorflow as tf\n",
    "import vggish_slim\n",
    "\n",
    "vggish_dict = {}\n",
    "# load the model and get info \n",
    "with tf.Graph().as_default(), tf.Session() as sess:\n",
    "    vggish_slim.define_vggish_slim(training=True)\n",
    "    vggish_slim.load_vggish_slim_checkpoint(sess,\"vggish_model.ckpt\")\n",
    "    \n",
    "    tvars = tf.trainable_variables()\n",
    "    tvars_vals = sess.run(tvars)\n",
    "\n",
    "    for var, val in zip(tvars, tvars_vals):\n",
    "        print(\"%s\" % (var.name))\n",
    "        print(\"\\t\" + str(var.shape))\n",
    "        vggish_dict[var.name] = val\n",
    "    print(\"values written to vggish_dict\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "pycharm": {
     "is_executing": false,
     "name": "#%%\n"
    }
   },
   "outputs": [],
   "source": [
    "# Define torch model for vggish\n",
    "\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import numpy as np\n",
    "\n",
    "# From vggish_slim:\n",
    "# The VGG stack of alternating convolutions and max-pools.\n",
    "# net = slim.conv2d(net, 64, scope='conv1')\n",
    "# net = slim.max_pool2d(net, scope='pool1')\n",
    "# net = slim.conv2d(net, 128, scope='conv2')\n",
    "# net = slim.max_pool2d(net, scope='pool2')\n",
    "# net = slim.repeat(net, 2, slim.conv2d, 256, scope='conv3')\n",
    "# net = slim.max_pool2d(net, scope='pool3')\n",
    "# net = slim.repeat(net, 2, slim.conv2d, 512, scope='conv4')\n",
    "# net = slim.max_pool2d(net, scope='pool4')\n",
    "# # Flatten before entering fully-connected layers\n",
    "# net = slim.flatten(net)\n",
    "# net = slim.repeat(net, 2, slim.fully_connected, 4096, scope='fc1')\n",
    "# # The embedding layer.\n",
    "# net = slim.fully_connected(net, params.EMBEDDING_SIZE, scope='fc2')\n",
    "\n",
    "vggish_list = list(vggish_dict.values())\n",
    "def param_generator():\n",
    "    param = vggish_list.pop(0)\n",
    "    transposed = np.transpose(param)\n",
    "    to_torch = torch.from_numpy(transposed)\n",
    "    result = torch.nn.Parameter(to_torch)\n",
    "    yield result\n",
    "\n",
    "class VGGish(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(VGGish, self).__init__()\n",
    "        self.features = nn.Sequential(\n",
    "            nn.Conv2d(1, 64, 3, 1, 1),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.MaxPool2d(2, 2),\n",
    "            nn.Conv2d(64, 128, 3, 1, 1),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.MaxPool2d(2, 2),\n",
    "            nn.Conv2d(128, 256, 3, 1, 1),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.Conv2d(256, 256, 3, 1, 1),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.MaxPool2d(2, 2),\n",
    "            nn.Conv2d(256, 512, 3, 1, 1),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.Conv2d(512, 512, 3, 1, 1),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.MaxPool2d(2, 2))\n",
    "        self.embeddings = nn.Sequential(\n",
    "            nn.Linear(512*24, 4096),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.Linear(4096, 4096),\n",
    "            nn.ReLU(inplace=True),\n",
    "            nn.Linear(4096, 128),\n",
    "            nn.ReLU(inplace=True))\n",
    "        \n",
    "        # extract weights from `vggish_list`\n",
    "        for seq in (self.features, self.embeddings):\n",
    "            for layer in seq:\n",
    "                if type(layer).__name__ != \"MaxPool2d\" and type(layer).__name__ != \"ReLU\":\n",
    "                    layer.weight = next(param_generator())\n",
    "                    layer.bias = next(param_generator())\n",
    "        \n",
    "    def forward(self, x):\n",
    "        x = self.features(x)\n",
    "        x = x.view(x.size(0),-1)\n",
    "        x = self.embeddings(x)\n",
    "        return x\n",
    "\n",
    "net = VGGish()\n",
    "net.eval()\n",
    "\n",
    "# Save weights to disk\n",
    "torch.save(net.state_dict(), \"./vggish.pth\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.1"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "source": [],
    "metadata": {
     "collapsed": false
    }
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
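The notebook's `param_generator` ports TF-slim weights to PyTorch with a bare `np.transpose`. A minimal sketch (placeholder array, not real checkpoint data) of why a full transpose maps TF's conv layout onto PyTorch's:

```python
import numpy as np

# TF-slim stores conv kernels as (height, width, in_channels, out_channels),
# e.g. vggish/conv1/weights with shape (3, 3, 1, 64) in the printout above.
tf_conv = np.zeros((3, 3, 1, 64))

# np.transpose with no axes argument reverses all axes, which is exactly
# PyTorch's (out_channels, in_channels, height, width) conv layout.
torch_conv = np.transpose(tf_conv)
print(torch_conv.shape)  # (64, 1, 3, 3)
```

The same axis reversal also turns TF's (in_features, out_features) fully-connected weights into PyTorch's (out_features, in_features).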
FakeVD/Models/torchvggish/hubconf.py
ADDED
@@ -0,0 +1,15 @@
dependencies = ['torch', 'numpy', 'resampy', 'soundfile']

from torchvggish.vggish import VGGish

model_urls = {
    'vggish': 'https://github.com/harritaylor/torchvggish/'
              'releases/download/v0.1/vggish-10086976.pth',
    'pca': 'https://github.com/harritaylor/torchvggish/'
           'releases/download/v0.1/vggish_pca_params-970ea276.pth'
}


def vggish(**kwargs):
    model = VGGish(urls=model_urls, **kwargs)
    return model
FakeVD/Models/torchvggish/torchvggish/mel_features.py
ADDED
@@ -0,0 +1,223 @@
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Defines routines to compute mel spectrogram features from audio waveform."""

import numpy as np


def frame(data, window_length, hop_length):
  """Convert array into a sequence of successive possibly overlapping frames.

  An n-dimensional array of shape (num_samples, ...) is converted into an
  (n+1)-D array of shape (num_frames, window_length, ...), where each frame
  starts hop_length points after the preceding one.

  This is accomplished using stride_tricks, so the original data is not
  copied. However, there is no zero-padding, so any incomplete frames at the
  end are not included.

  Args:
    data: np.array of dimension N >= 1.
    window_length: Number of samples in each frame.
    hop_length: Advance (in samples) between each window.

  Returns:
    (N+1)-D np.array with as many rows as there are complete frames that can be
    extracted.
  """
  num_samples = data.shape[0]
  num_frames = 1 + int(np.floor((num_samples - window_length) / hop_length))
  shape = (num_frames, window_length) + data.shape[1:]
  strides = (data.strides[0] * hop_length,) + data.strides
  return np.lib.stride_tricks.as_strided(data, shape=shape, strides=strides)


def periodic_hann(window_length):
  """Calculate a "periodic" Hann window.

  The classic Hann window is defined as a raised cosine that starts and
  ends on zero, and where every value appears twice, except the middle
  point for an odd-length window. Matlab calls this a "symmetric" window
  and np.hanning() returns it. However, for Fourier analysis, this
  actually represents just over one cycle of a period N-1 cosine, and
  thus is not compactly expressed on a length-N Fourier basis. Instead,
  it's better to use a raised cosine that ends just before the final
  zero value - i.e. a complete cycle of a period-N cosine. Matlab
  calls this a "periodic" window. This routine calculates it.

  Args:
    window_length: The number of points in the returned window.

  Returns:
    A 1D np.array containing the periodic hann window.
  """
  return 0.5 - (0.5 * np.cos(2 * np.pi / window_length *
                             np.arange(window_length)))


def stft_magnitude(signal, fft_length,
                   hop_length=None,
                   window_length=None):
  """Calculate the short-time Fourier transform magnitude.

  Args:
    signal: 1D np.array of the input time-domain signal.
    fft_length: Size of the FFT to apply.
    hop_length: Advance (in samples) between each frame passed to FFT.
    window_length: Length of each block of samples to pass to FFT.

  Returns:
    2D np.array where each row contains the magnitudes of the fft_length/2+1
    unique values of the FFT for the corresponding frame of input samples.
  """
  frames = frame(signal, window_length, hop_length)
  # Apply frame window to each frame. We use a periodic Hann (cosine of period
  # window_length) instead of the symmetric Hann of np.hanning (period
  # window_length-1).
  window = periodic_hann(window_length)
  windowed_frames = frames * window
  return np.abs(np.fft.rfft(windowed_frames, int(fft_length)))


# Mel spectrum constants and functions.
_MEL_BREAK_FREQUENCY_HERTZ = 700.0
_MEL_HIGH_FREQUENCY_Q = 1127.0


def hertz_to_mel(frequencies_hertz):
  """Convert frequencies to mel scale using HTK formula.

  Args:
    frequencies_hertz: Scalar or np.array of frequencies in hertz.

  Returns:
    Object of same size as frequencies_hertz containing corresponding values
    on the mel scale.
  """
  return _MEL_HIGH_FREQUENCY_Q * np.log(
      1.0 + (frequencies_hertz / _MEL_BREAK_FREQUENCY_HERTZ))


def spectrogram_to_mel_matrix(num_mel_bins=20,
                              num_spectrogram_bins=129,
                              audio_sample_rate=8000,
                              lower_edge_hertz=125.0,
                              upper_edge_hertz=3800.0):
  """Return a matrix that can post-multiply spectrogram rows to make mel.

  Returns a np.array matrix A that can be used to post-multiply a matrix S of
  spectrogram values (STFT magnitudes) arranged as frames x bins to generate a
  "mel spectrogram" M of frames x num_mel_bins. M = S A.

  The classic HTK algorithm exploits the complementarity of adjacent mel bands
  to multiply each FFT bin by only one mel weight, then add it, with positive
  and negative signs, to the two adjacent mel bands to which that bin
  contributes. Here, by expressing this operation as a matrix multiply, we go
  from num_fft multiplies per frame (plus around 2*num_fft adds) to around
  num_fft^2 multiplies and adds. However, because these are all presumably
  accomplished in a single call to np.dot(), it's not clear which approach is
  faster in Python. The matrix multiplication has the attraction of being more
  general and flexible, and much easier to read.

  Args:
    num_mel_bins: How many bands in the resulting mel spectrum. This is
      the number of columns in the output matrix.
    num_spectrogram_bins: How many bins there are in the source spectrogram
      data, which is understood to be fft_size/2 + 1, i.e. the spectrogram
      only contains the nonredundant FFT bins.
    audio_sample_rate: Samples per second of the audio at the input to the
      spectrogram. We need this to figure out the actual frequencies for
      each spectrogram bin, which dictates how they are mapped into mel.
    lower_edge_hertz: Lower bound on the frequencies to be included in the mel
      spectrum. This corresponds to the lower edge of the lowest triangular
      band.
    upper_edge_hertz: The desired top edge of the highest frequency band.

  Returns:
    An np.array with shape (num_spectrogram_bins, num_mel_bins).

  Raises:
    ValueError: if frequency edges are incorrectly ordered or out of range.
  """
  nyquist_hertz = audio_sample_rate / 2.
  if lower_edge_hertz < 0.0:
    raise ValueError("lower_edge_hertz %.1f must be >= 0" % lower_edge_hertz)
  if lower_edge_hertz >= upper_edge_hertz:
    raise ValueError("lower_edge_hertz %.1f >= upper_edge_hertz %.1f" %
                     (lower_edge_hertz, upper_edge_hertz))
  if upper_edge_hertz > nyquist_hertz:
    raise ValueError("upper_edge_hertz %.1f is greater than Nyquist %.1f" %
                     (upper_edge_hertz, nyquist_hertz))
  spectrogram_bins_hertz = np.linspace(0.0, nyquist_hertz, num_spectrogram_bins)
  spectrogram_bins_mel = hertz_to_mel(spectrogram_bins_hertz)
  # The i'th mel band (starting from i=1) has center frequency
  # band_edges_mel[i], lower edge band_edges_mel[i-1], and higher edge
  # band_edges_mel[i+1]. Thus, we need num_mel_bins + 2 values in
  # the band_edges_mel arrays.
  band_edges_mel = np.linspace(hertz_to_mel(lower_edge_hertz),
                               hertz_to_mel(upper_edge_hertz), num_mel_bins + 2)
  # Matrix to post-multiply feature arrays whose rows are num_spectrogram_bins
  # of spectrogram values.
  mel_weights_matrix = np.empty((num_spectrogram_bins, num_mel_bins))
  for i in range(num_mel_bins):
    lower_edge_mel, center_mel, upper_edge_mel = band_edges_mel[i:i + 3]
    # Calculate lower and upper slopes for every spectrogram bin.
    # Line segments are linear in the *mel* domain, not hertz.
    lower_slope = ((spectrogram_bins_mel - lower_edge_mel) /
                   (center_mel - lower_edge_mel))
    upper_slope = ((upper_edge_mel - spectrogram_bins_mel) /
                   (upper_edge_mel - center_mel))
    # .. then intersect them with each other and zero.
    mel_weights_matrix[:, i] = np.maximum(0.0, np.minimum(lower_slope,
                                                          upper_slope))
  # HTK excludes the spectrogram DC bin; make sure it always gets a zero
  # coefficient.
  mel_weights_matrix[0, :] = 0.0
  return mel_weights_matrix


def log_mel_spectrogram(data,
                        audio_sample_rate=8000,
                        log_offset=0.0,
                        window_length_secs=0.025,
                        hop_length_secs=0.010,
                        **kwargs):
  """Convert waveform to a log magnitude mel-frequency spectrogram.

  Args:
    data: 1D np.array of waveform data.
    audio_sample_rate: The sampling rate of data.
    log_offset: Add this to values when taking log to avoid -Infs.
    window_length_secs: Duration of each window to analyze.
    hop_length_secs: Advance between successive analysis windows.
    **kwargs: Additional arguments to pass to spectrogram_to_mel_matrix.

  Returns:
    2D np.array of (num_frames, num_mel_bins) consisting of log mel filterbank
    magnitudes for successive frames.
  """
  window_length_samples = int(round(audio_sample_rate * window_length_secs))
  hop_length_samples = int(round(audio_sample_rate * hop_length_secs))
  fft_length = 2 ** int(np.ceil(np.log(window_length_samples) / np.log(2.0)))
  spectrogram = stft_magnitude(
      data,
      fft_length=fft_length,
      hop_length=hop_length_samples,
      window_length=window_length_samples)
  mel_spectrogram = np.dot(spectrogram, spectrogram_to_mel_matrix(
      num_spectrogram_bins=spectrogram.shape[1],
      audio_sample_rate=audio_sample_rate, **kwargs))
  return np.log(mel_spectrogram + log_offset)
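The stride-tricks framing in `mel_features.frame` can be exercised standalone. This sketch inlines the same function (copied from the file above) and shows the copy-free, no-padding behavior on a toy array:

```python
import numpy as np

def frame(data, window_length, hop_length):
    # Same stride-tricks framing as mel_features.frame:
    # no copy of the data, and incomplete trailing frames are dropped.
    num_samples = data.shape[0]
    num_frames = 1 + int(np.floor((num_samples - window_length) / hop_length))
    shape = (num_frames, window_length) + data.shape[1:]
    strides = (data.strides[0] * hop_length,) + data.strides
    return np.lib.stride_tricks.as_strided(data, shape=shape, strides=strides)

x = np.arange(10)
frames = frame(x, window_length=4, hop_length=2)
print(frames.shape)  # (4, 4): frames start at samples 0, 2, 4, 6
print(frames[1])     # [2 3 4 5]
```

Because the result is a strided view, writing to `frames` would also mutate `x`; the pipeline above only reads from it.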
FakeVD/Models/torchvggish/torchvggish/vggish.py
ADDED
@@ -0,0 +1,189 @@
import numpy as np
import torch
import torch.nn as nn
from torch import hub

from . import vggish_input, vggish_params


class VGG(nn.Module):
    def __init__(self, features):
        super(VGG, self).__init__()
        self.features = features
        self.embeddings = nn.Sequential(
            nn.Linear(512 * 4 * 6, 4096),
            nn.ReLU(True),
            nn.Linear(4096, 4096),
            nn.ReLU(True),
            nn.Linear(4096, 128),
            nn.ReLU(True))

    def forward(self, x):
        x = self.features(x)

        # Transpose the output from features to
        # remain compatible with vggish embeddings
        x = torch.transpose(x, 1, 3)
        x = torch.transpose(x, 1, 2)
        x = x.contiguous()
        x = x.view(x.size(0), -1)

        return self.embeddings(x)


class Postprocessor(nn.Module):
    """Post-processes VGGish embeddings. Returns a torch.Tensor instead of a
    numpy array in order to preserve the gradient.

    "The initial release of AudioSet included 128-D VGGish embeddings for each
    segment of AudioSet. These released embeddings were produced by applying
    a PCA transformation (technically, a whitening transform is included as well)
    and 8-bit quantization to the raw embedding output from VGGish, in order to
    stay compatible with the YouTube-8M project which provides visual embeddings
    in the same format for a large set of YouTube videos. This class implements
    the same PCA (with whitening) and quantization transformations."
    """

    def __init__(self):
        """Constructs a postprocessor."""
        super(Postprocessor, self).__init__()
        # Create empty matrix, for user's state_dict to load
        self.pca_eigen_vectors = torch.empty(
            (vggish_params.EMBEDDING_SIZE, vggish_params.EMBEDDING_SIZE,),
            dtype=torch.float,
        )
        self.pca_means = torch.empty(
            (vggish_params.EMBEDDING_SIZE, 1), dtype=torch.float
        )

        self.pca_eigen_vectors = nn.Parameter(self.pca_eigen_vectors, requires_grad=False)
        self.pca_means = nn.Parameter(self.pca_means, requires_grad=False)

    def postprocess(self, embeddings_batch):
        """Applies tensor postprocessing to a batch of embeddings.

        Args:
          embeddings_batch: A tensor of shape [batch_size, embedding_size]
            containing output from the embedding layer of VGGish.

        Returns:
          A tensor of the same shape as the input, containing the PCA-transformed,
          quantized, and clipped version of the input.
        """
        assert len(embeddings_batch.shape) == 2, "Expected 2-d batch, got %r" % (
            embeddings_batch.shape,
        )
        assert (
            embeddings_batch.shape[1] == vggish_params.EMBEDDING_SIZE
        ), "Bad batch shape: %r" % (embeddings_batch.shape,)

        # Apply PCA.
        # - Embeddings come in as [batch_size, embedding_size].
        # - Transpose to [embedding_size, batch_size].
        # - Subtract pca_means column vector from each column.
        # - Premultiply by PCA matrix of shape [output_dims, input_dims]
        #   where both are equal to embedding_size in our case.
        # - Transpose result back to [batch_size, embedding_size].
        pca_applied = torch.mm(self.pca_eigen_vectors, (embeddings_batch.t() - self.pca_means)).t()

        # Quantize by:
        # - clipping to [min, max] range
        clipped_embeddings = torch.clamp(
            pca_applied, vggish_params.QUANTIZE_MIN_VAL, vggish_params.QUANTIZE_MAX_VAL
        )
        # - convert to 8-bit in range [0.0, 255.0]
        quantized_embeddings = torch.round(
            (clipped_embeddings - vggish_params.QUANTIZE_MIN_VAL)
            * (
                255.0
                / (vggish_params.QUANTIZE_MAX_VAL - vggish_params.QUANTIZE_MIN_VAL)
            )
        )
        return torch.squeeze(quantized_embeddings)

    def forward(self, x):
        return self.postprocess(x)


def make_layers():
    layers = []
    in_channels = 1
    for v in [64, "M", 128, "M", 256, 256, "M", 512, 512, "M"]:
        if v == "M":
            layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        else:
            conv2d = nn.Conv2d(in_channels, v, kernel_size=3, padding=1)
            layers += [conv2d, nn.ReLU(inplace=True)]
            in_channels = v
    return nn.Sequential(*layers)


def _vgg():
    return VGG(make_layers())


# def _spectrogram():
#     config = dict(
#         sr=16000,
#         n_fft=400,
#         n_mels=64,
#         hop_length=160,
#         window="hann",
#         center=False,
#         pad_mode="reflect",
#         htk=True,
#         fmin=125,
#         fmax=7500,
#         output_format='Magnitude',
#         # device=device,
#     )
#     return Spectrogram.MelSpectrogram(**config)


class VGGish(VGG):
    def __init__(self, urls, device=None, pretrained=True, preprocess=True, postprocess=True, progress=True):
        super().__init__(make_layers())
        if pretrained:
            state_dict = hub.load_state_dict_from_url(urls['vggish'], progress=progress)
            super().load_state_dict(state_dict)

        if device is None:
            device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.device = device
        self.preprocess = preprocess
        self.postprocess = postprocess
        if self.postprocess:
            self.pproc = Postprocessor()
            if pretrained:
                state_dict = hub.load_state_dict_from_url(urls['pca'], progress=progress)
                # TODO: Convert the state_dict to torch
                state_dict[vggish_params.PCA_EIGEN_VECTORS_NAME] = torch.as_tensor(
                    state_dict[vggish_params.PCA_EIGEN_VECTORS_NAME], dtype=torch.float
                )
                state_dict[vggish_params.PCA_MEANS_NAME] = torch.as_tensor(
                    state_dict[vggish_params.PCA_MEANS_NAME].reshape(-1, 1), dtype=torch.float
                )

                self.pproc.load_state_dict(state_dict)
        self.to(self.device)

    def forward(self, x, fs=None):
        if self.preprocess:
            x = self._preprocess(x, fs)
            x = x.to(self.device)
        x = VGG.forward(self, x)
        if self.postprocess:
            x = self._postprocess(x)
        return x

    def _preprocess(self, x, fs):
        if isinstance(x, np.ndarray):
            x = vggish_input.waveform_to_examples(x, fs)
        elif isinstance(x, str):
            x = vggish_input.wavfile_to_examples(x)
        else:
            raise AttributeError
        return x

    def _postprocess(self, x):
        return self.pproc(x)
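The clip-then-rescale quantization step in `Postprocessor.postprocess` is easy to verify in isolation. A numpy re-statement of those two lines on a toy embedding row (hypothetical values, not real VGGish output):

```python
import numpy as np

# Assumed constants, copied from vggish_params.py.
QUANTIZE_MIN_VAL = -2.0
QUANTIZE_MAX_VAL = +2.0

emb = np.array([-3.0, -2.0, 1.0, 2.0, 2.5])

# Step 1: clip to [QUANTIZE_MIN_VAL, QUANTIZE_MAX_VAL].
clipped = np.clip(emb, QUANTIZE_MIN_VAL, QUANTIZE_MAX_VAL)
# Step 2: rescale the clipped range onto [0.0, 255.0] and round.
quantized = np.round(
    (clipped - QUANTIZE_MIN_VAL)
    * (255.0 / (QUANTIZE_MAX_VAL - QUANTIZE_MIN_VAL)))
print(quantized)  # [  0.   0. 191. 255. 255.]
```

This matches the 0-255 integer range seen in the "Postprocessed VGGish embedding" printout of the smoke test above.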
FakeVD/Models/torchvggish/torchvggish/vggish_input.py
ADDED
@@ -0,0 +1,98 @@
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Compute input examples for VGGish from audio waveform."""

# Modification: Return torch tensors rather than numpy arrays
import torch

import numpy as np
import resampy

from . import mel_features
from . import vggish_params

import soundfile as sf


def waveform_to_examples(data, sample_rate, return_tensor=True):
  """Converts audio waveform into an array of examples for VGGish.

  Args:
    data: np.array of either one dimension (mono) or two dimensions
      (multi-channel, with the outer dimension representing channels).
      Each sample is generally expected to lie in the range [-1.0, +1.0],
      although this is not required.
    sample_rate: Sample rate of data.
    return_tensor: Return data as a Pytorch tensor ready for VGGish

  Returns:
    3-D np.array of shape [num_examples, num_frames, num_bands] which represents
    a sequence of examples, each of which contains a patch of log mel
    spectrogram, covering num_frames frames of audio and num_bands mel frequency
    bands, where the frame length is vggish_params.STFT_HOP_LENGTH_SECONDS.
  """
  # Convert to mono.
  if len(data.shape) > 1:
    data = np.mean(data, axis=1)
  # Resample to the rate assumed by VGGish.
  if sample_rate != vggish_params.SAMPLE_RATE:
    data = resampy.resample(data, sample_rate, vggish_params.SAMPLE_RATE)

  # Compute log mel spectrogram features.
  log_mel = mel_features.log_mel_spectrogram(
      data,
      audio_sample_rate=vggish_params.SAMPLE_RATE,
      log_offset=vggish_params.LOG_OFFSET,
      window_length_secs=vggish_params.STFT_WINDOW_LENGTH_SECONDS,
      hop_length_secs=vggish_params.STFT_HOP_LENGTH_SECONDS,
      num_mel_bins=vggish_params.NUM_MEL_BINS,
      lower_edge_hertz=vggish_params.MEL_MIN_HZ,
      upper_edge_hertz=vggish_params.MEL_MAX_HZ)

  # Frame features into examples.
  features_sample_rate = 1.0 / vggish_params.STFT_HOP_LENGTH_SECONDS
  example_window_length = int(round(
      vggish_params.EXAMPLE_WINDOW_SECONDS * features_sample_rate))
  example_hop_length = int(round(
      vggish_params.EXAMPLE_HOP_SECONDS * features_sample_rate))
  log_mel_examples = mel_features.frame(
      log_mel,
      window_length=example_window_length,
      hop_length=example_hop_length)

  if return_tensor:
    log_mel_examples = torch.tensor(
        log_mel_examples, requires_grad=True)[:, None, :, :].float()

  return log_mel_examples


def wavfile_to_examples(wav_file, return_tensor=True):
  """Convenience wrapper around waveform_to_examples() for a common WAV format.

  Args:
    wav_file: String path to a file, or a file-like object. The file
      is assumed to contain WAV audio data with signed 16-bit PCM samples.
    return_tensor: Return data as a Pytorch tensor ready for VGGish

  Returns:
    See waveform_to_examples.
  """
  wav_data, sr = sf.read(wav_file, dtype='int16')
  assert wav_data.dtype == np.int16, 'Bad sample type: %r' % wav_data.dtype
  samples = wav_data / 32768.0  # Convert to [-1.0, +1.0]
  return waveform_to_examples(samples, sr, return_tensor)
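A quick sketch of the `wav_data / 32768.0` normalization in `wavfile_to_examples` (toy samples, not read from a file): signed 16-bit PCM spans [-32768, 32767], so dividing by 32768.0 maps it onto roughly [-1.0, +1.0) as `waveform_to_examples` expects.

```python
import numpy as np

wav_data = np.array([-32768, -16384, 0, 16384, 32767], dtype=np.int16)
samples = wav_data / 32768.0  # same scaling as wavfile_to_examples
print(samples)  # [-1. -0.5 0. 0.5 0.99996948]
```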
FakeVD/Models/torchvggish/torchvggish/vggish_params.py
ADDED
@@ -0,0 +1,53 @@
# Copyright 2017 The TensorFlow Authors All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

"""Global parameters for the VGGish model.

See vggish_slim.py for more information.
"""

# Architectural constants.
NUM_FRAMES = 96  # Frames in input mel-spectrogram patch.
NUM_BANDS = 64  # Frequency bands in input mel-spectrogram patch.
EMBEDDING_SIZE = 128  # Size of embedding layer.

# Hyperparameters used in feature and example generation.
SAMPLE_RATE = 16000
STFT_WINDOW_LENGTH_SECONDS = 0.025
STFT_HOP_LENGTH_SECONDS = 0.010
NUM_MEL_BINS = NUM_BANDS
MEL_MIN_HZ = 125
MEL_MAX_HZ = 7500
LOG_OFFSET = 0.01  # Offset used for stabilized log of input mel-spectrogram.
EXAMPLE_WINDOW_SECONDS = 0.96  # Each example contains 96 10ms frames
EXAMPLE_HOP_SECONDS = 0.96  # with zero overlap.

# Parameters used for embedding postprocessing.
PCA_EIGEN_VECTORS_NAME = 'pca_eigen_vectors'
PCA_MEANS_NAME = 'pca_means'
QUANTIZE_MIN_VAL = -2.0
QUANTIZE_MAX_VAL = +2.0

# Hyperparameters used in training.
INIT_STDDEV = 0.01  # Standard deviation used to initialize weights.
LEARNING_RATE = 1e-4  # Learning rate for the Adam optimizer.
ADAM_EPSILON = 1e-8  # Epsilon for the Adam optimizer.

# Names of ops, tensors, and features.
INPUT_OP_NAME = 'vggish/input_features'
INPUT_TENSOR_NAME = INPUT_OP_NAME + ':0'
OUTPUT_OP_NAME = 'vggish/embedding'
OUTPUT_TENSOR_NAME = OUTPUT_OP_NAME + ':0'
AUDIO_EMBEDDING_FEATURE_NAME = 'audio_embedding'