JunzheJosephZhu committed on
Commit
2d978ea
1 Parent(s): 22c7bc6

change task, add data files

Files changed (39)
  1. README.md +2 -2
  2. create-speaker-mixtures-2345/__MACOSX/._create-speaker-mixtures-2345 +0 -0
  3. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._activlev.m +0 -0
  4. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._create_wav_2speakers.m +0 -0
  5. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._create_wav_3speakers.m +0 -0
  6. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._create_wav_4speakers.m +0 -0
  7. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._create_wav_5speakers.m +0 -0
  8. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._maxfilt.m +0 -0
  9. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_2_spk_cv.txt +0 -0
  10. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_2_spk_tr.txt +0 -0
  11. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_2_spk_tt.txt +0 -0
  12. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_3_spk_cv.txt +0 -0
  13. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_3_spk_tr.txt +0 -0
  14. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_3_spk_tt.txt +0 -0
  15. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_4_spk_cv.txt +0 -0
  16. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_4_spk_tr.txt +0 -0
  17. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_4_spk_tt.txt +0 -0
  18. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_5_spk_cv.txt +0 -0
  19. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_5_spk_tr.txt +0 -0
  20. create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_5_spk_tt.txt +0 -0
  21. create-speaker-mixtures-2345/create-speaker-mixtures-2345.zip +0 -0
  22. create-speaker-mixtures-2345/create-speaker-mixtures-2345/activlev.m +345 -0
  23. create-speaker-mixtures-2345/create-speaker-mixtures-2345/create_wav_2speakers.m +188 -0
  24. create-speaker-mixtures-2345/create-speaker-mixtures-2345/create_wav_3speakers.m +188 -0
  25. create-speaker-mixtures-2345/create-speaker-mixtures-2345/create_wav_4speakers.m +214 -0
  26. create-speaker-mixtures-2345/create-speaker-mixtures-2345/create_wav_5speakers.m +238 -0
  27. create-speaker-mixtures-2345/create-speaker-mixtures-2345/maxfilt.m +127 -0
  28. create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_2_spk_cv.txt +0 -0
  29. create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_2_spk_tr.txt +0 -0
  30. create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_2_spk_tt.txt +0 -0
  31. create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_3_spk_cv.txt +0 -0
  32. create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_3_spk_tr.txt +0 -0
  33. create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_3_spk_tt.txt +0 -0
  34. create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_4_spk_cv.txt +0 -0
  35. create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_4_spk_tr.txt +0 -0
  36. create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_4_spk_tt.txt +0 -0
  37. create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_5_spk_cv.txt +0 -0
  38. create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_5_spk_tr.txt +0 -0
  39. create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_5_spk_tt.txt +0 -0
README.md CHANGED
@@ -17,7 +17,7 @@ Demo Page: https://junzhejosephzhu.github.io/Multi-Decoder-DPRNN/
  Original research repo is at https://github.com/JunzheJosephZhu/MultiDecoder-DPRNN
  
  This model was trained by Joseph Zhu using the wsj0-mix-var/Multi-Decoder-DPRNN recipe in Asteroid.
- It was trained on the `sep_clean` task of the Wsj0MixVar dataset.
+ It was trained on the `sep_count` task of the Wsj0MixVar dataset.
  
  ## Training config:
  ```yaml
@@ -51,7 +51,7 @@ optim:
  data:
    train_dir: "data/{}speakers/wav8k/min/tr"
    valid_dir: "data/{}speakers/wav8k/min/cv"
-   task: sep_clean
+   task: sep_count
    sample_rate: 8000
    seglen: 4.0
    minlen: 2.0
create-speaker-mixtures-2345/__MACOSX/._create-speaker-mixtures-2345 ADDED
Binary file (212 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._activlev.m ADDED
Binary file (212 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._create_wav_2speakers.m ADDED
Binary file (212 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._create_wav_3speakers.m ADDED
Binary file (212 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._create_wav_4speakers.m ADDED
Binary file (312 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._create_wav_5speakers.m ADDED
Binary file (268 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._maxfilt.m ADDED
Binary file (212 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_2_spk_cv.txt ADDED
Binary file (212 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_2_spk_tr.txt ADDED
Binary file (212 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_2_spk_tt.txt ADDED
Binary file (212 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_3_spk_cv.txt ADDED
Binary file (212 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_3_spk_tr.txt ADDED
Binary file (212 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_3_spk_tt.txt ADDED
Binary file (268 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_4_spk_cv.txt ADDED
Binary file (594 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_4_spk_tr.txt ADDED
Binary file (596 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_4_spk_tt.txt ADDED
Binary file (652 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_5_spk_cv.txt ADDED
Binary file (368 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_5_spk_tr.txt ADDED
Binary file (312 Bytes).
create-speaker-mixtures-2345/__MACOSX/create-speaker-mixtures-2345/._mix_5_spk_tt.txt ADDED
Binary file (312 Bytes).
create-speaker-mixtures-2345/create-speaker-mixtures-2345.zip ADDED
Binary file (3.66 MB).
create-speaker-mixtures-2345/create-speaker-mixtures-2345/activlev.m ADDED
@@ -0,0 +1,345 @@
+ function [lev,af,fso,vad]=activlev(sp,fs,mode)
+ %ACTIVLEV Measure active speech level as in ITU-T P.56 [LEV,AF,FSO]=(sp,FS,MODE)
+ %
+ %Usage: (1) lev=activlev(s,fs);     % speech level in units of power
+ %       (2) db=activlev(s,fs,'d');  % speech level in dB
+ %       (3) s=activlev(s,fs,'n');   % normalize active level to 0 dB
+ %
+ %Inputs: sp     is the speech signal (with better than 20dB SNR)
+ %        FS     is the sample frequency in Hz (see also FSO below)
+ %        MODE   is a combination of the following:
+ %               0 - omit high pass filter completely (i.e. include DC)
+ %               3 - high pass filter at 30 Hz instead of 200 Hz (but allows mains hum to pass)
+ %               4 - high pass filter at 40 Hz instead of 200 Hz (but allows mains hum to pass)
+ %               1 - use Chebyshev 1 filter
+ %               2 - use Chebyshev 2 filter (default)
+ %               e - use elliptic filter
+ %               h - omit low pass filter at 5.5, 12 or 18 kHz
+ %               w - use wideband filter frequencies: 70 Hz to 12 kHz
+ %               W - use ultra wideband filter frequencies: 30 Hz to 18 kHz
+ %               d - give outputs in dB rather than power
+ %               n - output a normalized speech signal as the first argument
+ %               N - output a normalized filtered speech signal as the first argument
+ %               l - give both active and long-term power levels
+ %               a - include A-weighting filter
+ %               i - include ITU-R-BS.468/ITU-T-J.16 weighting filter
+ %               z - do NOT zero-pad the signal by 0.35 s
+ %
+ %Outputs:
+ %    If the "n" option is specified, a speech signal normalized to 0dB will be given as
+ %    the first output followed by the other outputs.
+ %        LEV    gives the speech level in units of power (or dB if mode='d')
+ %               if mode='l' is specified, LEV is a row vector with the "long term
+ %               level" as its second element (this is just the mean power)
+ %        AF     is the activity factor (or duty cycle) in the range 0 to 1
+ %        FSO    is a column vector of intermediate information that allows
+ %               you to process a speech signal in chunks. Thus:
+ %                       fso=fs;
+ %                       for i=1:inc:nsamp
+ %                           [lev,af,fso]=activlev(sp(i:min(i+inc-1,nsamp)),fso,['z' mode]);
+ %                       end
+ %                       lev=activlev([],fso)
+ %               is equivalent to:
+ %                       lev=activlev(sp(1:nsamp),fs,mode)
+ %               but is much slower. The two methods will not give identical results
+ %               because they will use slightly different thresholds. Note you need
+ %               the 'z' option for all calls except the last.
+ %        VAD    is a boolean vector the same length as sp that acts as an approximate voice activity detector
+
+ %For completeness we list here the contents of the FSO structure:
+ %
+ %   ffs  : sample frequency
+ %   fmd  : mode string
+ %   nh   : hangover time in samples
+ %   ae   : smoothing filter coefs
+ %   abl  : HP filter numerator and denominator coefficients
+ %   bh   : LP filter numerator coefficients
+ %   ah   : LP filter denominator coefficients
+ %   ze   : smoothing filter state
+ %   zl   : HP filter state
+ %   zh   : LP filter state
+ %   zx   : hangover max filter state
+ %   emax : maximum envelope exponent + 1
+ %   ssq  : signal sum of squares
+ %   ns   : number of signal samples
+ %   ss   : sum of speech samples (not actually used here)
+ %   kc   : cumulative occupancy counts
+ %   aw   : weighting filter denominator
+ %   bw   : weighting filter numerator
+ %   zw   : weighting filter state
+ %
+ % This routine implements "Method B" from [1],[2] to calculate the active
+ % speech level which is defined to be the speech energy divided by the
+ % duration of speech activity. Speech is designated as "active" based on an
+ % adaptive threshold applied to the smoothed rectified speech signal. A
+ % bandpass filter is first applied to the input speech whose -0.25 dB points
+ % are at 200 Hz & 5.5 kHz by default but this can be changed to 70 Hz & 12 kHz
+ % or to 30 Hz & 18 kHz by specifying the 'w' or 'W' options; these
+ % correspond respectively to Annexes B and C in [2].
+ %
+ % References:
+ % [1] ITU-T. Objective measurement of active speech level. Recommendation P.56, Mar. 1993.
+ % [2] ITU-T. Objective measurement of active speech level. Recommendation P.56, Dec. 2011.
+
+ % Copyright (C) Mike Brookes 2008-2016
+ % Version: $Id: activlev.m 9407 2017-02-07 13:25:55Z dmb $
+ %
+ % VOICEBOX is a MATLAB toolbox for speech processing.
+ % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
+ %
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ % This program is free software; you can redistribute it and/or modify
+ % it under the terms of the GNU General Public License as published by
+ % the Free Software Foundation; either version 2 of the License, or
+ % (at your option) any later version.
+ %
+ % This program is distributed in the hope that it will be useful,
+ % but WITHOUT ANY WARRANTY; without even the implied warranty of
+ % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ % GNU General Public License for more details.
+ %
+ % You can obtain a copy of the GNU General Public License from
+ % http://www.gnu.org/copyleft/gpl.html or by writing to
+ % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+ persistent nbin thresh c25zp c15zp e5zp
+ if isempty(nbin)
+     nbin=20; % 60 dB range at 3dB per bin
+     thresh=15.9; % threshold in dB
+     % High pass s-domain zeros and poles of filters with passband ripple<0.25dB, stopband<-50dB, w0=1
+     % w0=fzero(@ch2,0.5); [c2z,c2p,k]=cheby2(5,50,w0,'high','s');
+     % function v=ch2(w); [c2z,c2p,k]=cheby2(5,50,w,'high','s'); v= 20*log10(prod(abs(1i-c2z))/prod(abs(1i-c2p)))+0.25;
+     c25zp=[0.37843443673309i 0.23388534441447i; -0.20640255179496+0.73942185906851i -0.54036889596392+0.45698784092898i];
+     c25zp=[[0; -0.66793268833792] c25zp conj(c25zp)];
+     % [c1z,c1p,c1k] = cheby1(5,0.25,1,'high','s');
+     c15zp=[-0.659002835294875+1.195798636925079i -0.123261821596263+0.947463030958881i];
+     c15zp=[zeros(1,5); -2.288586431066945 c15zp conj(c15zp)];
+     % [ez,ep,ek] = ellip(5,0.25,50,1,'high','s')
+     e5zp=[0.406667680649209i 0.613849362744881i; -0.538736390607201+1.130245082677107i -0.092723126159100+0.958193646330194i];
+     e5zp=[[0; -1.964538608244084] e5zp conj(e5zp)];
+     % w=linspace(0.2,2,100);
+     % figure(1); plot(w,20*log10(abs(freqs(real(poly(c15zp(1,:))),real(poly(c15zp(2,:))),w)))); title('Chebyshev 1');
+     % figure(2); plot(w,20*log10(abs(freqs(real(poly(c25zp(1,:))),real(poly(c25zp(2,:))),w)))); title('Chebyshev 2');
+     % figure(3); plot(w,20*log10(abs(freqs(real(poly(e5zp(1,:))),real(poly(e5zp(2,:))),w)))); title('Elliptic');
+ end
+
+ if ~isstruct(fs) % no state vector given
+     if nargin<3
+         mode=' ';
+     end
+     fso.ffs=fs; % sample frequency
+
+     ti=1/fs;
+     g=exp(-ti/0.03); % pole position for envelope filter
+     fso.ae=[1 -2*g g^2]/(1-g)^2; % envelope filter coefficients (DC gain = 1)
+     fso.ze=zeros(2,1);
+     fso.nh=ceil(0.2/ti)+1; % hangover time in samples
+     fso.zx=-Inf; % initial value for maxfilt()
+     fso.emax=-Inf; % maximum exponent
+     fso.ns=0;
+     fso.ssq=0;
+     fso.ss=0;
+     fso.kc=zeros(nbin,1); % cumulative occupancy counts
+     % s-plane zeros and poles of high pass 5'th order filter -0.25dB at w=1 and -50dB stopband
+     if any(mode=='1')
+         szp=c15zp; % Chebyshev 1
+     elseif any(mode=='e')
+         szp=e5zp; % Elliptic
+     else
+         szp=c25zp; % Chebyshev 2
+     end
+     flh=[200 5500]; % default frequency range +- 0.25 dB
+     if any(mode=='w')
+         flh=[70 12000]; % super-wideband (Annex B of [2])
+     elseif any(mode=='W')
+         flh=[30 18000]; % full band (Annex C of [2])
+     end
+     if any(mode=='3')
+         flh(1)=30; % force a 30 Hz HPF cutoff
+     end
+     if any(mode=='4')
+         flh(1)=40; % force a 40 Hz HPF cutoff
+     end
+     if any(mode=='r') % included for backward compatibility
+         mode=['0h' mode]; % abolish both filters
+     elseif fs<flh(2)*2.2
+         mode=['h' mode]; % abolish lowpass filter at low sample rates
+     end
+     fso.fmd=mode; % save mode flags
+     if all(mode~='0') % implement the HPF as biquads to avoid rounding errors
+         zl=2./(1-szp*tan(flh(1)*pi/fs))-1; % Transform s-domain poles/zeros with bilinear transform
+         abl=[ones(2,1) -zl(:,1) -2*real(zl(:,2:3)) abs(zl(:,2:3)).^2]; % biquad coefficients
+         hfg=(abl*[1 -1 0 0 0 0]').*(abl*[1 0 -1 0 1 0]').*(abl*[1 0 0 -1 0 1]');
+         abl=abl(:,[1 2 1 3 5 1 4 6]); % reorder into biquads
+         abl(1,1:2)= abl(1,1:2)*hfg(2)/hfg(1); % force Nyquist gain to equal 1
+         fso.abl=abl;
+         fso.zl=zeros(5,1); % space for HPF filter state
+     end
+     if all(mode~='h')
+         zh=2./(szp/tan(flh(2)*pi/fs)-1)+1; % Transform s-domain poles/zeros with bilinear transform
+         ah=real(poly(zh(2,:)));
+         bh=real(poly(zh(1,:)));
+         fso.bh=bh*sum(ah)/sum(bh);
+         fso.ah=ah;
+         fso.zh=zeros(5,1);
+     end
+     if any(mode=='a')
+         [fso.bw,fso.aw]=stdspectrum(2,'z',fs);
+         fso.zw=zeros(length(fso.aw)-1,1);
+     elseif any(mode=='i')
+         [fso.bw,fso.aw]=stdspectrum(8,'z',fs);
+         fso.zw=zeros(length(fso.aw)-1,1);
+     end
+ else
+     fso=fs; % use existing structure
+ end
+ md=fso.fmd;
+ if nargin<3
+     mode=fso.fmd;
+ end
+ nsp=length(sp); % original length of speech
+ if all(mode~='z')
+     nz=ceil(0.35*fso.ffs); % number of zeros to append
+     sp=[sp(:);zeros(nz,1)];
+ else
+     nz=0;
+ end
+ ns=length(sp);
+ if ns % process this speech chunk
+     % apply the input filters to the speech
+     if all(md~='0') % implement the HPF as biquads to avoid rounding errors
+         [sq,fso.zl(1)]=filter(fso.abl(1,1:2),fso.abl(2,1:2),sp(:),fso.zl(1)); % highpass filter: real pole/zero
+         [sq,fso.zl(2:3)]=filter(fso.abl(1,3:5),fso.abl(2,3:5),sq(:),fso.zl(2:3)); % highpass filter: biquad 1
+         [sq,fso.zl(4:5)]=filter(fso.abl(1,6:8),fso.abl(2,6:8),sq(:),fso.zl(4:5)); % highpass filter: biquad 2
+     else
+         sq=sp(:);
+     end
+     if all(md~='h')
+         [sq,fso.zh]=filter(fso.bh,fso.ah,sq(:),fso.zh); % lowpass filter
+     end
+     if any(md=='a') || any(md=='i')
+         [sq,fso.zw]=filter(fso.bw,fso.aw,sq(:),fso.zw); % weighting filter
+     end
+     fso.ns=fso.ns+ns; % count the number of speech samples
+     fso.ss=fso.ss+sum(sq); % sum of speech samples
+     fso.ssq=fso.ssq+sum(sq.*sq); % sum of squared speech samples
+     [s,fso.ze]=filter(1,fso.ae,abs(sq(:)),fso.ze); % envelope filter
+     [qf,qe]=log2(s.^2); % take efficient log2 function, 2^qe is upper limit of bin
+     qe(qf==0)=-Inf; % fix zero values
+     [qe,qk,fso.zx]=maxfilt(qe,1,fso.nh,1,fso.zx); % apply the 0.2 second hangover
+     oemax=fso.emax;
+     fso.emax=max(oemax,max(qe)+1);
+     if fso.emax==-Inf
+         fso.kc(1)=fso.kc(1)+ns;
+     else
+         qe=min(fso.emax-qe,nbin); % force in the range 1:nbin. Bin k has 2^(emax-k-1)<=s^2<=2^(emax-k)
+         wqe=ones(length(qe),1);
+         % below: could use kc=cumsum(accumarray(qe,wqe,nbin)) but unsure about backwards compatibility
+         kc=cumsum(full(sparse(qe,wqe,wqe,nbin,1))); % cumulative occupancy counts
+         esh=fso.emax-oemax; % amount to shift down previous bin counts
+         if esh<nbin-1 % if any of the previous bins are worth keeping
+             kc(esh+1:nbin-1)=kc(esh+1:nbin-1)+fso.kc(1:nbin-esh-1);
+             kc(nbin)=kc(nbin)+sum(fso.kc(nbin-esh:nbin));
+         else
+             kc(nbin)=kc(nbin)+sum(fso.kc); % otherwise just add all old counts into the last (lowest) bin
+         end
+         fso.kc=kc;
+     end
+ end
+ if fso.ns % now calculate the output values
+     if fso.ssq>0
+         aj=10*log10(fso.ssq*(fso.kc).^(-1));
+         % equivalent to cj=20*log10(sqrt(2).^(fso.emax-(1:nbin)-1));
+         cj=10*log10(2)*(fso.emax-(1:nbin)-1); % lower limit of bin j in dB
+         mj=aj'-cj-thresh;
+         % jj=find(mj*sign(mj(1))<=0); % Find threshold
+         jj=find(mj(1:end-1)<0 & mj(2:end)>=0,1); % find +ve transition through threshold
+         if isempty(jj) % if we never cross the threshold
+             if mj(end)<=0 % if we end up below it
+                 jj=length(mj)-1; % take the threshold to be the bottom of the last (lowest) bin
+                 jf=1;
+             else % if we are always above it
+                 jj=1; % take the threshold to be the bottom of the first (highest) bin
+                 jf=0;
+             end
+         else
+             jf=1/(1-mj(jj+1)/mj(jj)); % fractional part of j using linear interpolation
+         end
+         lev=aj(jj)+jf*(aj(jj+1)-aj(jj)); % active level in decibels
+         lp=10.^(lev/10); % active level in power
+         if any(md=='d') % 'd' option -> output in dB
+             lev=[lev 10*log10(fso.ssq/fso.ns)];
+         else % ~'d' option -> output in power
+             lev=[lp fso.ssq/fso.ns];
+         end
+         af=fso.ssq/(fso.ns*lp);
+     else % if all samples are equal to zero
+         af=0;
+         if any(md=='d') % 'd' option -> output in dB
+             lev=[-Inf -Inf]; % active level is -Inf dB (zero power)
+         else % ~'d' option -> output in power
+             lev=[0 0]; % active level is 0 power
+         end
+     end
+     if all(md~='l')
+         lev=lev(1); % only output the first element of lev unless 'l' option
+     end
+ end
+ if nargout>3
+     vad=maxfilt(s(1:nsp),1,fso.nh,1);
+     vad=vad>(sqrt(lp)/10^(thresh/20));
+ end
+ if ~nargout
+     vad=maxfilt(s,1,fso.nh,1);
+     vad=vad>(sqrt(lp)/10^(thresh/20));
+     levdb=10*log10(lp);
+     clf;
+     subplot(2,2,[1 2]);
+     tax=(1:ns)/fso.ffs;
+     plot(tax,sp,'-y',tax,s,'-r',tax,(vad>0)*sqrt(lp),'-b');
+     xlabel('Time (s)');
+     title(sprintf('Active Level = %.2g dB, Activity = %.0f%% (ITU-T P.56)',levdb,100*af));
+     axisenlarge([-1 -1 -1.4 -1.05]);
+     if nz>0
+         hold on
+         ylim=get(gca,'ylim');
+         plot(tax(end-nz)*[1 1],ylim,':k');
+         hold off
+     end
+     ylabel('Amplitude');
+     legend('Signal','Smoothed envelope','VAD * Active-Level','Location','SouthEast');
+     subplot(2,2,4);
+     plot(cj,repmat(levdb,nbin,1),'k:',cj,aj(:),'-b',cj,cj,'-r',levdb-thresh*ones(1,2),[levdb-thresh levdb],'-r');
+     xlabel('Threshold (dB)');
+     ylabel('Active Level (dB)');
+     legend('Active Level','Speech>Thresh','Threshold','Location','NorthWest');
+     texthvc(levdb-thresh,levdb-0.5*thresh,sprintf('%.1f dB ',thresh),'rmr');
+     axisenlarge([-1 -1.05]);
+     ylim=get(gca,'ylim');
+     set(gca,'ylim',[levdb-1.2*thresh max(ylim(2),levdb+1.9*thresh)]);
+     kch=filter([1 -1],1,kc);
+     subplot(2,2,3);
+     bar(5*log10(2)+cj(end:-1:1),kch(end:-1:1)*100/kc(end));
+     set(gca,'xlim',[cj(end) cj(1)+10*log10(2)]);
+     ylim=get(gca,'ylim');
+     hold on
+     plot(lev([1 1]),ylim,'k:',lev([1 1])-thresh,ylim,'r:');
+     hold off
+     texthvc(lev(1),ylim(2),sprintf(' Act\n Lev'),'ltk');
+     texthvc(lev(1)-thresh,ylim(2),sprintf('Threshold '),'rtr');
+     xlabel('Frame power (dB)')
+     ylabel('% frames');
+ elseif any(md=='n') || any(md=='N') % output normalized speech waveform
+     fsx=fso; % shift along other outputs
+     fso=af;
+     af=lev;
+     if any(md=='n')
+         sq=sp; % 'n' -> use unfiltered speech
+     end
+     if fsx.ns>0 && fsx.ssq>0 % if there has been any non-zero speech
+         lev=sq(1:nsp)/sqrt(lp);
+     else
+         lev=sq(1:nsp);
+     end
+ end
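The envelope step in `activlev.m` (`g=exp(-ti/0.03)` with `filter(1,fso.ae,abs(sq))`) can be sketched outside MATLAB. The following is a minimal Python illustration, not a port of the routine: it only reproduces the rectify-and-smooth stage, whose denominator `[1 -2*g g^2]/(1-g)^2` corresponds to two cascaded one-pole filters with unit DC gain. The function name `envelope` and the argument `tau` are hypothetical names chosen for this sketch.

```python
import math

def envelope(samples, fs, tau=0.03):
    """Rectify and smooth a signal with a double pole at g (DC gain 1).

    Mirrors y = filter(1, [1 -2g g^2]/(1-g)^2, abs(x)) from activlev.m,
    assuming zero initial filter state.
    """
    g = math.exp(-1.0 / (fs * tau))  # pole position, same as g=exp(-ti/0.03)
    b0 = (1.0 - g) ** 2              # input gain that normalizes DC gain to 1
    y1 = y2 = 0.0                    # y[n-1] and y[n-2]
    out = []
    for x in samples:
        y = b0 * abs(x) + 2.0 * g * y1 - g * g * y2
        out.append(y)
        y2, y1 = y1, y
    return out

# One second of a constant signal at 8 kHz: the envelope settles at its magnitude.
env = envelope([1.0] * 8000, 8000)
print(round(env[-1], 6))
```

Because the input is rectified first, a signal and its negation give the same envelope, which is why the later threshold comparison works on magnitudes only.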
create-speaker-mixtures-2345/create-speaker-mixtures-2345/create_wav_2speakers.m ADDED
@@ -0,0 +1,188 @@
+ % create_wav_2speakers.m
+ %
+ % Create 2-speaker mixtures
+ %
+ % This script assumes that WSJ0's wv1 sphere files have already
+ % been converted to wav files, using the original folder structure
+ % under wsj0/, e.g.,
+ % 11-1.1/wsj0/si_tr_s/01t/01to030v.wv1 is converted to wav and
+ % stored in YOUR_PATH/wsj0/si_tr_s/01t/01to030v.wav, and
+ % 11-6.1/wsj0/si_dt_05/050/050a0501.wv1 is converted to wav and
+ % stored in YOUR_PATH/wsj0/si_dt_05/050/050a0501.wav.
+ % Relevant data from all disks are assumed merged under YOUR_PATH/wsj0/
+ %
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ % Copyright (C) 2016 Mitsubishi Electric Research Labs
+ % (Jonathan Le Roux, John R. Hershey, Zhuo Chen)
+ % Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+
+ data_type = {'tr','cv','tt'};
+ wsj0root = '/home/joseph/Desktop/WSJ0/'; % YOUR_PATH/, the folder containing wsj0/
+ output_dir16k='/home/joseph/Desktop/WSJ0/dataset/2speakers/wav16k';
+ output_dir8k='/home/joseph/Desktop/WSJ0/dataset/2speakers/wav8k';
+
+ min_max = {'min'};
+
+ useaudioread = 0;
+ if exist('audioread','file')
+     useaudioread = 1;
+ end
+
+ for i_mm = 1:length(min_max)
+     for i_type = 1:length(data_type)
+         if ~exist([output_dir16k '/' min_max{i_mm} '/' data_type{i_type}],'dir')
+             mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type}]);
+         end
+         if ~exist([output_dir8k '/' min_max{i_mm} '/' data_type{i_type}],'dir')
+             mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type}]);
+         end
+         status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s1/']); %#ok<NASGU>
+         status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s2/']); %#ok<NASGU>
+         status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/mix/']); %#ok<NASGU>
+         status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s1/']); %#ok<NASGU>
+         status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s2/']); %#ok<NASGU>
+         status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/mix/']);
+
+         TaskFile = ['mix_2_spk_' data_type{i_type} '.txt'];
+         fid=fopen(TaskFile,'r');
+         C=textscan(fid,'%s %f %s %f');
+
+         Source1File = ['mix_2_spk_' min_max{i_mm} '_' data_type{i_type} '_1'];
+         Source2File = ['mix_2_spk_' min_max{i_mm} '_' data_type{i_type} '_2'];
+         MixFile = ['mix_2_spk_' min_max{i_mm} '_' data_type{i_type} '_mix'];
+
+         fid_s1 = fopen(Source1File,'w');
+         fid_s2 = fopen(Source2File,'w');
+         fid_m = fopen(MixFile,'w');
+
+         num_files = length(C{1});
+         fs8k=8000;
+
+         scaling_16k = zeros(num_files,2);
+         scaling_8k = zeros(num_files,2);
+         scaling16bit_16k = zeros(num_files,1);
+         scaling16bit_8k = zeros(num_files,1);
+         fprintf(1,'%s\n',[min_max{i_mm} '_' data_type{i_type}]);
+         for i = 1:num_files
+             [inwav1_dir,invwav1_name,inwav1_ext] = fileparts(C{1}{i});
+             [inwav2_dir,invwav2_name,inwav2_ext] = fileparts(C{3}{i});
+             fprintf(fid_s1,'%s\n',C{1}{i});
+             fprintf(fid_s2,'%s\n',C{3}{i});
+             inwav1_snr = C{2}(i);
+             inwav2_snr = C{4}(i);
+             mix_name = [invwav1_name,'_',num2str(inwav1_snr),'_',invwav2_name,'_',num2str(inwav2_snr)];
+             fprintf(fid_m,'%s\n',mix_name);
+
+             % get input wavs
+             if useaudioread
+                 [s1, fs] = audioread([wsj0root C{1}{i}]);
+                 s2 = audioread([wsj0root C{3}{i}]);
+             else
+                 [s1, fs] = wavread([wsj0root C{1}{i}]); %#ok<*DWVRD>
+                 s2 = wavread([wsj0root C{3}{i}]);
+             end
+
+             % resample, normalize 8 kHz file, save scaling factor
+             s1_8k=resample(s1,fs8k,fs);
+             [s1_8k,lev1]=activlev(s1_8k,fs8k,'n'); % y_norm = y /sqrt(lev);
+             s2_8k=resample(s2,fs8k,fs);
+             [s2_8k,lev2]=activlev(s2_8k,fs8k,'n');
+
+             weight_1=10^(inwav1_snr/20);
+             weight_2=10^(inwav2_snr/20);
+
+             s1_8k = weight_1 * s1_8k;
+             s2_8k = weight_2 * s2_8k;
+
+             switch min_max{i_mm}
+                 case 'max'
+                     mix_8k_length = max(length(s1_8k),length(s2_8k));
+                     s1_8k = cat(1,s1_8k,zeros(mix_8k_length - length(s1_8k),1));
+                     s2_8k = cat(1,s2_8k,zeros(mix_8k_length - length(s2_8k),1));
+                 case 'min'
+                     mix_8k_length = min(length(s1_8k),length(s2_8k));
+                     s1_8k = s1_8k(1:mix_8k_length);
+                     s2_8k = s2_8k(1:mix_8k_length);
+             end
+             mix_8k = s1_8k + s2_8k;
+
+             max_amp_8k = max(cat(1,abs(mix_8k(:)),abs(s1_8k(:)),abs(s2_8k(:))));
+             mix_scaling_8k = 1/max_amp_8k*0.9;
+             s1_8k = mix_scaling_8k * s1_8k;
+             s2_8k = mix_scaling_8k * s2_8k;
+             mix_8k = mix_scaling_8k * mix_8k;
+
+             % apply same gain to 16 kHz file
+             s1_16k = weight_1 * s1 / sqrt(lev1);
+             s2_16k = weight_2 * s2 / sqrt(lev2);
+
+             switch min_max{i_mm}
+                 case 'max'
+                     mix_16k_length = max(length(s1_16k),length(s2_16k));
+                     s1_16k = cat(1,s1_16k,zeros(mix_16k_length - length(s1_16k),1));
+                     s2_16k = cat(1,s2_16k,zeros(mix_16k_length - length(s2_16k),1));
+                 case 'min'
+                     mix_16k_length = min(length(s1_16k),length(s2_16k));
+                     s1_16k = s1_16k(1:mix_16k_length);
+                     s2_16k = s2_16k(1:mix_16k_length);
+             end
+             mix_16k = s1_16k + s2_16k;
+
+             max_amp_16k = max(cat(1,abs(mix_16k(:)),abs(s1_16k(:)),abs(s2_16k(:))));
+             mix_scaling_16k = 1/max_amp_16k*0.9;
+             s1_16k = mix_scaling_16k * s1_16k;
+             s2_16k = mix_scaling_16k * s2_16k;
+             mix_16k = mix_scaling_16k * mix_16k;
+
+             % save 8 kHz and 16 kHz mixtures, as well as
+             % necessary scaling factors
+
+             scaling_16k(i,1) = weight_1 * mix_scaling_16k/ sqrt(lev1);
+             scaling_16k(i,2) = weight_2 * mix_scaling_16k/ sqrt(lev2);
+             scaling_8k(i,1) = weight_1 * mix_scaling_8k/ sqrt(lev1);
+             scaling_8k(i,2) = weight_2 * mix_scaling_8k/ sqrt(lev2);
+
+             scaling16bit_16k(i) = mix_scaling_16k;
+             scaling16bit_8k(i) = mix_scaling_8k;
+
150
+ if useaudioread
151
+ s1_8k = int16(round((2^15)*s1_8k));
152
+ s2_8k = int16(round((2^15)*s2_8k));
153
+ mix_8k = int16(round((2^15)*mix_8k));
154
+ s1_16k = int16(round((2^15)*s1_16k));
155
+ s2_16k = int16(round((2^15)*s2_16k));
156
+ mix_16k = int16(round((2^15)*mix_16k));
157
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s1/' mix_name '.wav'],s1_8k,fs8k);
158
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s1/' mix_name '.wav'],s1_16k,fs);
159
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s2/' mix_name '.wav'],s2_8k,fs8k);
160
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s2/' mix_name '.wav'],s2_16k,fs);
161
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/mix/' mix_name '.wav'],mix_8k,fs8k);
162
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/mix/' mix_name '.wav'],mix_16k,fs);
163
+ else
164
+ wavwrite(s1_8k,fs8k,[output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s1/' mix_name '.wav']); %#ok<*DWVWR>
165
+ wavwrite(s1_16k,fs,[output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s1/' mix_name '.wav']);
166
+ wavwrite(s2_8k,fs8k,[output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s2/' mix_name '.wav']);
167
+ wavwrite(s2_16k,fs,[output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s2/' mix_name '.wav']);
168
+ wavwrite(mix_8k,fs8k,[output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/mix/' mix_name '.wav']);
169
+ wavwrite(mix_16k,fs,[output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/mix/' mix_name '.wav']);
170
+ end
171
+
172
+ if mod(i,10)==0
173
+ fprintf(1,'.');
174
+ if mod(i,200)==0
175
+ fprintf(1,'\n');
176
+ end
177
+ end
178
+
179
+ end
180
+ save([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/scaling.mat'],'scaling_8k','scaling16bit_8k');
181
+ save([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/scaling.mat'],'scaling_16k','scaling16bit_16k');
182
+
183
+ fclose(fid);
184
+ fclose(fid_s1);
185
+ fclose(fid_s2);
186
+ fclose(fid_m);
187
+ end
188
+ end
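The `useaudioread` branch above converts floating-point samples to 16-bit integers before calling `audiowrite`. The sketch below shows that conversion in isolation, assuming made-up sample values rather than real audio. It suggests why the 0.9 peak normalization used by these scripts leaves headroom: a sample at exactly +1.0 would saturate at the top of the `int16` range.

```matlab
% Illustrative only: x holds hypothetical normalized samples in [-1, 1].
x = [-1.0; -0.9; 0; 0.9; 1.0];

% Same conversion as in the script: scale by 2^15, round, cast to int16.
% MATLAB's int16() saturates, so +1.0 (32768 after scaling) becomes 32767.
x_i16 = int16(round((2^15) * x));

disp(x_i16.');   % -32768 -29491 0 29491 32767
```

With a peak of 0.9 the largest stored magnitude is 29491, so no sample relies on saturation.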
create-speaker-mixtures-2345/create-speaker-mixtures-2345/create_wav_3speakers.m ADDED
@@ -0,0 +1,188 @@
1
+ % create_wav_3speakers.m
2
+ %
3
+ % Create 3-speaker mixtures
4
+ %
5
+ % This script assumes that WSJ0's wv1 sphere files have already
6
+ % been converted to wav files, using the original folder structure
7
+ % under wsj0/, e.g.,
8
+ % 11-1.1/wsj0/si_tr_s/01t/01to030v.wv1 is converted to wav and
9
+ % stored in YOUR_PATH/wsj0/si_tr_s/01t/01to030v.wav, and
10
+ % 11-6.1/wsj0/si_dt_05/050/050a0501.wv1 is converted to wav and
11
+ % stored in YOUR_PATH/wsj0/si_dt_05/050/050a0501.wav.
12
+ % Relevant data from all disks are assumed merged under YOUR_PATH/wsj0/
13
+ %
14
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
15
+ % Copyright (C) 2016 Mitsubishi Electric Research Labs
16
+ % (Jonathan Le Roux, John R. Hershey, Zhuo Chen)
17
+ % Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
18
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
19
+
20
+ %addpath('./voicebox')
21
+ data_type = {'tr','cv','tt'};
22
+ wsj0root = '/home/joseph/Desktop/WSJ0/'; % YOUR_PATH/, the folder containing wsj0/
23
+ output_dir16k='/home/joseph/Desktop/WSJ0/dataset/3speakers/wav16k';
24
+ output_dir8k='/home/joseph/Desktop/WSJ0/dataset/3speakers/wav8k';
25
+
26
+ min_max = {'min'}; %{'min','max'};
27
+
28
+ for i_mm = 1:length(min_max)
29
+ for i_type = 1:length(data_type)
30
+ if ~exist([output_dir16k '/' min_max{i_mm} '/' data_type{i_type}],'dir')
31
+ mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type}]);
32
+ end
33
+ if ~exist([output_dir8k '/' min_max{i_mm} '/' data_type{i_type}],'dir')
34
+ mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type}]);
35
+ end
36
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s1/']); %#ok<NASGU>
37
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s2/']); %#ok<NASGU>
38
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s3/']); %#ok<NASGU>
39
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/mix/']); %#ok<NASGU>
40
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s1/']); %#ok<NASGU>
41
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s2/']); %#ok<NASGU>
42
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s3/']); %#ok<NASGU>
43
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/mix/']);
44
+
45
+ TaskFile = ['mix_3_spk_' data_type{i_type} '.txt'];
46
+ fid=fopen(TaskFile,'r');
47
+ C=textscan(fid,'%s %f %s %f %s %f');
48
+
49
+ Source1File = ['mix_3_spk_' min_max{i_mm} '_' data_type{i_type} '_1'];
50
+ Source2File = ['mix_3_spk_' min_max{i_mm} '_' data_type{i_type} '_2'];
51
+ Source3File = ['mix_3_spk_' min_max{i_mm} '_' data_type{i_type} '_3'];
52
+ MixFile = ['mix_3_spk_' min_max{i_mm} '_' data_type{i_type} '_mix'];
53
+ fid_s1 = fopen(Source1File,'w');
54
+ fid_s2 = fopen(Source2File,'w');
55
+ fid_s3 = fopen(Source3File,'w');
56
+ fid_m = fopen(MixFile,'w');
57
+
58
+ num_files = length(C{1});
59
+ fs8k=8000;
60
+
61
+ scaling_16k = zeros(num_files,3);
62
+ scaling_8k = zeros(num_files,3);
63
+ scaling16bit_16k = zeros(num_files,1);
64
+ scaling16bit_8k = zeros(num_files,1);
65
+ fprintf(1,'%s\n',[min_max{i_mm} '_' data_type{i_type}]);
66
+ for i = 1:num_files
67
+ [inwav1_dir,invwav1_name,inwav1_ext] = fileparts(C{1}{i});
68
+ [inwav2_dir,invwav2_name,inwav2_ext] = fileparts(C{3}{i});
69
+ [inwav3_dir,invwav3_name,inwav3_ext] = fileparts(C{5}{i});
70
+ fprintf(fid_s1,'%s\n',C{1}{i});%[inwav1_dir,'/',invwav1_name,inwav1_ext]);
71
+ fprintf(fid_s2,'%s\n',C{3}{i});%[inwav2_dir,'/',invwav2_name,inwav2_ext]);
72
+ fprintf(fid_s3,'%s\n',C{5}{i});%[inwav3_dir,'/',invwav3_name,inwav3_ext]);
73
+ inwav1_snr = C{2}(i);
74
+ inwav2_snr = C{4}(i);
75
+ inwav3_snr = C{6}(i);
76
+ mix_name = [invwav1_name,'_',num2str(inwav1_snr),...
77
+ '_',invwav2_name,'_',num2str(inwav2_snr),...
78
+ '_',invwav3_name,'_',num2str(inwav3_snr)];
79
+ fprintf(fid_m,'%s\n',mix_name);
80
+
81
+ % get input wavs
82
+ [s1, fs] = audioread([wsj0root C{1}{i}]);
83
+ s2 = audioread([wsj0root C{3}{i}]);
84
+ s3 = audioread([wsj0root C{5}{i}]);
85
+
86
+ % resample, normalize 8 kHz file, save scaling factor
87
+ s1_8k=resample(s1,fs8k,fs);
88
+ [s1_8k,lev1]=activlev(s1_8k,fs8k,'n'); % y_norm = y /sqrt(lev);
89
+ s2_8k=resample(s2,fs8k,fs);
90
+ [s2_8k,lev2]=activlev(s2_8k,fs8k,'n');
91
+ s3_8k=resample(s3,fs8k,fs);
92
+ [s3_8k,lev3]=activlev(s3_8k,fs8k,'n');
93
+
94
+ weight_1=10^(inwav1_snr/20);
95
+ weight_2=10^(inwav2_snr/20);
96
+ weight_3=10^(inwav3_snr/20);
97
+
98
+ s1_8k = weight_1 * s1_8k;
99
+ s2_8k = weight_2 * s2_8k;
100
+ s3_8k = weight_3 * s3_8k;
101
+
102
+ switch min_max{i_mm}
103
+ case 'max'
104
+ mix_8k_length = max([length(s1_8k),length(s2_8k),length(s3_8k)]);
105
+ s1_8k = cat(1,s1_8k,zeros(mix_8k_length - length(s1_8k),1));
106
+ s2_8k = cat(1,s2_8k,zeros(mix_8k_length - length(s2_8k),1));
107
+ s3_8k = cat(1,s3_8k,zeros(mix_8k_length - length(s3_8k),1));
108
+ case 'min'
109
+ mix_8k_length = min([length(s1_8k),length(s2_8k),length(s3_8k)]);
110
+ s1_8k = s1_8k(1:mix_8k_length);
111
+ s2_8k = s2_8k(1:mix_8k_length);
112
+ s3_8k = s3_8k(1:mix_8k_length);
113
+ end
114
+ mix_8k = s1_8k + s2_8k + s3_8k;
115
+
116
+ max_amp_8k = max(cat(1,abs(mix_8k(:)),abs(s1_8k(:)),abs(s2_8k(:)),abs(s3_8k(:))));
117
+ mix_scaling_8k = 1/max_amp_8k*0.9;
118
+ s1_8k = mix_scaling_8k * s1_8k;
119
+ s2_8k = mix_scaling_8k * s2_8k;
120
+ s3_8k = mix_scaling_8k * s3_8k;
121
+ mix_8k = mix_scaling_8k * mix_8k;
122
+
123
+ % apply same gain to 16 kHz file
124
+ s1_16k = weight_1 * s1 / sqrt(lev1);
125
+ s2_16k = weight_2 * s2 / sqrt(lev2);
126
+ s3_16k = weight_3 * s3 / sqrt(lev3);
127
+
128
+ switch min_max{i_mm}
129
+ case 'max'
130
+ mix_16k_length = max([length(s1_16k),length(s2_16k),length(s3_16k)]);
131
+ s1_16k = cat(1,s1_16k,zeros(mix_16k_length - length(s1_16k),1));
132
+ s2_16k = cat(1,s2_16k,zeros(mix_16k_length - length(s2_16k),1));
133
+ s3_16k = cat(1,s3_16k,zeros(mix_16k_length - length(s3_16k),1));
134
+ case 'min'
135
+ mix_16k_length = min([length(s1_16k),length(s2_16k),length(s3_16k)]);
136
+ s1_16k = s1_16k(1:mix_16k_length);
137
+ s2_16k = s2_16k(1:mix_16k_length);
138
+ s3_16k = s3_16k(1:mix_16k_length);
139
+ end
140
+ mix_16k = s1_16k + s2_16k + s3_16k;
141
+
142
+ max_amp_16k = max(cat(1,abs(mix_16k(:)),abs(s1_16k(:)),abs(s2_16k(:)),abs(s3_16k(:))));
143
+ mix_scaling_16k = 1/max_amp_16k*0.9;
144
+ s1_16k = mix_scaling_16k * s1_16k;
145
+ s2_16k = mix_scaling_16k * s2_16k;
146
+ s3_16k = mix_scaling_16k * s3_16k;
147
+ mix_16k = mix_scaling_16k * mix_16k;
148
+
149
+ % save 8 kHz and 16 kHz mixtures, as well as
150
+ % necessary scaling factors
151
+
152
+ scaling_16k(i,1) = weight_1 * mix_scaling_16k/ sqrt(lev1);
153
+ scaling_16k(i,2) = weight_2 * mix_scaling_16k/ sqrt(lev2);
154
+ scaling_16k(i,3) = weight_3 * mix_scaling_16k/ sqrt(lev3);
155
+ scaling_8k(i,1) = weight_1 * mix_scaling_8k/ sqrt(lev1);
156
+ scaling_8k(i,2) = weight_2 * mix_scaling_8k/ sqrt(lev2);
157
+ scaling_8k(i,3) = weight_3 * mix_scaling_8k/ sqrt(lev3);
158
+
159
+ scaling16bit_16k(i) = mix_scaling_16k;
160
+ scaling16bit_8k(i) = mix_scaling_8k;
161
+
162
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s1/' mix_name '.wav'], s1_8k,fs8k);
163
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s1/' mix_name '.wav'], s1_16k,fs);
164
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s2/' mix_name '.wav'], s2_8k,fs8k);
165
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s2/' mix_name '.wav'], s2_16k,fs);
166
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s3/' mix_name '.wav'], s3_8k,fs8k);
167
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s3/' mix_name '.wav'], s3_16k,fs);
168
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/mix/' mix_name '.wav'], mix_8k,fs8k);
169
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/mix/' mix_name '.wav'], mix_16k,fs);
170
+
171
+ if mod(i,10)==0
172
+ fprintf(1,'.');
173
+ if mod(i,200)==0
174
+ fprintf(1,'\n');
175
+ end
176
+ end
177
+
178
+ end
179
+ save([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/scaling.mat'],'scaling_8k','scaling16bit_8k');
180
+ save([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/scaling.mat'],'scaling_16k','scaling16bit_16k');
181
+
182
+ fclose(fid);
183
+ fclose(fid_s1);
184
+ fclose(fid_s2);
185
+ fclose(fid_s3);
186
+ fclose(fid_m);
187
+ end
188
+ end
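The 3-speaker script above repeats the same three steps for every source: apply a dB weight, align lengths in `min` or `max` mode, then rescale everything by one shared gain so the largest sample peaks at 0.9. The sketch below expresses those steps for any number of sources using a cell array. The variable names (`src`, `snr_db`, `mode`) and the constant-valued signals are assumptions for illustration, and the `activlev` normalization and resampling are deliberately left out.

```matlab
% Hypothetical sources of different lengths (column vectors).
src    = {ones(5,1), 2*ones(3,1), 3*ones(4,1)};
snr_db = [0, 6, -6];      % example per-source levels in dB
mode   = 'min';           % 'min' truncates, 'max' zero-pads

n_src = numel(src);
for k = 1:n_src
    src{k} = 10^(snr_db(k)/20) * src{k};   % dB -> amplitude gain
end

% Align all sources to a common length.
lens = cellfun(@length, src);
switch mode
    case 'max'
        len = max(lens);
        for k = 1:n_src
            src{k} = [src{k}; zeros(len - lens(k), 1)];
        end
    case 'min'
        len = min(lens);
        for k = 1:n_src
            src{k} = src{k}(1:len);
        end
end

S   = [src{:}];           % len-by-n_src matrix, one column per source
mix = sum(S, 2);

% One gain for the mixture and all sources, so their ratio is preserved.
g   = 0.9 / max(abs([mix; S(:)]));
S   = g * S;
mix = g * mix;
```

Because the same gain `g` is applied to every signal, the scaled mixture still equals the sum of the scaled sources.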
create-speaker-mixtures-2345/create-speaker-mixtures-2345/create_wav_4speakers.m ADDED
@@ -0,0 +1,214 @@
1
+ % create_wav_4speakers.m
2
+ %
3
+ % Create 4-speaker mixtures
4
+ %
5
+ % This script assumes that WSJ0's wv1 sphere files have already
6
+ % been converted to wav files, using the original folder structure
7
+ % under wsj0/, e.g.,
8
+ % 11-1.1/wsj0/si_tr_s/01t/01to030v.wv1 is converted to wav and
9
+ % stored in YOUR_PATH/wsj0/si_tr_s/01t/01to030v.wav, and
10
+ % 11-6.1/wsj0/si_dt_05/050/050a0501.wv1 is converted to wav and
11
+ % stored in YOUR_PATH/wsj0/si_dt_05/050/050a0501.wav.
12
+ % Relevant data from all disks are assumed merged under YOUR_PATH/wsj0/
13
+ %
14
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
15
+ % Copyright (C) 2016 Mitsubishi Electric Research Labs
16
+ % (Jonathan Le Roux, John R. Hershey, Zhuo Chen)
17
+ % Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
18
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
19
+
20
+ %addpath('./voicebox')
21
+ data_type = {'tr','cv','tt'};
22
+ wsj0root = '/home/joseph/Desktop/WSJ0/'; % YOUR_PATH/, the folder containing wsj0/
23
+ output_dir16k='/home/joseph/Desktop/WSJ0/dataset/4speakers/wav16k';
24
+ output_dir8k='/home/joseph/Desktop/WSJ0/dataset/4speakers/wav8k';
25
+
26
+ min_max = {'min'}; %{'min','max'};
27
+
28
+ for i_mm = 1:length(min_max)
29
+ for i_type = 1:length(data_type)
30
+ if ~exist([output_dir16k '/' min_max{i_mm} '/' data_type{i_type}],'dir')
31
+ mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type}]);
32
+ end
33
+ if ~exist([output_dir8k '/' min_max{i_mm} '/' data_type{i_type}],'dir')
34
+ mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type}]);
35
+ end
36
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s1/']); %#ok<NASGU>
37
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s2/']); %#ok<NASGU>
38
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s3/']); %#ok<NASGU>
39
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s4/']); %#ok<NASGU>
40
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/mix/']); %#ok<NASGU>
41
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s1/']); %#ok<NASGU>
42
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s2/']); %#ok<NASGU>
43
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s3/']); %#ok<NASGU>
44
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s4/']); %#ok<NASGU>
45
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/mix/']);
46
+
47
+ TaskFile = ['mix_4_spk_' data_type{i_type} '.txt'];
48
+ fid=fopen(TaskFile,'r');
49
+ C=textscan(fid,'%s %f %s %f %s %f %s %f');
50
+
51
+ Source1File = ['mix_4_spk_' min_max{i_mm} '_' data_type{i_type} '_1'];
52
+ Source2File = ['mix_4_spk_' min_max{i_mm} '_' data_type{i_type} '_2'];
53
+ Source3File = ['mix_4_spk_' min_max{i_mm} '_' data_type{i_type} '_3'];
54
+ Source4File = ['mix_4_spk_' min_max{i_mm} '_' data_type{i_type} '_4'];
55
+ MixFile = ['mix_4_spk_' min_max{i_mm} '_' data_type{i_type} '_mix'];
56
+ fid_s1 = fopen(Source1File,'w');
57
+ fid_s2 = fopen(Source2File,'w');
58
+ fid_s3 = fopen(Source3File,'w');
59
+ fid_s4 = fopen(Source4File,'w');
60
+ fid_m = fopen(MixFile,'w');
61
+
62
+ num_files = length(C{1});
63
+ fs8k=8000;
64
+
65
+ scaling_16k = zeros(num_files,4);
66
+ scaling_8k = zeros(num_files,4);
67
+ scaling16bit_16k = zeros(num_files,1);
68
+ scaling16bit_8k = zeros(num_files,1);
69
+ fprintf(1,'%s\n',[min_max{i_mm} '_' data_type{i_type}]);
70
+ for i = 1:num_files
71
+ [inwav1_dir,invwav1_name,inwav1_ext] = fileparts(C{1}{i});
72
+ [inwav2_dir,invwav2_name,inwav2_ext] = fileparts(C{3}{i});
73
+ [inwav3_dir,invwav3_name,inwav3_ext] = fileparts(C{5}{i});
74
+ [inwav4_dir,invwav4_name,inwav4_ext] = fileparts(C{7}{i});
75
+ fprintf(fid_s1,'%s\n',C{1}{i});%[inwav1_dir,'/',invwav1_name,inwav1_ext]);
76
+ fprintf(fid_s2,'%s\n',C{3}{i});%[inwav2_dir,'/',invwav2_name,inwav2_ext]);
77
+ fprintf(fid_s3,'%s\n',C{5}{i});%[inwav3_dir,'/',invwav3_name,inwav3_ext]);
78
+ fprintf(fid_s4,'%s\n',C{7}{i});%[inwav4_dir,'/',invwav4_name,inwav4_ext]);
79
+ inwav1_snr = C{2}(i);
80
+ inwav2_snr = C{4}(i);
81
+ inwav3_snr = C{6}(i);
82
+ inwav4_snr = C{8}(i);
83
+ mix_name = [invwav1_name,'_',num2str(inwav1_snr),...
84
+ '_',invwav2_name,'_',num2str(inwav2_snr),...
85
+ '_',invwav3_name,'_',num2str(inwav3_snr),...
86
+ '_',invwav4_name,'_',num2str(inwav4_snr)];
87
+ fprintf(fid_m,'%s\n',mix_name);
88
+
89
+ % get input wavs
90
+ [s1, fs] = audioread([wsj0root C{1}{i}]);
91
+ s2 = audioread([wsj0root C{3}{i}]);
92
+ s3 = audioread([wsj0root C{5}{i}]);
93
+ s4 = audioread([wsj0root C{7}{i}]);
94
+
95
+ % resample, normalize 8 kHz file, save scaling factor
96
+ s1_8k=resample(s1,fs8k,fs);
97
+ [s1_8k,lev1]=activlev(s1_8k,fs8k,'n'); % y_norm = y /sqrt(lev);
98
+ s2_8k=resample(s2,fs8k,fs);
99
+ [s2_8k,lev2]=activlev(s2_8k,fs8k,'n');
100
+ s3_8k=resample(s3,fs8k,fs);
101
+ [s3_8k,lev3]=activlev(s3_8k,fs8k,'n');
102
+ s4_8k=resample(s4,fs8k,fs);
103
+ [s4_8k,lev4]=activlev(s4_8k,fs8k,'n');
104
+
105
+ weight_1=10^(inwav1_snr/20);
106
+ weight_2=10^(inwav2_snr/20);
107
+ weight_3=10^(inwav3_snr/20);
108
+ weight_4=10^(inwav4_snr/20);
109
+
110
+ s1_8k = weight_1 * s1_8k;
111
+ s2_8k = weight_2 * s2_8k;
112
+ s3_8k = weight_3 * s3_8k;
113
+ s4_8k = weight_4 * s4_8k;
114
+
115
+ switch min_max{i_mm}
116
+ case 'max'
117
+ mix_8k_length = max([length(s1_8k),length(s2_8k),length(s3_8k),length(s4_8k)]);
118
+ s1_8k = cat(1,s1_8k,zeros(mix_8k_length - length(s1_8k),1));
119
+ s2_8k = cat(1,s2_8k,zeros(mix_8k_length - length(s2_8k),1));
120
+ s3_8k = cat(1,s3_8k,zeros(mix_8k_length - length(s3_8k),1));
121
+ s4_8k = cat(1,s4_8k,zeros(mix_8k_length - length(s4_8k),1));
122
+
123
+ case 'min'
124
+ mix_8k_length = min([length(s1_8k),length(s2_8k),length(s3_8k),length(s4_8k)]);
125
+ s1_8k = s1_8k(1:mix_8k_length);
126
+ s2_8k = s2_8k(1:mix_8k_length);
127
+ s3_8k = s3_8k(1:mix_8k_length);
128
+ s4_8k = s4_8k(1:mix_8k_length);
129
+ end
130
+ mix_8k = s1_8k + s2_8k + s3_8k + s4_8k;
131
+
132
+ max_amp_8k = max(cat(1,abs(mix_8k(:)),abs(s1_8k(:)),abs(s2_8k(:)),abs(s3_8k(:)),abs(s4_8k(:))));
133
+ mix_scaling_8k = 1/max_amp_8k*0.9;
134
+ s1_8k = mix_scaling_8k * s1_8k;
135
+ s2_8k = mix_scaling_8k * s2_8k;
136
+ s3_8k = mix_scaling_8k * s3_8k;
137
+ s4_8k = mix_scaling_8k * s4_8k;
138
+ mix_8k = mix_scaling_8k * mix_8k;
139
+
140
+ % apply same gain to 16 kHz file
141
+ s1_16k = weight_1 * s1 / sqrt(lev1);
142
+ s2_16k = weight_2 * s2 / sqrt(lev2);
143
+ s3_16k = weight_3 * s3 / sqrt(lev3);
144
+ s4_16k = weight_4 * s4 / sqrt(lev4);
145
+
146
+ switch min_max{i_mm}
147
+ case 'max'
148
+ mix_16k_length = max([length(s1_16k),length(s2_16k),length(s3_16k),length(s4_16k)]);
149
+ s1_16k = cat(1,s1_16k,zeros(mix_16k_length - length(s1_16k),1));
150
+ s2_16k = cat(1,s2_16k,zeros(mix_16k_length - length(s2_16k),1));
151
+ s3_16k = cat(1,s3_16k,zeros(mix_16k_length - length(s3_16k),1));
152
+ s4_16k = cat(1,s4_16k,zeros(mix_16k_length - length(s4_16k),1));
153
+ case 'min'
154
+ mix_16k_length = min([length(s1_16k),length(s2_16k),length(s3_16k),length(s4_16k)]);
155
+ s1_16k = s1_16k(1:mix_16k_length);
156
+ s2_16k = s2_16k(1:mix_16k_length);
157
+ s3_16k = s3_16k(1:mix_16k_length);
158
+ s4_16k = s4_16k(1:mix_16k_length);
159
+ end
160
+ mix_16k = s1_16k + s2_16k + s3_16k + s4_16k;
161
+
162
+ max_amp_16k = max(cat(1,abs(mix_16k(:)),abs(s1_16k(:)),abs(s2_16k(:)),abs(s3_16k(:)),abs(s4_16k(:))));
163
+ mix_scaling_16k = 1/max_amp_16k*0.9;
164
+ s1_16k = mix_scaling_16k * s1_16k;
165
+ s2_16k = mix_scaling_16k * s2_16k;
166
+ s3_16k = mix_scaling_16k * s3_16k;
167
+ s4_16k = mix_scaling_16k * s4_16k;
168
+ mix_16k = mix_scaling_16k * mix_16k;
169
+
170
+ % save 8 kHz and 16 kHz mixtures, as well as
171
+ % necessary scaling factors
172
+
173
+ scaling_16k(i,1) = weight_1 * mix_scaling_16k/ sqrt(lev1);
174
+ scaling_16k(i,2) = weight_2 * mix_scaling_16k/ sqrt(lev2);
175
+ scaling_16k(i,3) = weight_3 * mix_scaling_16k/ sqrt(lev3);
176
+ scaling_16k(i,4) = weight_4 * mix_scaling_16k/ sqrt(lev4);
177
+ scaling_8k(i,1) = weight_1 * mix_scaling_8k/ sqrt(lev1);
178
+ scaling_8k(i,2) = weight_2 * mix_scaling_8k/ sqrt(lev2);
179
+ scaling_8k(i,3) = weight_3 * mix_scaling_8k/ sqrt(lev3);
180
+ scaling_8k(i,4) = weight_4 * mix_scaling_8k/ sqrt(lev4);
181
+
182
+ scaling16bit_16k(i) = mix_scaling_16k;
183
+ scaling16bit_8k(i) = mix_scaling_8k;
184
+
185
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s1/' mix_name '.wav'], s1_8k,fs8k);
186
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s1/' mix_name '.wav'], s1_16k,fs);
187
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s2/' mix_name '.wav'], s2_8k,fs8k);
188
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s2/' mix_name '.wav'], s2_16k,fs);
189
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s3/' mix_name '.wav'], s3_8k,fs8k);
190
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s3/' mix_name '.wav'], s3_16k,fs);
191
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s4/' mix_name '.wav'], s4_8k,fs8k);
192
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s4/' mix_name '.wav'], s4_16k,fs);
193
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/mix/' mix_name '.wav'], mix_8k,fs8k);
194
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/mix/' mix_name '.wav'], mix_16k,fs);
195
+
196
+ if mod(i,10)==0
197
+ fprintf(1,'.');
198
+ if mod(i,200)==0
199
+ fprintf(1,'\n');
200
+ end
201
+ end
202
+
203
+ end
204
+ save([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/scaling.mat'],'scaling_8k','scaling16bit_8k');
205
+ save([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/scaling.mat'],'scaling_16k','scaling16bit_16k');
206
+
207
+ fclose(fid);
208
+ fclose(fid_s1);
209
+ fclose(fid_s2);
210
+ fclose(fid_s3);
211
+ fclose(fid_s4);
212
+ fclose(fid_m);
213
+ end
214
+ end
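The 2-, 3-, 4- and 5-speaker scripts differ mainly in the `textscan` format string and in the width of the preallocated scaling matrices, which makes copy-paste slips easy. One possible way to derive both from a single speaker count is sketched below. The variable `n_spk`, the example row, and the file names in it are hypothetical and not taken from the task files.

```matlab
n_spk = 4;                                    % number of speakers (assumption)

% Build '%s %f %s %f ...' with one path/level pair per speaker.
fmt = strtrim(repmat('%s %f ', 1, n_spk));

% A made-up task-file row: path and dB level for each speaker.
row = 'a.wav 1.5 b.wav -1.5 c.wav 0.5 d.wav -0.5';
C = textscan(row, fmt);

num_files  = length(C{1});
scaling_8k = zeros(num_files, n_spk);         % width follows the speaker count

% Odd columns of C hold paths, even columns hold levels.
paths = cell(1, n_spk);
snr   = zeros(1, n_spk);
for k = 1:n_spk
    paths{k} = C{2*k-1}{1};
    snr(k)   = C{2*k}(1);
end
```

In the real scripts `textscan` is called on a file identifier rather than a character vector; the format string would be built the same way.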
create-speaker-mixtures-2345/create-speaker-mixtures-2345/create_wav_5speakers.m ADDED
@@ -0,0 +1,238 @@
1
+ % create_wav_5speakers.m
2
+ %
3
+ % Create 5-speaker mixtures
4
+ %
5
+ % This script assumes that WSJ0's wv1 sphere files have already
6
+ % been converted to wav files, using the original folder structure
7
+ % under wsj0/, e.g.,
8
+ % 11-1.1/wsj0/si_tr_s/01t/01to030v.wv1 is converted to wav and
9
+ % stored in YOUR_PATH/wsj0/si_tr_s/01t/01to030v.wav, and
10
+ % 11-6.1/wsj0/si_dt_05/050/050a0501.wv1 is converted to wav and
11
+ % stored in YOUR_PATH/wsj0/si_dt_05/050/050a0501.wav.
12
+ % Relevant data from all disks are assumed merged under YOUR_PATH/wsj0/
13
+ %
14
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
15
+ % Copyright (C) 2016 Mitsubishi Electric Research Labs
16
+ % (Jonathan Le Roux, John R. Hershey, Zhuo Chen)
17
+ % Apache 2.0 (http://www.apache.org/licenses/LICENSE-2.0)
18
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
19
+
20
+ %addpath('./voicebox')
21
+ data_type = {'tr','cv','tt'};
22
+ wsj0root = '/home/joseph/Desktop/WSJ0/'; % YOUR_PATH/, the folder containing wsj0/
23
+ output_dir16k='/home/joseph/Desktop/WSJ0/dataset/5speakers/wav16k';
24
+ output_dir8k='/home/joseph/Desktop/WSJ0/dataset/5speakers/wav8k';
25
+
26
+ min_max = {'min'}; %{'min','max'};
27
+
28
+ for i_mm = 1:length(min_max)
29
+ for i_type = 1:length(data_type)
30
+ if ~exist([output_dir16k '/' min_max{i_mm} '/' data_type{i_type}],'dir')
31
+ mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type}]);
32
+ end
33
+ if ~exist([output_dir8k '/' min_max{i_mm} '/' data_type{i_type}],'dir')
34
+ mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type}]);
35
+ end
36
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s1/']); %#ok<NASGU>
37
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s2/']); %#ok<NASGU>
38
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s3/']); %#ok<NASGU>
39
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s4/']); %#ok<NASGU>
40
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s5/']); %#ok<NASGU>
41
+ status = mkdir([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/mix/']); %#ok<NASGU>
42
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s1/']); %#ok<NASGU>
43
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s2/']); %#ok<NASGU>
44
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s3/']); %#ok<NASGU>
45
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s4/']); %#ok<NASGU>
46
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s5/']); %#ok<NASGU>
47
+ status = mkdir([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/mix/']);
48
+
49
+ TaskFile = ['mix_5_spk_' data_type{i_type} '.txt'];
50
+ fid=fopen(TaskFile,'r');
51
+ C=textscan(fid,'%s %f %s %f %s %f %s %f %s %f');
52
+
53
+ Source1File = ['mix_5_spk_' min_max{i_mm} '_' data_type{i_type} '_1'];
54
+ Source2File = ['mix_5_spk_' min_max{i_mm} '_' data_type{i_type} '_2'];
55
+ Source3File = ['mix_5_spk_' min_max{i_mm} '_' data_type{i_type} '_3'];
56
+ Source4File = ['mix_5_spk_' min_max{i_mm} '_' data_type{i_type} '_4'];
57
+ Source5File = ['mix_5_spk_' min_max{i_mm} '_' data_type{i_type} '_5'];
58
+ MixFile = ['mix_5_spk_' min_max{i_mm} '_' data_type{i_type} '_mix'];
59
+ fid_s1 = fopen(Source1File,'w');
60
+ fid_s2 = fopen(Source2File,'w');
61
+ fid_s3 = fopen(Source3File,'w');
62
+ fid_s4 = fopen(Source4File,'w');
63
+ fid_s5 = fopen(Source5File,'w');
64
+ fid_m = fopen(MixFile,'w');
65
+
66
+ num_files = length(C{1});
67
+ fs8k=8000;
68
+
69
+ scaling_16k = zeros(num_files,5);
70
+ scaling_8k = zeros(num_files,5);
71
+ scaling16bit_16k = zeros(num_files,1);
72
+ scaling16bit_8k = zeros(num_files,1);
73
+ fprintf(1,'%s\n',[min_max{i_mm} '_' data_type{i_type}]);
74
+ for i = 1:num_files
75
+ [inwav1_dir,invwav1_name,inwav1_ext] = fileparts(C{1}{i});
76
+ [inwav2_dir,invwav2_name,inwav2_ext] = fileparts(C{3}{i});
77
+ [inwav3_dir,invwav3_name,inwav3_ext] = fileparts(C{5}{i});
78
+ [inwav4_dir,invwav4_name,inwav4_ext] = fileparts(C{7}{i});
79
+ [inwav5_dir,invwav5_name,inwav5_ext] = fileparts(C{9}{i});
80
+ fprintf(fid_s1,'%s\n',C{1}{i});%[inwav1_dir,'/',invwav1_name,inwav1_ext]);
81
+ fprintf(fid_s2,'%s\n',C{3}{i});%[inwav2_dir,'/',invwav2_name,inwav2_ext]);
82
+ fprintf(fid_s3,'%s\n',C{5}{i});%[inwav3_dir,'/',invwav3_name,inwav3_ext]);
83
+ fprintf(fid_s4,'%s\n',C{7}{i});%[inwav4_dir,'/',invwav4_name,inwav4_ext]);
84
+ fprintf(fid_s5,'%s\n',C{9}{i});%[inwav5_dir,'/',invwav5_name,inwav5_ext]);
85
+ inwav1_snr = C{2}(i);
86
+ inwav2_snr = C{4}(i);
87
+ inwav3_snr = C{6}(i);
88
+ inwav4_snr = C{8}(i);
89
+ inwav5_snr = C{10}(i);
90
+ mix_name = [invwav1_name,'_',num2str(inwav1_snr),...
91
+ '_',invwav2_name,'_',num2str(inwav2_snr),...
92
+ '_',invwav3_name,'_',num2str(inwav3_snr),...
93
+ '_',invwav4_name,'_',num2str(inwav4_snr),...
94
+ '_',invwav5_name,'_',num2str(inwav5_snr)];
95
+ fprintf(fid_m,'%s\n',mix_name);
96
+
97
+ % get input wavs
98
+ [s1, fs] = audioread([wsj0root C{1}{i}]);
99
+ s2 = audioread([wsj0root C{3}{i}]);
100
+ s3 = audioread([wsj0root C{5}{i}]);
101
+ s4 = audioread([wsj0root C{7}{i}]);
102
+ s5 = audioread([wsj0root C{9}{i}]);
103
+
104
+ % resample, normalize 8 kHz file, save scaling factor
105
+ s1_8k=resample(s1,fs8k,fs);
106
+ [s1_8k,lev1]=activlev(s1_8k,fs8k,'n'); % y_norm = y /sqrt(lev);
107
+ s2_8k=resample(s2,fs8k,fs);
108
+ [s2_8k,lev2]=activlev(s2_8k,fs8k,'n');
109
+ s3_8k=resample(s3,fs8k,fs);
110
+ [s3_8k,lev3]=activlev(s3_8k,fs8k,'n');
111
+ s4_8k=resample(s4,fs8k,fs);
112
+ [s4_8k,lev4]=activlev(s4_8k,fs8k,'n');
113
+ s5_8k=resample(s5,fs8k,fs);
114
+ [s5_8k,lev5]=activlev(s5_8k,fs8k,'n');
115
+
116
+ weight_1=10^(inwav1_snr/20);
117
+ weight_2=10^(inwav2_snr/20);
118
+ weight_3=10^(inwav3_snr/20);
119
+ weight_4=10^(inwav4_snr/20);
120
+ weight_5=10^(inwav5_snr/20);
121
+
122
+ s1_8k = weight_1 * s1_8k;
123
+ s2_8k = weight_2 * s2_8k;
124
+ s3_8k = weight_3 * s3_8k;
125
+ s4_8k = weight_4 * s4_8k;
126
+ s5_8k = weight_5 * s5_8k;
127
+
128
+ switch min_max{i_mm}
129
+ case 'max'
130
+ mix_8k_length = max([length(s1_8k),length(s2_8k),length(s3_8k),length(s4_8k),length(s5_8k)]);
131
+ s1_8k = cat(1,s1_8k,zeros(mix_8k_length - length(s1_8k),1));
132
+ s2_8k = cat(1,s2_8k,zeros(mix_8k_length - length(s2_8k),1));
133
+ s3_8k = cat(1,s3_8k,zeros(mix_8k_length - length(s3_8k),1));
134
+ s4_8k = cat(1,s4_8k,zeros(mix_8k_length - length(s4_8k),1));
135
+ s5_8k = cat(1,s5_8k,zeros(mix_8k_length - length(s5_8k),1));
136
+
137
+ case 'min'
138
+ mix_8k_length = min([length(s1_8k),length(s2_8k),length(s3_8k),length(s4_8k),length(s5_8k)]);
139
+ s1_8k = s1_8k(1:mix_8k_length);
140
+ s2_8k = s2_8k(1:mix_8k_length);
141
+ s3_8k = s3_8k(1:mix_8k_length);
142
+ s4_8k = s4_8k(1:mix_8k_length);
143
+ s5_8k = s5_8k(1:mix_8k_length);
144
+ end
145
+ mix_8k = s1_8k + s2_8k + s3_8k + s4_8k + s5_8k;
146
+
147
+ max_amp_8k = max(cat(1,abs(mix_8k(:)),abs(s1_8k(:)),abs(s2_8k(:)),abs(s3_8k(:)),abs(s4_8k(:)),abs(s5_8k(:))));
148
+ mix_scaling_8k = 1/max_amp_8k*0.9;
149
+ s1_8k = mix_scaling_8k * s1_8k;
150
+ s2_8k = mix_scaling_8k * s2_8k;
151
+ s3_8k = mix_scaling_8k * s3_8k;
152
+ s4_8k = mix_scaling_8k * s4_8k;
153
+ s5_8k = mix_scaling_8k * s5_8k;
154
+ mix_8k = mix_scaling_8k * mix_8k;
155
+
156
+ % apply same gain to 16 kHz file
157
+ s1_16k = weight_1 * s1 / sqrt(lev1);
158
+ s2_16k = weight_2 * s2 / sqrt(lev2);
159
+ s3_16k = weight_3 * s3 / sqrt(lev3);
160
+ s4_16k = weight_4 * s4 / sqrt(lev4);
161
+ s5_16k = weight_5 * s5 / sqrt(lev5);
162
+
163
+ switch min_max{i_mm}
164
+ case 'max'
165
+ mix_16k_length = max([length(s1_16k),length(s2_16k),length(s3_16k),length(s4_16k),length(s5_16k)]);
166
+ s1_16k = cat(1,s1_16k,zeros(mix_16k_length - length(s1_16k),1));
167
+ s2_16k = cat(1,s2_16k,zeros(mix_16k_length - length(s2_16k),1));
168
+ s3_16k = cat(1,s3_16k,zeros(mix_16k_length - length(s3_16k),1));
169
+ s4_16k = cat(1,s4_16k,zeros(mix_16k_length - length(s4_16k),1));
170
+ s5_16k = cat(1,s5_16k,zeros(mix_16k_length - length(s5_16k),1));
171
+ case 'min'
172
+ mix_16k_length = min([length(s1_16k),length(s2_16k),length(s3_16k),length(s4_16k),length(s5_16k)]);
173
+ s1_16k = s1_16k(1:mix_16k_length);
174
+ s2_16k = s2_16k(1:mix_16k_length);
175
+ s3_16k = s3_16k(1:mix_16k_length);
176
+ s4_16k = s4_16k(1:mix_16k_length);
177
+ s5_16k = s5_16k(1:mix_16k_length);
178
+ end
179
+ mix_16k = s1_16k + s2_16k + s3_16k + s4_16k + s5_16k;
180
+
181
+ max_amp_16k = max(cat(1,abs(mix_16k(:)),abs(s1_16k(:)),abs(s2_16k(:)),abs(s3_16k(:)),abs(s4_16k(:)),abs(s5_16k(:))));
182
+ mix_scaling_16k = 1/max_amp_16k*0.9;
183
+ s1_16k = mix_scaling_16k * s1_16k;
184
+ s2_16k = mix_scaling_16k * s2_16k;
185
+ s3_16k = mix_scaling_16k * s3_16k;
186
+ s4_16k = mix_scaling_16k * s4_16k;
187
+ s5_16k = mix_scaling_16k * s5_16k;
188
+ mix_16k = mix_scaling_16k * mix_16k;
189
+
190
+ % save 8 kHz and 16 kHz mixtures, as well as
191
+ % necessary scaling factors
192
+
193
+ scaling_16k(i,1) = weight_1 * mix_scaling_16k/ sqrt(lev1);
194
+ scaling_16k(i,2) = weight_2 * mix_scaling_16k/ sqrt(lev2);
195
+ scaling_16k(i,3) = weight_3 * mix_scaling_16k/ sqrt(lev3);
196
+ scaling_16k(i,4) = weight_4 * mix_scaling_16k/ sqrt(lev4);
197
+ scaling_16k(i,5) = weight_5 * mix_scaling_16k/ sqrt(lev5);
198
+ scaling_8k(i,1) = weight_1 * mix_scaling_8k/ sqrt(lev1);
199
+ scaling_8k(i,2) = weight_2 * mix_scaling_8k/ sqrt(lev2);
200
+ scaling_8k(i,3) = weight_3 * mix_scaling_8k/ sqrt(lev3);
201
+ scaling_8k(i,4) = weight_4 * mix_scaling_8k/ sqrt(lev4);
202
+ scaling_8k(i,5) = weight_5 * mix_scaling_8k/ sqrt(lev5);
203
+
204
+ scaling16bit_16k(i) = mix_scaling_16k;
205
+ scaling16bit_8k(i) = mix_scaling_8k;
206
+
207
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s1/' mix_name '.wav'], s1_8k,fs8k);
208
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s1/' mix_name '.wav'], s1_16k,fs);
209
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s2/' mix_name '.wav'], s2_8k,fs8k);
210
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s2/' mix_name '.wav'], s2_16k,fs);
211
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s3/' mix_name '.wav'], s3_8k,fs8k);
212
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s3/' mix_name '.wav'], s3_16k,fs);
213
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s4/' mix_name '.wav'], s4_8k,fs8k);
214
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s4/' mix_name '.wav'], s4_16k,fs);
215
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/s5/' mix_name '.wav'], s5_8k,fs8k);
216
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/s5/' mix_name '.wav'], s5_16k,fs);
+ audiowrite([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/mix/' mix_name '.wav'], mix_8k,fs8k);
217
+ audiowrite([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/mix/' mix_name '.wav'], mix_16k,fs);
218
+
+ if mod(i,10)==0
+ fprintf(1,'.');
+ if mod(i,200)==0
+ fprintf(1,'\n');
+ end
+ end
+
+ end
+ save([output_dir8k '/' min_max{i_mm} '/' data_type{i_type} '/scaling.mat'],'scaling_8k','scaling16bit_8k');
+ save([output_dir16k '/' min_max{i_mm} '/' data_type{i_type} '/scaling.mat'],'scaling_16k','scaling16bit_16k');
+
+ fclose(fid);
+ fclose(fid_s1);
+ fclose(fid_s2);
+ fclose(fid_s3);
+ fclose(fid_s4);
+ fclose(fid_s5);
+ fclose(fid_m);
+ end
+ end
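The `audiowrite` calls above place every mixture under `<output_dir>/<min_max>/<data_type>/<s1..s5|mix>/<mix_name>.wav`. As a rough aid for anyone consuming the generated data from Python, the sketch below rebuilds those paths. It is not part of the commit: the function name `mixture_paths` and the example values `'min'`, `'tr'`, and `'example_mix'` are assumptions made for illustration only.

```python
import posixpath


def mixture_paths(output_dir, min_max, data_type, mix_name, n_speakers=5):
    """Return the .wav paths written for one mixture.

    Mirrors the string concatenation in the MATLAB audiowrite calls:
    [output_dir '/' min_max '/' data_type '/sK/' mix_name '.wav'].
    The function name and the example arguments below are hypothetical.
    """
    # One sub-directory per separated source, plus one for the mixture itself.
    subdirs = ["s%d" % k for k in range(1, n_speakers + 1)] + ["mix"]
    return {
        sub: posixpath.join(output_dir, min_max, data_type, sub, mix_name + ".wav")
        for sub in subdirs
    }


if __name__ == "__main__":
    # Example values are placeholders, not names taken from the commit.
    for sub, path in mixture_paths("wav8k", "min", "tr", "example_mix").items():
        print(sub, path)
```

The same call with a different `output_dir` covers the 16 kHz tree, since the script writes both trees with an identical layout.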
create-speaker-mixtures-2345/create-speaker-mixtures-2345/maxfilt.m ADDED
@@ -0,0 +1,127 @@
+ function [y,k,y0]=maxfilt(x,f,n,d,x0)
+ %MAXFILT find max of an exponentially weighted sliding window [Y,K,Y0]=(X,F,nn,D,X0)
+ %
+ % Usage: (1) y=maxfilt(x) % maximum filter along first non-singleton dimension
+ % (2) y=maxfilt(x,0.95) % use a forgetting factor of 0.95 (= time const of -1/log(0.95)=19.5 samples)
+ % (3) Two equivalent methods (i.e. you can process x in chunks):
+ %         y=maxfilt([u v]);      [yu,ku,x0]=maxfilt(u);
+ %                                yv=maxfilt(v,[],[],[],x0);
+ %                                y=[yu yv];
+ %
+ % Inputs: X Vector or matrix of input data
+ % F exponential forgetting factor in the range 0 (very forgetful) to 1 (no forgetting)
+ % F=exp(-1/T) gives a time constant of T samples [default = 1]
+ % n Length of sliding window [default = Inf (equivalent to [])]
+ % D Dimension to work along [default = first non-singleton dimension]
+ % X0 Initial values placed in front of the X data
+ %
+ % Outputs: Y Output matrix - same size as X
+ % K Index array: Y=X(K). (Note that these values may be <=0 if input X0 is present)
+ % Y0 Last nn-1 values (used to initialize a subsequent call to
+ % maxfilt()) (or last output if n=Inf)
+ %
+ % This routine calculates y(p)=max(f^r*x(p-r), r=0:n-1) where x(r)=-inf for r<1
+ % y=x(k) on output
+
+ % Example: find all peaks in x that are not exceeded within +-w samples
+ % w=4;m=100;x=rand(m,1);[y,k]=maxfilt(x,1,2*w+1);p=find(((1:m)-k)==w);plot(1:m,x,'-',p-w,x(p-w),'+')
+
+ % Copyright (C) Mike Brookes 2003
+ % Version: $Id: maxfilt.m 4054 2014-01-12 19:11:46Z dmb $
+ %
+ % VOICEBOX is a MATLAB toolbox for speech processing.
+ % Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
+ %
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+ % This program is free software; you can redistribute it and/or modify
+ % it under the terms of the GNU General Public License as published by
+ % the Free Software Foundation; either version 2 of the License, or
+ % (at your option) any later version.
+ %
+ % This program is distributed in the hope that it will be useful,
+ % but WITHOUT ANY WARRANTY; without even the implied warranty of
+ % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ % GNU General Public License for more details.
+ %
+ % You can obtain a copy of the GNU General Public License from
+ % http://www.gnu.org/copyleft/gpl.html or by writing to
+ % Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
+ %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+
+ s=size(x);
+ if nargin<4 || isempty(d)
+ d=find(s>1,1); % find first non-singleton dimension
+ if isempty(d)
+ d=1;
+ end
+ end
+ if nargin>4 && numel(x0)>0 % initial values specified
+ y=shiftdim(cat(d,x0,x),d-1); % concatenate x0 and x along d
+ nx0=size(x0,d); % number of values added onto front of data
+ else % no initial values specified
+ y=shiftdim(x,d-1);
+ nx0=0;
+ end
+ s=size(y);
+ s1=s(1);
+ if nargin<3 || isempty(n)
+ n0=Inf;
+ else
+ n0=max(n,1);
+ end
+ if nargin<2 || isempty(f)
+ f=1;
+ end
+ nn=n0;
+ if nargout>2 % we need to output the tail for next time
+ if n0<Inf
+ ny0=min(s1,nn-1);
+ else
+ ny0=min(s1,1);
+ end
+ sy0=s;
+ sy0(1)=ny0;
+ if ny0<=0 || n0==Inf
+ y0=zeros(sy0);
+ else
+ y0=reshape(y(1+s1-ny0:end,:),sy0);
+ y0=shiftdim(y0,ndims(x)-d+1);
+ end
+ end
+ nn=min(nn,s1); % no point in having nn>s1
+ k=repmat((1:s1)',[1 s(2:end)]);
+ if nn>1
+ j=1;
+ j2=1;
+ while j>0
+ g=f^j;
+ m=find(y(j+1:s1,:)<=g*y(1:s1-j,:));
+ m=m+j*fix((m-1)/(s1-j));
+ y(m+j)=g*y(m);
+ k(m+j)=k(m);
+ j2=j2+j;
+ j=min(j2,nn-j2); % j approximately doubles each iteration
+ end
+ end
+ if nargout==0
+ if nargin<3
+ x=shiftdim(x);
+ else
+ x=shiftdim(x,d-1);
+ end
+ ss=min(prod(s(2:end)),5); % maximum of 5 plots
+ plot(1:s1,reshape(y(nx0+1:end,1:ss),s1,ss),'-r',1:s1,reshape(x(:,1:ss),s1,ss),'-b');
+ else
+ if nargout>2 && n0==Inf && ny0==1 % if n0==Inf, we need to save the final output
+ y0=reshape(y(end,:),sy0);
+ y0=shiftdim(y0,ndims(x)-d+1);
+ end
+ if nx0>0 % pre-data specified, x0
+ s(1)=s(1)-nx0;
+ y=shiftdim(reshape(y(nx0+1:end,:),s),ndims(x)-d+1);
+ k=shiftdim(reshape(k(nx0+1:end,:),s),ndims(x)-d+1)-nx0;
+ else % no pre-data
+ y=shiftdim(y,ndims(x)-d+1);
+ k=shiftdim(k,ndims(x)-d+1);
+ end
+ end
+ end
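The header of `maxfilt.m` defines its output as y(p)=max(f^r*x(p-r), r=0:n-1), with x treated as -inf before the first sample. The MATLAB routine reaches this with a loop whose shift roughly doubles each pass. A direct, brute-force reading of the same definition can be handy for checking results on small inputs. The sketch below is such a reading for 1-D input only, written in plain Python. It is not the VOICEBOX implementation: the name `maxfilt_reference` is hypothetical, indices are 0-based rather than MATLAB's 1-based, and its tie-breaking (the most recent sample wins) is an arbitrary choice that may differ from the MATLAB code.

```python
import math


def maxfilt_reference(x, f=1.0, n=None):
    """Brute-force y[p] = max over r in 0..n-1 of f**r * x[p-r].

    Samples before the start of x are treated as -inf, so the window is
    simply truncated at the beginning. n=None means an unbounded window.
    Returns (y, k) where k[p] is the 0-based index of the winning sample.
    """
    y, k = [], []
    for p in range(len(x)):
        # Window length is clamped to at least 1 and to the samples available.
        window = p + 1 if n is None else min(max(n, 1), p + 1)
        best_val, best_idx = -math.inf, p
        for r in range(window):
            cand = (f ** r) * x[p - r]  # sample r steps back, decayed by f**r
            if cand > best_val:         # strict '>' keeps the most recent sample on ties
                best_val, best_idx = cand, p - r
        y.append(best_val)
        k.append(best_idx)
    return y, k


if __name__ == "__main__":
    y, k = maxfilt_reference([1, 3, 2, 0, 5], f=1.0, n=2)
    print(y, k)
```

With `f=1` the output is a plain sliding maximum. With `f<1` an old peak decays geometrically until a newer sample exceeds it, which is the behaviour the `F=exp(-1/T)` time-constant remark in the header describes.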
create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_2_spk_cv.txt ADDED
The diff for this file is too large to render. See raw diff
create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_2_spk_tr.txt ADDED
The diff for this file is too large to render. See raw diff
create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_2_spk_tt.txt ADDED
The diff for this file is too large to render. See raw diff
create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_3_spk_cv.txt ADDED
The diff for this file is too large to render. See raw diff
create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_3_spk_tr.txt ADDED
The diff for this file is too large to render. See raw diff
create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_3_spk_tt.txt ADDED
The diff for this file is too large to render. See raw diff
create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_4_spk_cv.txt ADDED
The diff for this file is too large to render. See raw diff
create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_4_spk_tr.txt ADDED
The diff for this file is too large to render. See raw diff
create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_4_spk_tt.txt ADDED
The diff for this file is too large to render. See raw diff
create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_5_spk_cv.txt ADDED
The diff for this file is too large to render. See raw diff
create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_5_spk_tr.txt ADDED
The diff for this file is too large to render. See raw diff
create-speaker-mixtures-2345/create-speaker-mixtures-2345/mix_5_spk_tt.txt ADDED
The diff for this file is too large to render. See raw diff