Question about reconstruction

#1
by Leying - opened

Hi, thanks for sharing this vocoder.

I am using this vocoder to reconstruct audio from a mel spectrogram. I have a wav with a sample rate of 16 kHz and a length of 65280 samples. I first extract the mel spectrogram with hop size 256 and window size 1024, which gives a mel spectrogram of shape [80, 245]. I then convert it back to a wav with this vocoder, but the reconstructed wav has length 62720, which does not match the input. The difference is always 2560 samples.
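To make the numbers concrete, here is a small arithmetic check. The two frame-count formulas are the common STFT conventions (with and without center padding); which one the mel extractor actually uses is an assumption, not something confirmed from this vocoder's code, and the function names are only illustrative.

```python
# Arithmetic check of the lengths reported above.
# The two frame-count formulas are common STFT conventions (assumed, not
# taken from the vocoder's code).

def frames_centered(num_samples: int, hop: int) -> int:
    """Frame count when the signal is padded by win // 2 on both sides."""
    return 1 + num_samples // hop


def frames_uncentered(num_samples: int, win: int, hop: int) -> int:
    """Frame count when no padding is applied."""
    return 1 + (num_samples - win) // hop


num_samples = 65280      # input wav length
hop, win = 256, 1024     # hop size and window size
observed_frames = 245    # mel spectrogram shape was [80, 245]

# A vocoder that emits `hop` samples per mel frame produces this many samples.
reconstructed = observed_frames * hop         # 62720
missing = num_samples - reconstructed         # 2560
missing_frames = missing // hop               # 10

print(frames_centered(num_samples, hop))          # 256
print(frames_uncentered(num_samples, win, hop))   # 252
print(reconstructed, missing, missing_frames)     # 62720 2560 10
```

The output length equals exactly 245 × 256, so the vocoder appears to produce one hop of samples per frame. Neither convention above yields 245 frames for this input, which suggests the 10 missing frames are lost in the mel extraction step rather than in the vocoder.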

I checked the vocoder config, including the hop size and window size, and they match the settings used for mel extraction. Although there is no significant audible difference, the objective metrics such as STOI, SNR, and SDR are very poor (STOI is only 0.15, and SI-SNR and SDR are negative). I think this is caused by misalignment between the input and the output. How can I fix this?
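One possible way to test the misalignment hypothesis is to estimate a constant sample offset between the two signals and trim both to their overlap before computing the metrics. The sketch below is only an illustration: `estimate_lag` and `align` are hypothetical helper names, it assumes both signals are 1-D NumPy arrays at the same sample rate, and it assumes the misalignment is a single constant shift, which is not confirmed for this vocoder.

```python
import numpy as np


def estimate_lag(ref: np.ndarray, est: np.ndarray) -> int:
    """Return the lag k that maximizes sum_n ref[n + k] * est[n].

    A positive k means est starts k samples into ref.
    Uses FFT-based cross-correlation with zero padding to avoid wrap-around.
    """
    n = len(ref) + len(est) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n
    spec = np.fft.rfft(ref, nfft) * np.conj(np.fft.rfft(est, nfft))
    corr = np.fft.irfft(spec, nfft)
    k = int(np.argmax(corr))
    # Indices at or beyond len(ref) hold the negative lags.
    return k if k < len(ref) else k - nfft


def align(ref: np.ndarray, est: np.ndarray):
    """Shift by the estimated lag and trim both signals to equal length."""
    k = estimate_lag(ref, est)
    if k >= 0:
        ref = ref[k:]
    else:
        est = est[-k:]
    length = min(len(ref), len(est))
    return ref[:length], est[:length]
```

The aligned pair returned by `align(original, reconstructed)` can then be passed to the metric functions. Note that waveform-level metrics such as SI-SNR and SDR are also sensitive to phase differences, so they may remain low for a neural vocoder even after the offset is removed.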

image.png
