Source and computation of w2vbert2_mean_var_stats_emilia.pt

by Vartul27 - opened 4 days ago

Discussion

Vartul27

4 days ago (edited 4 days ago)

I am using DualCodec and noticed that the released checkpoints require the file:

w2vbert2_mean_var_stats_emilia.pt

After inspecting it, it contains:

{
"mean": tensor of shape [1024],
"var": tensor of shape [1024]
}
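For readers who want to reproduce this inspection, here is a minimal sketch, assuming the file is an ordinary dictionary of tensors saved with torch.save. The helper name describe_stats and the stand-in file demo_stats.pt are hypothetical; the stand-in holds placeholder values, not the released statistics.

```python
import os
import tempfile

import torch


def describe_stats(path):
    """Load a stats file and return {key: shape} for every tensor it holds."""
    stats = torch.load(path, map_location="cpu")
    return {name: tuple(tensor.shape) for name, tensor in stats.items()}


# Build a stand-in file with the same layout as the one described above.
# The values are placeholders, NOT the released Emilia statistics.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo_stats.pt")
torch.save({"mean": torch.zeros(1024), "var": torch.ones(1024)}, demo_path)

summary = describe_stats(demo_path)
print(summary)
```

Pointing describe_stats at the real w2vbert2_mean_var_stats_emilia.pt should report the two 1024-dimensional entries listed above.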

I would like to understand exactly how these statistics were computed.

Are these the mean and variance of W2V-BERT hidden representations computed on the Emilia dataset?
If so, which hidden layer was used (e.g., Layer 16)?
Were the statistics computed directly on the hidden states, or on some post-processed representation?
If training on a different dataset such as LibriSpeech, should separate mean/variance statistics be recomputed?

Thank you.

jiaqili3

Amphion org 4 days ago

Here is a link to a similar issue on GitHub: https://github.com/jiaqili3/DualCodec/issues/5
The statistics were calculated on the same hidden layer that was used to extract the semantic features, and the hidden states were not post-processed. If you use a different dataset, I don't think separate statistics are needed.
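As an illustration of what computing per-dimension statistics on raw hidden states could look like, here is a minimal sketch. It is not the script used for the released file: the RunningStats class is hypothetical, random tensors stand in for hidden states of shape (batch, frames, 1024), and the choice of population variance is an assumption, since the thread does not say which estimator was used.

```python
import torch


class RunningStats:
    """Accumulates per-dimension mean and variance over frames of many batches."""

    def __init__(self, dim):
        self.count = 0
        # float64 accumulators limit rounding error over large datasets
        self.total = torch.zeros(dim, dtype=torch.float64)
        self.total_sq = torch.zeros(dim, dtype=torch.float64)

    def update(self, feats):
        # feats: (..., dim); every leading axis is flattened into frames
        flat = feats.reshape(-1, feats.shape[-1]).to(torch.float64)
        self.count += flat.shape[0]
        self.total += flat.sum(dim=0)
        self.total_sq += (flat ** 2).sum(dim=0)

    def finalize(self):
        mean = self.total / self.count
        # Population variance E[x^2] - E[x]^2 (an assumption, see above)
        var = (self.total_sq / self.count - mean ** 2).clamp_min(0)
        return {"mean": mean.float(), "var": var.float()}


# Random stand-ins for hidden states taken from the chosen layer.
torch.manual_seed(0)
batches = [torch.randn(2, 50, 1024) * 3.0 + 1.5 for _ in range(4)]

acc = RunningStats(1024)
for batch in batches:
    acc.update(batch)
stats = acc.finalize()
print(stats["mean"].shape, stats["var"].shape)
```

In a real run, each batch would be the hidden states of the layer used for semantic feature extraction, with padded frames removed before calling update.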

Vartul27

3 days ago

Thank you for the reply. If we use a different w2v2 layer, do we need to calculate new statistics?

jiaqili3

Amphion org 3 days ago

Normally, yes.
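A small sketch of why layer-specific statistics matter, assuming the statistics are used to standardize features as (x - mean) / sqrt(var + eps). The thread does not confirm this usage; the standardize helper, the eps value, and the synthetic "layers" below are all hypothetical stand-ins.

```python
import torch


def standardize(feats, stats, eps=1e-5):
    """Standardize features per dimension (assumed usage of the stats file)."""
    return (feats - stats["mean"]) / torch.sqrt(stats["var"] + eps)


torch.manual_seed(0)
# Synthetic stand-ins for two layers whose activations have different scales.
layer_a = torch.randn(2000, 1024) * 2.0 + 0.5
layer_b = torch.randn(2000, 1024) * 5.0 - 1.0

# Statistics computed on layer_a only.
stats_a = {"mean": layer_a.mean(0), "var": layer_a.var(0, unbiased=False)}

matched = standardize(layer_a, stats_a)     # stats match the features
mismatched = standardize(layer_b, stats_a)  # stats from the wrong layer

print("matched:   ", matched.mean().item(), matched.std().item())
print("mismatched:", mismatched.mean().item(), mismatched.std().item())
```

With matching statistics the output is close to zero mean and unit variance. With statistics from another layer it is not, so a model trained on standardized features would see inputs on a different scale.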
