Source and computation of w2vbert2_mean_var_stats_emilia.pt
I am using DualCodec and noticed that the released checkpoints require the file:
w2vbert2_mean_var_stats_emilia.pt
After inspecting it, it contains:
{
"mean": tensor of shape [1024],
"var": tensor of shape [1024]
}
I would like to understand exactly how these statistics were computed.
Are these the mean and variance of W2V-BERT hidden representations computed on the Emilia dataset?
If so, which hidden layer was used (e.g., Layer 16)?
Were the statistics computed directly on the hidden states, or on some post-processed representation?
If a different training dataset such as LibriSpeech is used, should the mean/variance statistics be recomputed separately?
Thank you.
There is a similar issue on GitHub that may help: https://github.com/jiaqili3/DualCodec/issues/5
The statistics are calculated on the same hidden layer that was used to extract the semantic features, directly on those hidden states without post-processing. If you use a different dataset, I think separate statistics will not be needed.
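For anyone who wants to reproduce this on their own data, here is a minimal sketch of how per-dimension statistics of this kind could be computed. It is not the authors' actual script: the function name `accumulate_stats`, the choice of population variance, and the random tensors standing in for real hidden states are all assumptions for illustration. With real data, each batch would be the hidden states of the chosen layer (for example from Hugging Face `Wav2Vec2BertModel` called with `output_hidden_states=True`), with padded frames removed first.

```python
import torch

def accumulate_stats(feature_batches, dim=1024):
    """Per-dimension mean and variance over all frames, in one streaming pass.

    Hypothetical helper: accumulates sums in float64 so that long runs over
    a large dataset do not lose precision.
    """
    count = 0
    total = torch.zeros(dim, dtype=torch.float64)
    total_sq = torch.zeros(dim, dtype=torch.float64)
    for feats in feature_batches:
        # feats: (batch, frames, dim) hidden states from one fixed layer
        flat = feats.reshape(-1, dim).to(torch.float64)
        count += flat.shape[0]
        total += flat.sum(dim=0)
        total_sq += (flat ** 2).sum(dim=0)
    mean = total / count
    # Population variance E[x^2] - E[x]^2 (an assumption; the released file
    # may have used the unbiased estimator instead)
    var = total_sq / count - mean ** 2
    return {"mean": mean.float(), "var": var.float()}

# Random stand-in for real layer outputs, only to make the sketch runnable
torch.manual_seed(0)
batches = [torch.randn(2, 50, 1024) * 3.0 + 1.5 for _ in range(4)]
stats = accumulate_stats(batches)

# The result has the same layout as the released file and could be saved
# with torch.save(stats, "my_stats.pt")  (hypothetical file name)
```

The streaming form matters because the full set of hidden states for a large corpus will not fit in memory, whereas the running sums are only two vectors of length 1024.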
Thank you for the reply. If we use a different w2v2 layer, do we need to calculate new statistics?
Normally, yes. The mean and variance are tied to the layer the features are extracted from.
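To illustrate why the layer matters, here is a small sketch under the assumption that the statistics are used to standardize features as `(x - mean) / sqrt(var)`. The helper names `normalize` and `stats_of`, the `eps` term, and the synthetic "layers" are hypothetical; real layers would differ in less obvious ways than a simple shift and scale.

```python
import torch

def stats_of(feats):
    """Per-dimension mean and population variance over all frames."""
    flat = feats.reshape(-1, feats.shape[-1])
    return {"mean": flat.mean(0), "var": flat.var(0, unbiased=False)}

def normalize(feats, stats, eps=1e-8):
    """Standardize features with precomputed statistics (assumed usage)."""
    std = torch.sqrt(stats["var"] + eps)
    return (feats - stats["mean"]) / std

# Two synthetic "layers" with different scale and offset (stand-ins only)
torch.manual_seed(0)
layer_a = torch.randn(8, 40, 1024) * 2.0 + 0.5
layer_b = torch.randn(8, 40, 1024) * 5.0 - 1.0

stats_a = stats_of(layer_a)

# Statistics from the matching layer give roughly zero mean, unit variance
matched = normalize(layer_a, stats_a)

# Reusing them on another layer leaves the features shifted and mis-scaled
mismatched = normalize(layer_b, stats_a)

print(matched.mean().item(), matched.std().item())
print(mismatched.mean().item(), mismatched.std().item())
```

If the normalized features no longer have the distribution the codec saw during training, the downstream model receives inputs it was not trained on, which is why recomputing the statistics for a new layer is normally the safe choice.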