FAQ
Here are some questions I ran into and my answers. I hope they help.
1. Running on offline servers

Download the model folder and upload it to your server, then load it with `local_files_only=True`:

```python
import os
os.environ['HF_DATASETS_OFFLINE'] = "1"
os.environ['TRANSFORMERS_OFFLINE'] = "1"

from transformers import AutoProcessor, EncodecModel

vae_path = 'path_to_model_direc'  # path to the uploaded model folder
model = EncodecModel.from_pretrained(vae_path, local_files_only=True)
model = model.cuda()
processor = AutoProcessor.from_pretrained(vae_path, local_files_only=True)
```
You will receive this warning, which looks a lot like an error message but can be ignored:

> Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration. Please open a PR/issue to update `preprocessor_config.json` to use `image_processor_type` instead of `feature_extractor_type`. This warning will be removed in v4.40.
2. Bandwidth

The default bandwidth is 6 (kbps). You can change it when calling `model.encode`:

```python
encoder_outputs = model.encode(data["input_values"], data["padding_mask"], bandwidth=24)
```
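For intuition about what the bandwidth buys you: the residual quantizer keeps as many codebooks as fit in the target bitrate. The sketch below assumes the 24 kHz checkpoint's figures from the EnCodec paper (75 frames per second, 1024-entry = 10-bit codebooks, so 0.75 kbps per codebook); the function name is mine, not the library's.

```python
import math

def num_codebooks_for_bandwidth(bandwidth_kbps, frame_rate=75, bits_per_codebook=10):
    # Each residual codebook at 75 frames/s with 10-bit codes costs 0.75 kbps,
    # so the quantizer keeps as many codebooks as fit in the target bandwidth.
    return max(1, math.floor(bandwidth_kbps * 1000 / (frame_rate * bits_per_codebook)))

print(num_codebooks_for_bandwidth(6))    # 8 codebooks (the default)
print(num_codebooks_for_bandwidth(24))   # 32 codebooks
print(num_codebooks_for_bandwidth(1.5))  # 2 codebooks
```

So raising the bandwidth from 6 to 24 quadruples the number of codebooks used per frame.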
3. Frames / chunking

By default the model encodes the entire waveform at once. You can instead have it cut the waveform into small chunks and encode each chunk independently, without information from the other chunks. Change this only for very long waveforms, since it decreases quality according to my tests:

```python
model.config.chunk_length_s = 1   # seconds per chunk
model.config.overlap = 0.4        # overlap ratio between chunks
print(model.config.chunk_length)  # 24000 (samples, at 24 kHz)
print(model.config.chunk_stride)  # 14400
```

The length-check code could be flawed (or intended?) and does not allow `overlap >= 0.5`. It should be `assert (length - offset) % stride == 0` instead of `assert length % stride == offset`.
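To see why the check breaks for large overlaps, here is a pure-Python sketch of the stride/offset arithmetic (it reproduces the printed numbers above, but the constants and the comparison are my reconstruction, not the library's actual code):

```python
SAMPLING_RATE = 24000   # the 24 kHz model
CHUNK_LENGTH_S = 1

def chunk_params(overlap):
    chunk_length = CHUNK_LENGTH_S * SAMPLING_RATE
    chunk_stride = int((1 - overlap) * chunk_length)
    offset = chunk_length - chunk_stride
    return chunk_length, chunk_stride, offset

# overlap = 0.4 reproduces the values printed above
length, stride, offset = chunk_params(0.4)
print(length, stride)                    # 24000 14400

# a correctly padded 2-chunk waveform has n*stride + offset samples
wav_len = 2 * stride + offset
print(wav_len % stride == offset)        # True: original check passes
print((wav_len - offset) % stride == 0)  # True: proposed check passes

# with overlap = 0.6, offset (14400) exceeds stride (9600), so
# wav_len % stride can never equal offset: the original check always fails
length, stride, offset = chunk_params(0.6)
wav_len = 2 * stride + offset
print(wav_len % stride == offset)        # False
print((wav_len - offset) % stride == 0)  # True: proposed check still holds
```

With `overlap >= 0.5` the offset is at least as large as the stride, so a remainder can never equal it and the original assertion rejects every input.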
4. Code to pad the waveform if needed:

```python
import numpy as np
import torch

def pad_waveform_to_nearest_length(waveform, sample_rate, length=1, overlap=0.4):
    """Right-pad `waveform` (shape [channels, samples]) with zeros so its
    duration lands exactly on a chunk boundary."""
    stride = length * (1 - overlap)
    offset = length * overlap
    waveform_length_sec = waveform.size(1) / sample_rate
    # round up to the nearest valid chunked duration
    target_length_sec = np.ceil((waveform_length_sec - offset) / stride) * stride + offset
    target_num_samples = int(target_length_sec * sample_rate)
    pad_length = target_num_samples - waveform.size(1)
    if pad_length > 0:
        return torch.nn.functional.pad(waveform, (0, pad_length), mode='constant', value=0)
    return waveform
```
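As a quick sanity check on the target lengths, here is an integer-only sketch of the same arithmetic that needs no torch (the integer form also sidesteps float rounding; the function name and defaults are mine):

```python
import math

def target_samples(num_samples, sample_rate=24000, length_s=1, overlap=0.4):
    # mirrors pad_waveform_to_nearest_length, on sample counts only
    chunk = length_s * sample_rate                # samples per chunk: 24000
    stride = round(chunk * (1 - overlap))         # hop between chunks: 14400
    offset = chunk - stride                       # overlapping tail:  9600
    n_chunks = max(1, math.ceil((num_samples - offset) / stride))
    return n_chunks * stride + offset

# a 1.3 s clip (31200 samples) pads up to 1.6 s = 2*stride + offset
print(target_samples(31200))  # 38400
# a ~0.83 s clip (20000 samples) pads up to one full 1 s chunk
print(target_samples(20000))  # 24000
# an already-aligned length is left as-is
print(target_samples(38400))  # 38400
```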
Check this page for the model class: https://github.com/huggingface/transformers/blob/096f304695f7e7b169b031f7814352e900ad71c4/src/transformers/models/encodec/modeling_encodec.py#L526