FAQ


Here are some questions I encountered and my answers; I hope this helps you.

1. running on offline servers

Download the model folder and upload it to your server.
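
On a machine that does have internet access, one way to grab the whole folder is huggingface_hub's `snapshot_download` (a minimal sketch; the repo id `facebook/encodec_24khz` is my assumption here, substitute whichever EnCodec checkpoint this page is for):

```python
from huggingface_hub import snapshot_download

# Downloads the full model folder; copy/upload this directory to the offline server.
# "facebook/encodec_24khz" is an assumed repo id, swap in your own checkpoint.
snapshot_download(repo_id="facebook/encodec_24khz", local_dir="encodec_24khz")
```

Then, on the offline server: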

```python
import os

from transformers import AutoProcessor, EncodecModel

os.environ['HF_DATASETS_OFFLINE'] = "1"
os.environ['TRANSFORMERS_OFFLINE'] = "1"

vae_path = 'path_to_model_direc'  # the folder you uploaded
model = EncodecModel.from_pretrained(vae_path, local_files_only=True)
model = model.cuda()
processor = AutoProcessor.from_pretrained(vae_path, local_files_only=True)
```

You will receive the following warning; it looks a lot like an error message, but it is only a warning:

```
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration. Please open a PR/issue to update preprocessor_config.json to use image_processor_type instead of feature_extractor_type. This warning will be removed in v4.40.
```
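
As a quick sanity check that the offline load worked (a minimal sketch following the documented EncodecModel usage, with one second of random noise standing in for real audio):

```python
import numpy as np

# One second of fake mono audio at the model's sampling rate.
raw_audio = np.random.randn(processor.sampling_rate).astype(np.float32)
data = processor(raw_audio=raw_audio, sampling_rate=processor.sampling_rate, return_tensors="pt")
data = {k: v.cuda() for k, v in data.items()}

encoder_outputs = model.encode(data["input_values"], data["padding_mask"])
audio_values = model.decode(encoder_outputs.audio_codes, encoder_outputs.audio_scales, data["padding_mask"])[0]
print(audio_values.shape)  # reconstructed waveform, same layout as the input
```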

2. bandwidth

The default bandwidth is 6; we can change it when calling model.encode:

```python
encoder_outputs = model.encode(data["input_values"], data["padding_mask"], bandwidth=24)
```
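
The allowed values are listed in the model config, and higher bandwidth keeps more residual-quantizer codebooks per frame. A quick way to inspect both (the printed values assume the 24 kHz checkpoint, and the shape is my reading of the modeling code):

```python
print(model.config.target_bandwidths)
# [1.5, 3.0, 6.0, 12.0, 24.0] for the 24 kHz checkpoint

print(encoder_outputs.audio_codes.shape)
# (num_chunks, batch, num_quantizers, codes_length);
# num_quantizers grows with the requested bandwidth
```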

3. frame/chunking

By default the model encodes the entire waveform in one pass.

We can change this so the model first cuts the waveform into small chunks and encodes each chunk without information from the other chunks. This should be changed only for very long waveforms, as it decreases quality according to my tests.

```python
model.config.chunk_length_s = 1  # seconds per chunk
model.config.overlap = 0.4       # overlap ratio between consecutive chunks

# chunk_length = chunk_length_s * sampling_rate = 1 * 24000
print(model.config.chunk_length)  # 24000
# chunk_stride = chunk_length * (1 - overlap) = 24000 * 0.6
print(model.config.chunk_stride)  # 14400
```

The input-length check in model.encode could be flawed (or intended?): it does not allow overlap >= 0.5. It effectively asserts `length % stride == offset` (with `offset = chunk_length - stride`), but a remainder mod stride is always less than stride, while offset >= stride whenever overlap >= 0.5, so the check can never pass. I think it should be `assert (length - offset) % stride == 0` instead of `assert length % stride == offset`.
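
A small numeric illustration of why the current check can never pass at overlap = 0.5, using the 1 s / 24 kHz chunk settings from above:

```python
chunk_length = 24000
stride = int(chunk_length * (1 - 0.5))  # 12000
offset = chunk_length - stride          # 12000

for input_length in (36000, 48000, 60000):
    # current check: the remainder is always < stride, so it can never equal offset
    print(input_length % stride == offset)        # False every time
    # proposed check: holds for any properly padded length
    print((input_length - offset) % stride == 0)  # True
```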

4. code to pad the waveform if needed:

```python
import numpy as np
import torch

def pad_waveform_to_nearest_length(waveform, sample_rate, length=1, overlap=0.4):
    # length and overlap should match model.config.chunk_length_s and model.config.overlap
    stride = length * (1 - overlap)
    offset = length * overlap
    waveform_length_sec = waveform.size(1) / sample_rate
    # round up to the nearest duration of the form n * stride + offset,
    # which passes the model's chunked-encode length check
    target_length_sec = np.ceil((waveform_length_sec - offset) / stride) * stride + offset
    target_num_samples = int(target_length_sec * sample_rate)
    pad_length = target_num_samples - waveform.size(1)
    if pad_length > 0:
        return torch.nn.functional.pad(waveform, (0, pad_length), mode='constant', value=0)
    else:
        return waveform
```
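
For example, with the 1 s chunks and 0.4 overlap from section 3 (waveform is assumed to be a (channels, num_samples) tensor at 24 kHz):

```python
waveform = torch.randn(1, 100_000)  # ~4.17 s of fake mono audio at 24 kHz
padded = pad_waveform_to_nearest_length(waveform, sample_rate=24000, length=1, overlap=0.4)
print(padded.size(1))  # 110400 samples = 4.6 s, which satisfies the length check
```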

Check this page for the model class: https://github.com/huggingface/transformers/blob/096f304695f7e7b169b031f7814352e900ad71c4/src/transformers/models/encodec/modeling_encodec.py#L526
