Getting NaN values in stage 2 training
Hi, I trained stage 1 for about 130 epochs. When I moved on to stage 2 training, it started giving NaN loss values right from the beginning. Interestingly, when I took the checkpoint that they have provided and loaded and unloaded the model using the first-stage code, it still gave NaN values, even though it works fine if I load it directly into stage 2 training.
I have experienced this before in a few situations:
- actual model parameters are not being loaded from the checkpoint (there is a naming mismatch involving a "module" prefix between stages 1 and 2 and between distributed vs. non-distributed training; try changing strict loading to true and see what happens with the keys — see the sketch after this list)
- multispeaker is set incorrectly
- certain batch sizes with mixed precision (try changing batch sizes)
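For the first point, here is a minimal sketch for checking the key names (assuming a PyTorch checkpoint laid out the way the StyleTTS2 training scripts usually save it, with per-sub-model state dicts nested under a "net" key; the filename is a placeholder):

```python
import torch

# Inspect a first-stage checkpoint and report which sub-model state dicts carry
# the "module." prefix that DataParallel/DistributedDataParallel adds to
# parameter names. Mismatched prefixes are silently skipped under strict=False,
# so nothing actually loads and training produces NaNs.
ckpt = torch.load("epoch_1st_00130.pth", map_location="cpu")  # placeholder filename
state_dicts = ckpt.get("net", ckpt)  # fall back to the top level if there is no "net" key

for name, sd in state_dicts.items():
    first_key = next(iter(sd))
    prefix = "has 'module.' prefix" if first_key.startswith("module.") else "no 'module.' prefix"
    print(f"{name}: {first_key} ({prefix})")
```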
Can you please share the config file? I'll try to replicate with your parameters.
https://huggingface.co/therealvul/StyleTTS2/blob/main/Multi0/config_40_1c872.yml
This is the config file produced during 2nd-stage SLM adversarial training. However, the strict loading change needs to be made in the code.
Thanks a lot. Do you also have the config file for the first stage?
Unfortunately no
Yeah, I believe that "module" prefix is the problem. I trained over the stage 1 checkpoint you provided and it gave the same error, and as you mentioned, the "module" prefix was missing.
Shouldn't strict loading give an error, since it tries to strictly match the keys while loading? Does this solve the problem, or do I need to add "module" to the keys by brute force?
Because the keys don't match, none of the weights will actually be loaded if you disable strict loading, which results in the NaN calculations. I brute-force added "module" to the keys.
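For reference, the rename can be done roughly like this (a sketch under the same assumptions about checkpoint layout as above; depending on which side expects the prefix, you may need to strip "module." instead of adding it):

```python
from collections import OrderedDict
import torch

ckpt = torch.load("epoch_1st_00130.pth", map_location="cpu")  # placeholder filename
net = ckpt["net"]

# Prepend "module." to every parameter name so the keys match what the
# (distributed) second-stage loading code expects.
ckpt["net"] = {
    name: OrderedDict(
        (k if k.startswith("module.") else "module." + k, v) for k, v in sd.items()
    )
    for name, sd in net.items()
}

torch.save(ckpt, "epoch_1st_00130_module.pth")
```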
Hi, I tried your model trained till the 1st epoch with the inference script in the Jupyter notebook, but it doesn't seem to generate anything, just blank sound. A similar thing is happening with a model I trained too. Any idea?
- What model checkpoint are you referring to specifically?
- From my testing, StyleTTS2 models are very sensitive to the particular config values used during training. max_len must match or it will only generate silence. During second-stage diffusion training, the training will also output a value for sigma_data into the generated config in the log directory, which should be the value used for inference; see the sketch below.
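For example, a minimal sketch of pulling those values from the config that training wrote into the log directory (the path and the exact key layout, e.g. model_params.diffusion.dist.sigma_data, are assumptions based on the config linked above):

```python
import yaml

# Reuse the training-time values at inference instead of hard-coding them.
with open("Models/Multi0/config_40_1c872.yml") as f:  # placeholder path
    config = yaml.safe_load(f)

max_len = config["max_len"]  # must match the value used in training
sigma_data = config["model_params"]["diffusion"]["dist"]["sigma_data"]  # written during 2nd-stage training
print("max_len:", max_len, "| sigma_data:", sigma_data)
```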
I tried epoch_1st_00067.pth
Did you infer using a checkpoint trained only through the first stage? The second-stage checkpoints work quite well. I guess there are probably a few models which are not yet trained in the first-stage checkpoint that are needed for end-to-end generation using just text as input.
epoch_1st is not a 1st-epoch checkpoint; it is the 67th epoch of the 1st stage (decoder and text aligner training only). It would not be able to generate TTS.