Unable to reproduce Lady Gaga

#13
by husjerry - opened

Hi Marco, I'm having some trouble reproducing your Lady Gaga model results.
I used the so-vits-svc stable 4.1 release (https://github.com/svc-develop-team/so-vits-svc), followed the tutorial, and did the following:

  1. Sliced the audio to the requirements.
  2. Since vec768l12 and vec256l9 require the ContentVec encoder, got checkpoint_best_legacy_500.pt and placed it under the pretrain directory.
  3. Placed your model .pth file into the logs/44k/ folder and your config into configs/, and also added "speech_encoder":"vec256l9" to the "model" section of the config JSON.
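For anyone repeating step 3, the config change can be scripted instead of edited by hand. This is only a minimal sketch: the helper name set_speech_encoder is made up for illustration, and it assumes the config is plain JSON with a top-level "model" object, as described above.

```python
import json
from pathlib import Path


def set_speech_encoder(config_path, encoder="vec256l9"):
    """Add or overwrite model.speech_encoder in a JSON config file.

    Hypothetical helper, not part of so-vits-svc itself.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Create the "model" section if the file lacks one (an assumption;
    # the config discussed in this thread already has it).
    config.setdefault("model", {})["speech_encoder"] = encoder
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Example call, using the path from this thread:
# set_speech_encoder("configs/config.json")
```

All other keys in the file are left as they were; only the "speech_encoder" entry is added or replaced.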

But the output file I get from running python inference_main.py -m "logs/44k/G_14400.pth" -c "configs/config.json" -n "gem_like_you_vocals_1.wav" -t 0 -s "gg" is not very good.

Could you help me understand what could be the reason?
Do you see anything immediately problematic or that I am missing?
Any pointers would be appreciated. Thank you!

Best,

At some point I started having problems with the so-vits version, and now I only use version 2.1.5.

You can install it with the following command:
pip install -U so-vits-svc-fork==2.1.5

You can use it just for this Lady Gaga run and switch back to a newer version when you need to.
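If it helps, here is a small sketch for checking which version actually ended up installed after running the pip command. It uses only the Python standard library; the function name installed_version is just illustrative, and the package name is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="so-vits-svc-fork"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is not installed.
print(installed_version())
```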

Actually, when I used f0_predictor, the conversion was much better, but the pitch was completely off.
So I think my steps are mostly fine, and the model itself just isn't able to produce the best quality at inference.

What do you think?

Your Lady Gaga demo link sounded so good, though.

Thanks! I'll try that
