Error During Fine-Tuning NVIDIA TTS FastPitch Model with Custom Dataset

#3
by HasanAli - opened

Description:

I am trying to fine-tune the FastPitch model from NVIDIA NeMo on a custom dataset, but I get the error shown below when I run this command:

!(python fastpitch_finetune.py --config-name=fastpitch_align_v1.05.yaml \
  train_dataset=./9017_manifest_train_dur_5_mins_local.json \
  validation_datasets=./9017_manifest_dev_ns_all_local.json \
  sup_data_path=./fastpitch_sup_data \
  phoneme_dict_path=tts_dataset_files/cmudict-0.7b_nv22.10 \
  heteronyms_path=tts_dataset_files/heteronyms-052722 \
  exp_manager.exp_dir=./ljspeech_to_9017_no_mixing_5_mins \
  +init_from_nemo_model=./tts_en_fastpitch_align.nemo \
  +trainer.max_steps=1000 ~trainer.max_epochs \
  trainer.check_val_every_n_epoch=25 \
  model.train_ds.dataloader_params.batch_size=24 model.validation_ds.dataloader_params.batch_size=2 \
  model.n_speakers=1 model.pitch_mean=152.3 model.pitch_std=64.0 \
  model.pitch_fmin=30 model.pitch_fmax=512 model.optim.lr=2e-4 \
  ~model.optim.sched model.optim.name=adam trainer.devices=1 trainer.strategy=auto \
  +model.text_tokenizer.add_blank_at=true \
)

RuntimeError:

 The size of tensor a (128) must match the size of tensor b (122) at non-singleton dimension 2

Detailed Error Log:

[NeMo W 2024-06-20 13:08:38 nemo_logging:349] /home/rev9ai/anaconda3/envs/voice_my/lib/python3.10/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.

[NeMo W 2024-06-20 13:09:13 modelPT:183] If you intend to do validation, please call the ModelPT.setup_validation_data() or ModelPT.setup_multiple_validation_data() method and provide a valid configuration file to setup the validation data loader(s).

[NeMo I 2024-06-20 13:09:13 save_restore_connector:263] Model FastPitchModel was successfully restored from /mnt/ssd/hasan/voice_my/tts_en_fastpitch_align.nemo.

...

RuntimeError: The size of tensor a (128) must match the size of tensor b (122) at non-singleton dimension 2

Manifest.json Data Sample:

{"audio_filepath": "audio/segment_5440.flac", "text": " Chapter 3", "duration": 1.03, "normalized_text": "chapter three"}

{"audio_filepath": "audio/segment_5441.flac", "text": " Old World Marketing vs. New World Marketing", "duration": 3.08, "normalized_text": "old world marketing versus new world marketing"}

{"audio_filepath": "audio/segment_5442.flac", "text": " Smart orthodontists in this economy research and commit to marketing strategies which are proven and which provide a high return on investment.", "duration": 8.46, "normalized_text": "smart orthodontists in this economy research and commit to marketing strategies which are proven and which provide a high return on investment."}

{"audio_filepath": "audio/segment_5443.flac", "text": " Time and time again I see orthodontists invest in marketing strategies that no longer work, which are not measurable in any format other than money out of their pockets. This chapter will put a stop to this nonsense once and for all. If you decide to listen and implement the proven money-making strategies for you and your practice.", "duration": 19.34, "normalized_text": "time and time again i see orthodontists invest in marketing strategies that no longer work, which are not measurable in any format other than money out of their pockets. this chapter will put a stop to this nonsense once and for all. if you decide to listen and implement the proven money-making strategies for you and your practice."}

{"audio_filepath": "audio/segment_5444.flac", "text": " Old world marketing refers to different media techniques and strategies which were once effective and may still be effective in today's quickly changing world.", "duration": 8.71, "normalized_text": "old world marketing refers to different media techniques and strategies which were once effective and may still be effective in today's quickly changing world."}
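To rule out problems in the manifest itself, a small check like the one below could be run over the file before training. This is only a sketch under stated assumptions: the function name `check_manifest_lines`, the required keys, and the 20-second duration cap are my own illustrative choices, not anything defined by NeMo, and passing this check does not guarantee that the tensor size mismatch goes away.

```python
import json

# Keys assumed to be required, based on the manifest sample above.
REQUIRED_KEYS = ("audio_filepath", "text", "duration")


def check_manifest_lines(lines, max_duration=20.0):
    """Return (line_number, problem) tuples for a JSON-lines manifest.

    max_duration is a hypothetical threshold in seconds, not a NeMo default.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # blank separator lines are skipped
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(entry, dict):
            problems.append((lineno, "entry is not a JSON object"))
            continue
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append((lineno, f"missing key: {key}"))
        text = entry.get("text", "")
        if isinstance(text, str) and text != text.strip():
            problems.append((lineno, "text has leading/trailing whitespace"))
        duration = entry.get("duration")
        if isinstance(duration, (int, float)) and not 0 < duration <= max_duration:
            problems.append((lineno, f"duration out of range: {duration}"))
    return problems
```

An open manifest file can be passed directly, for example `check_manifest_lines(open("./9017_manifest_train_dur_5_mins_local.json"))`. On the sample above it would flag the leading space in every "text" field.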

Steps Taken:

First, I followed the FastPitch fine-tuning tutorial and encountered the tensor size mismatch error.

I then passed my data through the data preparation pipeline and completed the text and audio preprocessing as outlined here.

Next, I gave the preprocessed data to FastPitch_Finetuning.ipynb, but the same tensor size mismatch error persists.

Moreover, if I also use the FastPitch_Data_Preparation.ipynb pipeline for fine-tuning, the results are not good.

I also configured the training parameters and paths as specified.

Environment:
1. Python 3.10.12
2. torch 2.0.1
3. torchvision 0.15.2
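
The list above does not include the NeMo version. A short sketch like the following could collect it along with the other packages. The helper name `package_versions` is made up for illustration, and the distribution names passed to it (`nemo_toolkit`, `pytorch-lightning`) are my assumption about how the packages were installed in this environment.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, if any."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust to match the environment.
    wanted = ["nemo_toolkit", "torch", "torchvision", "pytorch-lightning"]
    for name, version in package_versions(wanted).items():
        print(f"{name}=={version}")
```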

Any insights or suggestions for resolving this error would be greatly appreciated. Thanks.
