Fine-tuning run 2

This run tried to improve the model fine-tuned during run 1.

Checkpoint used: checkpoint-12000

  • The learning rate chosen for run 2 turned out to be too small: WER did not improve compared to run 1.
  • WER during run 2 followed the same trajectory as the end of run 1 (checkpoint-8000 to checkpoint-10000).
  • Run 2 was stopped after 3000 steps.
  • Checkpoints from this run are not uploaded.
  • Training stdout logs and TensorBoard logs are uploaded.

Advice

  • For the next fine-tuning run, use a higher learning rate. For the LR scheduler, either:
    • use a constant learning rate schedule,
    • or manually instantiate a linear scheduler with warmup (e.g. get_linear_schedule_with_warmup from transformers) and set num_training_steps larger than the actual number of optimization steps in the run, so that the LR at the end of the run is still >> 0 (much larger than 0)
  • Use a seed different from the one used during run 1, e.g. seed=43.
    The train dataset is reshuffled at each epoch via: train_dataloader.dataset.set_epoch(train_dataloader.dataset._epoch + 1), so the effective shuffling seed depends on the epoch counter. However, when training is resumed, train_dataloader.dataset._epoch is reset to 0.
    With the same seed, a resumed run would therefore repeat the shuffling order of run 1, so a different seed needs to be provided.
  • Consider using the original Mozilla Common Voice dataset instead of the HuggingFace version.
    The original contains multiple recordings of the same sentence, so there is at least twice as much data.
    To use this "additional" data, the train, validation and test sets need to be enlarged using the validated set, which is absent from HuggingFace's CV11 dataset.
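
The scheduler advice above can be illustrated with a small, dependency-free sketch. It assumes the usual linear-warmup-then-linear-decay rule (the same shape as a linear schedule with warmup in transformers) and compares the LR multiplier at the last step of a run when num_training_steps equals the run length versus when it is set larger. The function name linear_warmup_factor, the warmup length and the peak LR are hypothetical values chosen for illustration; only the 3000-step run length is taken from these notes.

```python
def linear_warmup_factor(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """LR multiplier: linear warmup from 0 to 1, then linear decay to 0."""
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


ACTUAL_STEPS = 3000   # length of run 2, from the notes
WARMUP_STEPS = 300    # hypothetical
PEAK_LR = 1e-4        # hypothetical

# Schedule length equal to the run length: LR decays all the way to 0.
exact = linear_warmup_factor(ACTUAL_STEPS, WARMUP_STEPS, num_training_steps=ACTUAL_STEPS)

# Schedule length set to 3x the run length: LR is still large when the run ends.
stretched = linear_warmup_factor(ACTUAL_STEPS, WARMUP_STEPS, num_training_steps=3 * ACTUAL_STEPS)

print(f"final LR, schedule length == run length:    {PEAK_LR * exact:.2e}")
print(f"final LR, schedule length == 3x run length: {PEAK_LR * stretched:.2e}")
```

With these assumed values the stretched schedule ends the run at roughly 69% of the peak LR, while the exact-length schedule ends at 0. How much to stretch num_training_steps is a tuning choice, not something these notes prescribe.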