## Fine-tuning run 3

Tried to improve the model fine-tuned during run 1.

Checkpoint used: checkpoint-12000

* Trained for 6000 steps
* Used a custom learning-rate scheduler initialized in `custom_trainer.Seq2SeqTrainerCustomLinearScheduler` (a sketch of such a trainer follows this list):
  * `--learning_rate="3e-5"`
  * `--learning_rate_end="1e-5"`
* no warmup was used
* no WER improvement compared to checkpoint-12000 of run 1
* using `seed=43`
* not uploading checkpoints from this run
* uploading src, logs, TensorBoard logs, trainer_state
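
Below is a minimal sketch of what such a trainer subclass could look like. The class name `custom_trainer.Seq2SeqTrainerCustomLinearScheduler` comes from this run, but its internals and the way `--learning_rate_end` reaches the constructor are assumptions, not the actual `custom_trainer` code:

```python
# Sketch only: linear LR decay from learning_rate to learning_rate_end, no warmup.
from torch.optim.lr_scheduler import LambdaLR
from transformers import Seq2SeqTrainer


class Seq2SeqTrainerCustomLinearScheduler(Seq2SeqTrainer):
    """Seq2SeqTrainer with a linear LR schedule ending at `learning_rate_end`."""

    def __init__(self, *args, learning_rate_end: float = 1e-5, **kwargs):
        super().__init__(*args, **kwargs)
        self.learning_rate_end = learning_rate_end

    def create_scheduler(self, num_training_steps: int, optimizer=None):
        optimizer = optimizer if optimizer is not None else self.optimizer
        lr_start = self.args.learning_rate
        lr_end = self.learning_rate_end

        def lr_lambda(current_step: int):
            # LambdaLR multiplies the base LR (lr_start) by this factor,
            # so interpolate towards lr_end and convert back to a multiplier.
            progress = min(current_step / max(1, num_training_steps), 1.0)
            return (lr_start + (lr_end - lr_start) * progress) / lr_start

        self.lr_scheduler = LambdaLR(optimizer, lr_lambda)
        return self.lr_scheduler
```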

## Advice

* We probably need to use warmup when resuming training with a learning rate higher than the last LR of the previous run
* The number of steps needs to be set to more than 6000, because the model improved WER very slowly
* We probably need to load `optimizer.pt` and `scaler.pt` from the checkpoint before resuming training (see the resume sketch after this list).<br>
  Otherwise we most likely:
  * reinitialize the optimizer and lose the history of parameter momentum (exponentially weighted averages)
  * scale the loss incorrectly
* We can use the original Mozilla Common Voice dataset instead of the HuggingFace one (see the data-preparation sketch below).<br>
  The reason is that the original contains multiple voicings of the same sentence,
  so there is at least twice as much data.<br>
  To use this "additional" data, the train, validation, and test sets need to be enlarged using the `validated` set -
  the one that is absent from HuggingFace's CV11 dataset
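
A resume sketch under those assumptions: the paths, step count, and the `learning_rate_end` constructor argument are placeholders, and the model/dataset objects are expected to be built exactly as in run 1 (not shown). With `resume_from_checkpoint`, the HF `Trainer` reloads `optimizer.pt`, `scheduler.pt`, and `scaler.pt` instead of reinitializing them:

```python
# Sketch only: resume run 1 so optimizer / scaler state is restored with the weights.
from transformers import Seq2SeqTrainingArguments

from custom_trainer import Seq2SeqTrainerCustomLinearScheduler  # author's trainer


def resume_run(model, train_dataset, eval_dataset, data_collator):
    checkpoint_dir = "output/run_1/checkpoint-12000"  # hypothetical path

    training_args = Seq2SeqTrainingArguments(
        output_dir="output/run_4",  # hypothetical
        max_steps=20_000,           # must exceed the checkpoint's global step (12000)
        learning_rate=3e-5,
        fp16=True,
        seed=43,
    )

    trainer = Seq2SeqTrainerCustomLinearScheduler(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        data_collator=data_collator,
        learning_rate_end=1e-5,     # assumed constructor argument, see sketch above
    )

    # resume_from_checkpoint makes Trainer load optimizer.pt, scheduler.pt and
    # scaler.pt, preserving momentum history and the loss scale.
    trainer.train(resume_from_checkpoint=checkpoint_dir)
    return trainer
```

Note that resuming this way also restores the global step from `trainer_state.json`, so `max_steps` has to be set beyond the checkpoint's step, or training will stop immediately; and if the LR is raised above run 1's last LR, warmup would have to be added to the custom scheduler itself, since the `create_scheduler` sketched above ignores `warmup_steps`.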
|
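
For the Common Voice idea, a possible data-preparation sketch. It assumes a locally downloaded original Common Voice release (its `validated.tsv` / `dev.tsv` / `test.tsv` files and `clips/` folder); the function name and the "enlarge only the train split" choice are mine, not an established recipe:

```python
# Sketch only: enlarge the train split with every validated clip that is not
# already held out in dev/test, keeping the official dev/test splits intact.
from pathlib import Path

import pandas as pd


def build_enlarged_splits(cv_dir: str):
    cv_dir = Path(cv_dir)
    validated = pd.read_csv(cv_dir / "validated.tsv", sep="\t")
    dev = pd.read_csv(cv_dir / "dev.tsv", sep="\t")
    test = pd.read_csv(cv_dir / "test.tsv", sep="\t")

    # Clips already used for validation/testing must not leak into training.
    held_out = set(dev["path"]) | set(test["path"])
    train = validated[~validated["path"].isin(held_out)]

    return train, dev, test
```

The same pattern could instead redistribute the extra clips across all three splits, which is closer to the "enlarge train, validation, test" idea above; the trade-off is that evaluation numbers then stop being comparable with the official CV11 splits.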