Knowledge Continuity Regularized Network
Trainer Hyperparameters:
* lr = 1e-05
* per_device_batch_size = 8
* gradient_accumulation_steps = 2
* weight_decay = 1e-09
* seed = 42
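
For reference, a minimal sketch of how these trainer hyperparameters would map onto a Hugging Face `TrainingArguments` object. The output directory is a hypothetical placeholder, and the epoch count is simply inferred from the last logged evaluation below; neither is stated in this card.

```python
# Sketch only: the listed trainer hyperparameters mapped onto
# transformers.TrainingArguments. output_dir is hypothetical and
# num_train_epochs is inferred from the extended logs (7 epochs).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./kc-regularized-net",  # hypothetical path
    learning_rate=1e-05,                # lr
    per_device_train_batch_size=8,      # per_device_batch_size
    gradient_accumulation_steps=2,
    weight_decay=1e-09,
    seed=42,
    num_train_epochs=7,                 # last logged epoch below
)
```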
Regularization Hyperparameters:

* numerical stability denominator constant = 0.01
* lambda = 0.001
* alpha = 2.0
* beta = 1.0
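
Below is a purely illustrative sketch of how these regularization hyperparameters could enter the training loss. The exact knowledge-continuity objective is not reproduced in this card, so the pairwise loss-gap over representation-distance form, and the roles assigned to alpha and beta, are assumptions made only to show where each constant would plug in.

```python
# Illustrative sketch only: one plausible continuity-style penalty using
# the listed constants. The true regularizer of this model may differ.
import torch

def kc_penalty(per_example_loss: torch.Tensor,  # shape (batch,)
               hidden: torch.Tensor,            # shape (batch, dim)
               alpha: float = 2.0,
               beta: float = 1.0,
               eps: float = 0.01) -> torch.Tensor:
    """Mean of |loss_i - loss_j|^alpha over representation distances^beta."""
    loss_gap = (per_example_loss.unsqueeze(0) - per_example_loss.unsqueeze(1)).abs() ** alpha
    dist = torch.cdist(hidden, hidden) ** beta + eps  # eps = stability denominator constant
    return (loss_gap / dist).mean()

# Under this reading, lambda weights the penalty against the task loss:
# loss = task_loss + 0.001 * kc_penalty(per_example_loss, hidden)
```

Under this assumed form, lambda trades the penalty off against the task loss, while the 0.01 constant keeps the denominator bounded away from zero.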
Extended Logs:
| eval_loss | eval_accuracy | epoch |
|---|---|---|
| 5.407 | 0.792 | 1.0 |
| 5.131 | 0.792 | 2.0 |
| 4.966 | 0.792 | 3.0 |
| 4.796 | 0.792 | 4.0 |
| 4.710 | 0.792 | 5.0 |
| 4.653 | 0.792 | 6.0 |
| 4.534 | 0.792 | 7.0 |