Knowledge Continuity Regularized Network
Dataset: GLUE
Trainer Hyperparameters:
lr = 5e-05
per_device_batch_size = 16
gradient_accumulation_steps = 1
weight_decay = 1e-09
seed = 42
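The card does not include the training script, so as an illustration only, the trainer hyperparameters above can be collected into a small config object (the class name and structure here are assumptions, not part of the original setup). It also makes explicit that with gradient_accumulation_steps = 1 the effective batch size equals the per-device batch size:

```python
from dataclasses import dataclass


@dataclass
class TrainerConfig:
    """Illustrative container for the trainer hyperparameters listed above."""
    lr: float = 5e-05
    per_device_batch_size: int = 16
    gradient_accumulation_steps: int = 1
    weight_decay: float = 1e-09
    seed: int = 42

    @property
    def effective_batch_size(self) -> int:
        # Gradients are accumulated over this many examples
        # before each optimizer step.
        return self.per_device_batch_size * self.gradient_accumulation_steps


cfg = TrainerConfig()
print(cfg.effective_batch_size)  # 16 * 1 = 16
```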
Regularization Hyperparameters:
numerical stability denominator constant = 0.01
lambda = 0.02
alpha = 2.0
beta = 1.0
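The card does not state how these regularization hyperparameters enter the training objective. As a hypothetical sketch only (this is a common pattern for weighted penalties, not the method's actual definition), lambda can be read as a scale on a continuity penalty, alpha and beta as shaping exponents, and the small constant as the denominator guard against division by zero:

```python
def regularized_loss(task_loss: float, continuity_penalty: float,
                     lam: float = 0.02, alpha: float = 2.0,
                     beta: float = 1.0, eps: float = 0.01) -> float:
    """Hypothetical combination of a task loss with a continuity penalty.

    The functional form is an assumption for illustration; only the
    hyperparameter values come from the card above.
    """
    # eps is the numerical-stability denominator constant: it keeps the
    # ratio finite when the penalty approaches zero.
    shaped = (continuity_penalty ** alpha) / (continuity_penalty ** beta + eps)
    return task_loss + lam * shaped


# Example: a small penalty perturbs the task loss only slightly,
# since lam = 0.02 keeps the regularization term small.
print(regularized_loss(task_loss=0.5, continuity_penalty=0.1))
```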
Extended Logs:
eval_loss | eval_accuracy | epoch
---|---|---
24.363 | 0.790 | 1.0
23.650 | 0.813 | 2.0