---
language: en
license: mit
library_name: pytorch
---

# Knowledge Continuity Regularized Network

Trainer Hyperparameters:

- `lr` = 1e-05
- `per_device_batch_size` = 8
- `gradient_accumulation_steps` = 2
- `weight_decay` = 1e-09
- `seed` = 42

Regularization Hyperparameters:

- `numerical stability denominator constant` = 0.01
- `lambda` = 0.001
- `alpha` = 2.0
- `beta` = 2.0

Extended Logs:

|eval_loss|eval_accuracy|epoch|
|--|--|--|
|8.842|0.208|1.0|
|8.834|0.208|2.0|
|7.990|0.292|3.0|
|4.612|0.750|4.0|
|8.841|0.208|5.0|
|8.842|0.208|6.0|
|8.843|0.208|7.0|
|8.842|0.208|8.0|
|8.845|0.208|9.0|
|8.842|0.208|10.0|
|8.843|0.208|11.0|
|8.843|0.208|12.0|
|8.843|0.208|13.0|
|8.841|0.208|14.0|
|8.843|0.208|15.0|
|8.843|0.208|16.0|
|8.843|0.208|17.0|
|8.845|0.208|18.0|
|8.843|0.208|19.0|
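
The card does not include training code, so here is a minimal sketch of one optimizer step wired up with the hyperparameters listed above. The toy model, data, and the `kc_penalty` function are assumptions for illustration: the actual architecture and the exact form of the knowledge-continuity regularizer are not specified in this card (the sketch uses the `lambda` weight and the numerical-stability denominator constant, but omits `alpha` and `beta`, whose roles are not documented here).

```python
import torch
import torch.nn as nn

torch.manual_seed(42)  # `seed` = 42

# Hyperparameters taken from the card above.
LR = 1e-5
WEIGHT_DECAY = 1e-9
GRAD_ACCUM_STEPS = 2
PER_DEVICE_BATCH = 8
LAMBDA = 0.001  # regularization weight
EPS = 0.01      # numerical stability denominator constant

# Hypothetical toy model; the real architecture is not given in the card.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
optimizer = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
criterion = nn.CrossEntropyLoss(reduction="none")

def kc_penalty(hidden, losses, eps=EPS):
    # Assumed form of a knowledge-continuity style penalty: ratio of
    # pairwise per-example loss differences to distances between hidden
    # representations, with eps stabilizing the denominator.
    dl = (losses.unsqueeze(0) - losses.unsqueeze(1)).abs()
    dh = torch.cdist(hidden, hidden) + eps
    return (dl / dh).mean()

optimizer.zero_grad()
for step in range(GRAD_ACCUM_STEPS):
    x = torch.randn(PER_DEVICE_BATCH, 16)           # placeholder inputs
    y = torch.randint(0, 4, (PER_DEVICE_BATCH,))    # placeholder labels
    hidden = model[:2](x)        # intermediate representation
    logits = model[2](hidden)
    losses = criterion(logits, y)
    loss = losses.mean() + LAMBDA * kc_penalty(hidden, losses)
    (loss / GRAD_ACCUM_STEPS).backward()  # scale for gradient accumulation
optimizer.step()
```

With `gradient_accumulation_steps` = 2 and `per_device_batch_size` = 8, each optimizer step here sees an effective batch of 16 examples per device, matching the trainer settings above.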