---
language: en
license: mit
library_name: pytorch
---
# Knowledge Continuity Regularized Network

Trainer Hyperparameters:
- `lr` = 5e-05
- `per_device_batch_size` = 8
- `gradient_accumulation_steps` = 2
- `weight_decay` = 1e-09
- `seed` = 42

Regularization Hyperparameters:
- `numerical stability denominator constant` = 0.01
- `lambda` = 0.1
- `alpha` = 2.0
- `beta` = 1.0

Extended Logs:

|eval_loss|eval_accuracy|epoch|
|--|--|--|
|6.444|0.911|1.0|
|6.355|0.918|2.0|
|6.640|0.899|3.0|
|6.167|0.929|4.0|
|6.211|0.924|5.0|
|6.171|0.929|6.0|
|6.116|0.934|7.0|
|6.285|0.925|8.0|
|6.154|0.929|9.0|
|6.155|0.929|10.0|
|6.086|0.933|11.0|
|6.109|0.933|12.0|
|6.128|0.934|13.0|
|6.141|0.931|14.0|
|6.147|0.931|15.0|
|6.379|0.919|16.0|
|6.105|0.933|17.0|
|6.063|0.935|18.0|
|6.174|0.929|19.0|
|6.115|0.932|20.0|
|7.263|0.866|21.0|
|6.026|0.938|22.0|
|6.138|0.931|23.0|
|6.139|0.932|24.0|
|6.059|0.935|25.0|
|6.099|0.934|26.0|
|6.068|0.935|27.0|
|6.088|0.934|28.0|
|6.081|0.934|29.0|
|6.083|0.935|30.0|
|6.073|0.936|31.0|
|6.107|0.935|32.0|
|6.052|0.936|33.0|
|6.065|0.936|34.0|
|6.116|0.931|35.0|
|6.128|0.934|36.0|
|6.030|0.937|37.0|
|6.163|0.932|38.0|
|6.000|0.940|39.0|
|6.064|0.938|40.0|
|6.056|0.936|41.0|
|6.071|0.935|42.0|
|6.012|0.939|43.0|
|6.027|0.940|44.0|
|6.017|0.939|45.0|
|5.976|0.941|46.0|
|5.982|0.940|47.0|
|5.987|0.941|48.0|
|5.991|0.941|49.0|
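
The trainer hyperparameters listed in this card can be gathered into a small config object. The following is a minimal sketch, not the training code behind this model: the `TrainerConfig` class, the `effective_batch_size` helper, and the `num_devices` parameter are hypothetical names introduced for illustration, and the card does not state how many devices were used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainerConfig:
    """Trainer hyperparameters copied from this model card (hypothetical container)."""
    lr: float = 5e-05
    per_device_batch_size: int = 8
    gradient_accumulation_steps: int = 2
    weight_decay: float = 1e-09
    seed: int = 42

    def effective_batch_size(self, num_devices: int = 1) -> int:
        # Number of examples that contribute to one optimizer step.
        # num_devices is an assumption; the card does not report the device count.
        return (
            self.per_device_batch_size
            * self.gradient_accumulation_steps
            * num_devices
        )


config = TrainerConfig()
print(config.effective_batch_size())  # 16 when a single device is assumed
```

With `per_device_batch_size` = 8 and `gradient_accumulation_steps` = 2, each optimizer step sees 16 examples per device, so the total scales linearly with whatever device count was actually used.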