---
language: en
license: mit
library_name: pytorch
---
# Knowledge Continuity Regularized Network
Trainer Hyperparameters:
- `lr` = 5e-05
- `per_device_batch_size` = 8
- `gradient_accumulation_steps` = 2
- `weight_decay` = 1e-09
- `seed` = 42
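
With gradient accumulation, the number of examples behind each optimizer step is larger than `per_device_batch_size`. A minimal sketch of that arithmetic follows; the `effective_batch_size` helper and the `num_devices=1` default are illustrative assumptions, since this card does not state how many devices were used.

```python
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples contributing to each optimizer step."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values from the hyperparameter list; num_devices=1 is an assumption,
# not something this card reports.
print(effective_batch_size(8, 2))  # 16
```

On a single device this gives 16 examples per optimizer step; multiply by the device count for a multi-device run.
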
Regularization Hyperparameters:
- `numerical stability denominator constant` = 0.01
- `lambda` = 0.1
- `alpha` = 2.0
- `beta` = 1.0
Extended Logs:
|eval_loss|eval_accuracy|epoch|
|--|--|--|
|6.444|0.911|1.0|
|6.355|0.918|2.0|
|6.640|0.899|3.0|
|6.167|0.929|4.0|
|6.211|0.924|5.0|
|6.171|0.929|6.0|
|6.116|0.934|7.0|
|6.285|0.925|8.0|
|6.154|0.929|9.0|
|6.155|0.929|10.0|
|6.086|0.933|11.0|
|6.109|0.933|12.0|
|6.128|0.934|13.0|
|6.141|0.931|14.0|
|6.147|0.931|15.0|
|6.379|0.919|16.0|
|6.105|0.933|17.0|
|6.063|0.935|18.0|
|6.174|0.929|19.0|
|6.115|0.932|20.0|
|7.263|0.866|21.0|
|6.026|0.938|22.0|
|6.138|0.931|23.0|
|6.139|0.932|24.0|
|6.059|0.935|25.0|
|6.099|0.934|26.0|
|6.068|0.935|27.0|
|6.088|0.934|28.0|
|6.081|0.934|29.0|
|6.083|0.935|30.0|
|6.073|0.936|31.0|
|6.107|0.935|32.0|
|6.052|0.936|33.0|
|6.065|0.936|34.0|
|6.116|0.931|35.0|
|6.128|0.934|36.0|
|6.030|0.937|37.0|
|6.163|0.932|38.0|
|6.000|0.940|39.0|
|6.064|0.938|40.0|
|6.056|0.936|41.0|
|6.071|0.935|42.0|
|6.012|0.939|43.0|
|6.027|0.940|44.0|
|6.017|0.939|45.0|
|5.976|0.941|46.0|
|5.982|0.940|47.0|
|5.987|0.941|48.0|
|5.991|0.941|49.0|
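
In the log above, the lowest `eval_loss` is 5.976 at epoch 46, with `eval_accuracy` 0.941. The sketch below shows one way to find that row programmatically from a log in this table format. The helper names `parse_log_table` and `best_by_loss` are hypothetical, and `SAMPLE_LOG` holds only a few rows copied from the table rather than the full log.

```python
from typing import Dict, List

# A few rows copied from the log table (not the full log).
SAMPLE_LOG = """\
|eval_loss|eval_accuracy|epoch|
|--|--|--|
|6.000|0.940|39.0|
|5.976|0.941|46.0|
|5.982|0.940|47.0|
|5.987|0.941|48.0|
|5.991|0.941|49.0|
"""


def parse_log_table(markdown: str) -> List[Dict[str, float]]:
    """Parse a pipe-delimited Markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in markdown.splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    rows = []
    for line in lines[2:]:  # skip the header and the |--|--|--| separator
        cells = [c.strip() for c in line.strip("|").split("|")]
        rows.append({name: float(value) for name, value in zip(header, cells)})
    return rows


def best_by_loss(rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Return the row with the lowest eval_loss."""
    return min(rows, key=lambda r: r["eval_loss"])


rows = parse_log_table(SAMPLE_LOG)
best = best_by_loss(rows)
print(best)  # {'eval_loss': 5.976, 'eval_accuracy': 0.941, 'epoch': 46.0}
```

Selecting by `eval_loss` is only one possible criterion. Epochs 46, 48, and 49 all reach the same `eval_accuracy` of 0.941, so selecting by accuracy would need a tie-breaking rule.
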