---
language: en
license: mit
library_name: pytorch
---

# Knowledge Continuity Regularized Network

Trainer Hyperparameters:

- `lr` = 5e-05
- `per_device_batch_size` = 8
- `gradient_accumulation_steps` = 2
- `weight_decay` = 1e-09
- `seed` = 42

Regularization Hyperparameters:

- `numerical stability denominator constant` = 0.001
- `lambda` = 0.01
- `alpha` = 2.0
- `beta` = 2.0

Extended Logs:

|eval_loss|eval_accuracy|epoch|
|--|--|--|
|14.430|0.792|0.67|
|14.131|0.792|2.0|
|13.810|0.875|2.67|
|13.640|0.875|4.0|
|13.667|0.875|4.67|
|13.247|0.875|6.0|
|12.928|0.875|6.67|
|12.673|0.875|8.0|
|12.596|0.875|8.67|
|12.450|0.875|10.0|
|12.382|0.875|10.67|
|12.298|0.875|12.0|
|12.289|0.875|12.67|
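To illustrate how the regularization hyperparameters might fit together, below is a minimal, purely hypothetical sketch of a knowledge-continuity-style penalty: it discourages large loss changes under small changes in the hidden representation, scaled by `lambda`, shaped by the `alpha`/`beta` exponents, and stabilized by the denominator constant. The exact functional form used by this trainer is not stated in this card, so the formula, function name, and argument semantics here are assumptions.

```python
def continuity_penalty(delta_loss: float, delta_repr_norm: float,
                       lam: float = 0.01, alpha: float = 2.0,
                       beta: float = 2.0, eps: float = 0.001) -> float:
    """Hypothetical continuity penalty (the functional form is an assumption).

    delta_loss      -- change in loss between two nearby inputs
    delta_repr_norm -- norm of the change in their hidden representations
    eps             -- the numerical-stability denominator constant from the card
    """
    # Large loss changes over small representation changes are penalized most;
    # eps keeps the denominator away from zero when representations barely move.
    return lam * (abs(delta_loss) ** alpha) / (delta_repr_norm ** beta + eps)
```

Note that gradients are accumulated over 2 steps with a per-device batch size of 8, so the effective batch size during training is 16.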