Knowledge Continuity Regularized Network

Trainer Hyperparameters:

  • lr = 5e-05
  • per_device_batch_size = 8
  • gradient_accumulation_steps = 1
  • weight_decay = 1e-09
  • seed = 42

Regularization Hyperparameters:

  • numerical stability denominator constant = 0.01
  • lambda = 0.02
  • alpha = 2.0
  • beta = 1.0
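The card lists the regularization hyperparameters but does not define how they enter the training objective. Below is a minimal, hypothetical sketch built on two assumptions. First, the regularizer is a knowledge-continuity-style penalty: the change in per-example loss divided by the distance between the examples' hidden representations, with the 0.01 constant added to the denominator. Second, lambda weights this penalty against the task loss. The names `kc_penalty` and `regularized_loss` are illustrative, not the model's actual code. Alpha and beta are not modeled because the card does not say how they are used.

```python
import math

def kc_penalty(losses, reps, eps=0.01):
    """Hypothetical knowledge-continuity-style penalty (assumption, not the card's code).

    Averages |loss_i - loss_j| / (||rep_i - rep_j|| + eps) over all pairs i < j.
    `eps` plays the role of the numerical stability denominator constant (0.01).
    """
    n = len(losses)
    total, count = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            # Euclidean distance between the two hidden representations
            dist = math.dist(reps[i], reps[j])
            total += abs(losses[i] - losses[j]) / (dist + eps)
            count += 1
    return total / count if count else 0.0

def regularized_loss(task_loss, penalty, lam=0.02):
    """Assumed objective: task loss plus lambda times the penalty."""
    return task_loss + lam * penalty

# Toy usage with made-up losses and 2-D representations
p = kc_penalty([1.0, 2.0], [[0.0, 0.0], [3.0, 4.0]])
print(p, regularized_loss(0.5, p))
```

The epsilon keeps the ratio finite when two representations coincide, which is presumably why the card calls it a numerical stability constant.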

Extended Logs:

eval_loss eval_accuracy epoch
13.389 0.895 1.0
13.137 0.903 2.0
12.736 0.915 3.0
12.681 0.917 4.0
12.631 0.918 5.0
13.114 0.903 6.0
12.288 0.928 7.0
12.572 0.920 8.0
12.298 0.928 9.0
12.522 0.922 10.0
12.358 0.927 11.0
12.430 0.925 12.0
12.323 0.928 13.0
12.552 0.920 14.0
12.125 0.934 15.0
12.377 0.926 16.0
12.283 0.929 17.0
12.615 0.920 18.0
12.312 0.929 19.0
12.208 0.932 20.0
12.469 0.924 21.0
12.234 0.931 22.0
12.075 0.936 23.0
12.113 0.935 24.0
12.226 0.931 25.0
12.110 0.935 26.0
12.310 0.928 27.0
12.121 0.934 28.0
12.081 0.936 29.0
12.041 0.937 30.0
12.116 0.935 31.0
12.059 0.936 32.0
11.983 0.938 33.0
12.000 0.938 34.0
12.036 0.937 35.0
11.995 0.938 36.0
11.967 0.939 37.0
11.988 0.939 38.0
11.939 0.940 39.0
11.931 0.940 40.0
11.913 0.941 41.0
11.962 0.939 42.0
11.971 0.939 43.0
12.003 0.938 44.0
11.973 0.939 45.0
11.984 0.939 46.0
12.006 0.938 47.0
12.010 0.938 48.0
12.000 0.938 49.0
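The log above is a whitespace-separated table, one row per epoch. A minimal sketch for parsing it and selecting the epoch with the lowest `eval_loss` (the helper names `parse_log` and `best_row` are illustrative):

```python
def parse_log(text):
    """Parse a whitespace-separated log whose first line is the header."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    header, body = lines[0], lines[1:]
    return [dict(zip(header, map(float, row))) for row in body]

def best_row(rows, key="eval_loss", minimize=True):
    """Return the row with the smallest (or largest) value of `key`."""
    pick = min if minimize else max
    return pick(rows, key=lambda r: r[key])

# Excerpt of the log above (epochs 39 to 42)
EXCERPT = """eval_loss eval_accuracy epoch
11.939 0.940 39.0
11.931 0.940 40.0
11.913 0.941 41.0
11.962 0.939 42.0
"""
rows = parse_log(EXCERPT)
print(best_row(rows))
print(best_row(rows, key="eval_accuracy", minimize=False))
```

Run over the full log, both criteria select epoch 41 (eval_loss 11.913, eval_accuracy 0.941). The final epoch, 49, ends slightly worse at 12.000 and 0.938. The card does not say which checkpoint was published.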