pain committed on
Commit
b05b695
1 Parent(s): 6a26c83

End of training

Files changed (3)
  1. all_results.json +5 -5
  2. train_results.json +5 -5
  3. trainer_state.json +3014 -5
all_results.json CHANGED
@@ -1,8 +1,8 @@
  {
- "epoch": 500.0,
- "train_loss": 0.0,
- "train_runtime": 1.3574,
  "train_samples": 100,
- "train_samples_per_second": 36834.222,
- "train_steps_per_second": 4788.449
  }
 
  {
+ "epoch": 1000.0,
+ "train_loss": 0.655489089525663,
+ "train_runtime": 67792.7328,
  "train_samples": 100,
+ "train_samples_per_second": 1.475,
+ "train_steps_per_second": 0.192
  }
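As a sanity check on the new figures, the throughput values can be recomputed from the other fields. The sketch below assumes that samples per second is train_samples times epochs divided by train_runtime, and that steps per second is the global_step value from trainer_state.json (13000) divided by the same runtime. These formulas are an assumption about how the numbers were derived; the commit itself does not state them.

```python
# Values copied from the "+" side of all_results.json.
epochs = 1000.0
train_samples = 100
train_runtime = 67792.7328  # seconds

# global_step as reported in trainer_state.json in this same commit.
global_step = 13000

# Assumed derivation: total samples (or steps) divided by wall-clock runtime.
samples_per_second = train_samples * epochs / train_runtime
steps_per_second = global_step / train_runtime
runtime_hours = train_runtime / 3600

print(round(samples_per_second, 3))
print(round(steps_per_second, 3))
print(round(runtime_hours, 2))
```

Under these assumptions the recomputed rates round to the reported 1.475 and 0.192, and the runtime works out to a little under 19 hours.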
train_results.json CHANGED
@@ -1,8 +1,8 @@
  {
- "epoch": 500.0,
- "train_loss": 0.0,
- "train_runtime": 1.3574,
  "train_samples": 100,
- "train_samples_per_second": 36834.222,
- "train_steps_per_second": 4788.449
  }
 
  {
+ "epoch": 1000.0,
+ "train_loss": 0.655489089525663,
+ "train_runtime": 67792.7328,
  "train_samples": 100,
+ "train_samples_per_second": 1.475,
+ "train_steps_per_second": 0.192
  }
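train_samples is 100 here, and trainer_state.json reports global_step 6500 at epoch 500 and 13000 at epoch 1000, which is 13 optimizer steps per epoch in both runs. The batch size is not shown in this commit. A minimal sketch, assuming steps per epoch equals ceil(samples / effective batch size) with the last partial batch kept, lists the effective batch sizes that are consistent with that count:

```python
import math

train_samples = 100

# global_step / epoch for the old and new states in trainer_state.json.
steps_per_epoch_old = 6500 // 500
steps_per_epoch_new = 13000 // 1000

# Hypothetical search: which effective batch sizes give that many steps
# per epoch, assuming ceil(samples / batch) and no dropped last batch?
candidates = [
    batch
    for batch in range(1, train_samples + 1)
    if math.ceil(train_samples / batch) == steps_per_epoch_new
]

print(steps_per_epoch_old, steps_per_epoch_new, candidates)
```

If the assumption holds, only one effective batch size fits. With gradient accumulation or multiple devices, that number would be the product of the per-device batch size and those factors.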
trainer_state.json CHANGED
@@ -1,9 +1,9 @@
  {
  "best_metric": null,
  "best_model_checkpoint": null,
- "epoch": 500.0,
  "eval_steps": 500,
- "global_step": 6500,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
@@ -3025,13 +3025,3022 @@
  "train_runtime": 1.3574,
  "train_samples_per_second": 36834.222,
  "train_steps_per_second": 4788.449
  }
  ],
  "logging_steps": 500,
- "max_steps": 6500,
- "num_train_epochs": 500,
  "save_steps": 500,
- "total_flos": 284798065115136.0,
  "trial_name": null,
  "trial_params": null
  }
 
  {
  "best_metric": null,
  "best_model_checkpoint": null,
+ "epoch": 1000.0,
  "eval_steps": 500,
+ "global_step": 13000,
  "is_hyper_param_search": false,
  "is_local_process_zero": true,
  "is_world_process_zero": true,
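The log entries added in the next hunk show learning_rate falling by a fixed 4e-6 per epoch from epoch 501 on, with step equal to 13 times the epoch. The sketch below checks a few logged values against a straight line anchored at 0.004 at epoch 500. The linear form and the anchor are inferred from the visible entries only; the scheduler configuration is not part of this commit.

```python
import math

# (epoch, learning_rate, step) triples copied from the added log entries.
logged = [
    (501.0, 0.003996, 6513),
    (514.0, 0.0039440000000000005, 6682),
    (563.0, 0.0037480000000000005, 7319),
    (600.0, 0.0036000000000000003, 7800),
    (700.0, 0.0032, 9100),
]


def inferred_lr(epoch: float) -> float:
    """Straight-line fit inferred from the log: 0.004 at epoch 500, minus 4e-6 per epoch."""
    return 0.004 - 4e-6 * (epoch - 500.0)


# Collect any entry that disagrees with the inferred line or the 13-steps-per-epoch pattern.
mismatches = [
    (epoch, lr, step)
    for epoch, lr, step in logged
    if not math.isclose(inferred_lr(epoch), lr, rel_tol=1e-9) or step != 13 * int(epoch)
]

print(mismatches)
```

An empty list means the sampled entries match the inferred line up to floating-point noise, which would also explain the long trailing digits on some logged values.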
 
  "train_runtime": 1.3574,
  "train_samples_per_second": 36834.222,
  "train_steps_per_second": 4788.449
+ },
+ {
+ "epoch": 501.0,
+ "learning_rate": 0.003996,
+ "loss": 1.4863,
+ "step": 6513
+ },
+ {
+ "epoch": 502.0,
+ "learning_rate": 0.003992,
+ "loss": 1.6645,
+ "step": 6526
+ },
+ {
+ "epoch": 503.0,
+ "learning_rate": 0.003988,
+ "loss": 1.7012,
+ "step": 6539
+ },
+ {
+ "epoch": 504.0,
+ "learning_rate": 0.003984,
+ "loss": 1.6185,
+ "step": 6552
+ },
+ {
+ "epoch": 505.0,
+ "learning_rate": 0.00398,
+ "loss": 1.5629,
+ "step": 6565
+ },
+ {
+ "epoch": 506.0,
+ "learning_rate": 0.003976,
+ "loss": 1.5867,
+ "step": 6578
+ },
+ {
+ "epoch": 507.0,
+ "learning_rate": 0.003972,
+ "loss": 1.6144,
+ "step": 6591
+ },
+ {
+ "epoch": 508.0,
+ "learning_rate": 0.003968,
+ "loss": 1.7844,
+ "step": 6604
+ },
+ {
+ "epoch": 509.0,
+ "learning_rate": 0.003964,
+ "loss": 1.7508,
+ "step": 6617
+ },
+ {
+ "epoch": 510.0,
+ "learning_rate": 0.00396,
+ "loss": 1.7693,
+ "step": 6630
+ },
+ {
+ "epoch": 511.0,
+ "learning_rate": 0.003956,
+ "loss": 1.8884,
+ "step": 6643
+ },
+ {
+ "epoch": 512.0,
+ "learning_rate": 0.003952,
+ "loss": 1.8287,
+ "step": 6656
+ },
+ {
+ "epoch": 513.0,
+ "learning_rate": 0.003948,
+ "loss": 1.8228,
+ "step": 6669
+ },
+ {
+ "epoch": 514.0,
+ "learning_rate": 0.0039440000000000005,
+ "loss": 1.7632,
+ "step": 6682
+ },
+ {
+ "epoch": 515.0,
+ "learning_rate": 0.00394,
+ "loss": 1.7943,
+ "step": 6695
+ },
+ {
+ "epoch": 516.0,
+ "learning_rate": 0.003936,
+ "loss": 1.7451,
+ "step": 6708
+ },
+ {
+ "epoch": 517.0,
+ "learning_rate": 0.003932,
+ "loss": 1.8542,
+ "step": 6721
+ },
+ {
+ "epoch": 518.0,
+ "learning_rate": 0.003928,
+ "loss": 2.0283,
+ "step": 6734
+ },
+ {
+ "epoch": 519.0,
+ "learning_rate": 0.003924,
+ "loss": 2.0074,
+ "step": 6747
+ },
+ {
+ "epoch": 520.0,
+ "learning_rate": 0.00392,
+ "loss": 2.1644,
+ "step": 6760
+ },
+ {
+ "epoch": 521.0,
+ "learning_rate": 0.003916,
+ "loss": 1.9558,
+ "step": 6773
+ },
+ {
+ "epoch": 522.0,
+ "learning_rate": 0.003912,
+ "loss": 1.9104,
+ "step": 6786
+ },
+ {
+ "epoch": 523.0,
+ "learning_rate": 0.003908,
+ "loss": 1.9961,
+ "step": 6799
+ },
+ {
+ "epoch": 524.0,
+ "learning_rate": 0.003904,
+ "loss": 2.0827,
+ "step": 6812
+ },
+ {
+ "epoch": 525.0,
+ "learning_rate": 0.0039,
+ "loss": 2.0293,
+ "step": 6825
+ },
+ {
+ "epoch": 526.0,
+ "learning_rate": 0.003896,
+ "loss": 1.9904,
+ "step": 6838
+ },
+ {
+ "epoch": 527.0,
+ "learning_rate": 0.003892,
+ "loss": 1.9175,
+ "step": 6851
+ },
+ {
+ "epoch": 528.0,
+ "learning_rate": 0.003888,
+ "loss": 1.8658,
+ "step": 6864
+ },
+ {
+ "epoch": 529.0,
+ "learning_rate": 0.003884,
+ "loss": 1.8219,
+ "step": 6877
+ },
+ {
+ "epoch": 530.0,
+ "learning_rate": 0.0038799999999999998,
+ "loss": 1.884,
+ "step": 6890
+ },
+ {
+ "epoch": 531.0,
+ "learning_rate": 0.003876,
+ "loss": 1.9361,
+ "step": 6903
+ },
+ {
+ "epoch": 532.0,
+ "learning_rate": 0.003872,
+ "loss": 1.8961,
+ "step": 6916
+ },
+ {
+ "epoch": 533.0,
+ "learning_rate": 0.003868,
+ "loss": 1.9082,
+ "step": 6929
+ },
+ {
+ "epoch": 534.0,
+ "learning_rate": 0.003864,
+ "loss": 2.0034,
+ "step": 6942
+ },
+ {
+ "epoch": 535.0,
+ "learning_rate": 0.00386,
+ "loss": 2.0058,
+ "step": 6955
+ },
+ {
+ "epoch": 536.0,
+ "learning_rate": 0.003856,
+ "loss": 1.9934,
+ "step": 6968
+ },
+ {
+ "epoch": 537.0,
+ "learning_rate": 0.003852,
+ "loss": 1.9674,
+ "step": 6981
+ },
+ {
+ "epoch": 538.0,
+ "learning_rate": 0.003848,
+ "loss": 1.9737,
+ "step": 6994
+ },
+ {
+ "epoch": 539.0,
+ "learning_rate": 0.0038439999999999998,
+ "loss": 1.9184,
+ "step": 7007
+ },
+ {
+ "epoch": 540.0,
+ "learning_rate": 0.00384,
+ "loss": 1.9147,
+ "step": 7020
+ },
+ {
+ "epoch": 541.0,
+ "learning_rate": 0.003836,
+ "loss": 1.9792,
+ "step": 7033
+ },
+ {
+ "epoch": 542.0,
+ "learning_rate": 0.003832,
+ "loss": 1.9448,
+ "step": 7046
+ },
+ {
+ "epoch": 543.0,
+ "learning_rate": 0.003828,
+ "loss": 1.8897,
+ "step": 7059
+ },
+ {
+ "epoch": 544.0,
+ "learning_rate": 0.0038239999999999997,
+ "loss": 1.9048,
+ "step": 7072
+ },
+ {
+ "epoch": 545.0,
+ "learning_rate": 0.00382,
+ "loss": 1.9577,
+ "step": 7085
+ },
+ {
+ "epoch": 546.0,
+ "learning_rate": 0.003816,
+ "loss": 1.9996,
+ "step": 7098
+ },
+ {
+ "epoch": 547.0,
+ "learning_rate": 0.003812,
+ "loss": 1.9895,
+ "step": 7111
+ },
+ {
+ "epoch": 548.0,
+ "learning_rate": 0.0038079999999999998,
+ "loss": 2.0381,
+ "step": 7124
+ },
+ {
+ "epoch": 549.0,
+ "learning_rate": 0.003804,
+ "loss": 1.9362,
+ "step": 7137
+ },
+ {
+ "epoch": 550.0,
+ "learning_rate": 0.0038,
+ "loss": 1.9544,
+ "step": 7150
+ },
+ {
+ "epoch": 551.0,
+ "learning_rate": 0.003796,
+ "loss": 1.9497,
+ "step": 7163
+ },
+ {
+ "epoch": 552.0,
+ "learning_rate": 0.003792,
+ "loss": 1.9105,
+ "step": 7176
+ },
+ {
+ "epoch": 553.0,
+ "learning_rate": 0.0037879999999999997,
+ "loss": 1.896,
+ "step": 7189
+ },
+ {
+ "epoch": 554.0,
+ "learning_rate": 0.003784,
+ "loss": 2.0015,
+ "step": 7202
+ },
+ {
+ "epoch": 555.0,
+ "learning_rate": 0.00378,
+ "loss": 1.8753,
+ "step": 7215
+ },
+ {
+ "epoch": 556.0,
+ "learning_rate": 0.003776,
+ "loss": 1.9087,
+ "step": 7228
+ },
+ {
+ "epoch": 557.0,
+ "learning_rate": 0.0037719999999999997,
+ "loss": 1.959,
+ "step": 7241
+ },
+ {
+ "epoch": 558.0,
+ "learning_rate": 0.003768,
+ "loss": 2.0541,
+ "step": 7254
+ },
+ {
+ "epoch": 559.0,
+ "learning_rate": 0.003764,
+ "loss": 2.0614,
+ "step": 7267
+ },
+ {
+ "epoch": 560.0,
+ "learning_rate": 0.00376,
+ "loss": 2.0349,
+ "step": 7280
+ },
+ {
+ "epoch": 561.0,
+ "learning_rate": 0.003756,
+ "loss": 1.9517,
+ "step": 7293
+ },
+ {
+ "epoch": 562.0,
+ "learning_rate": 0.0037519999999999997,
+ "loss": 2.0094,
+ "step": 7306
+ },
+ {
+ "epoch": 563.0,
+ "learning_rate": 0.0037480000000000005,
+ "loss": 2.0459,
+ "step": 7319
+ },
+ {
+ "epoch": 564.0,
+ "learning_rate": 0.0037440000000000004,
+ "loss": 2.172,
+ "step": 7332
+ },
+ {
+ "epoch": 565.0,
+ "learning_rate": 0.0037400000000000003,
+ "loss": 2.0798,
+ "step": 7345
+ },
+ {
+ "epoch": 566.0,
+ "learning_rate": 0.003736,
+ "loss": 2.0222,
+ "step": 7358
+ },
+ {
+ "epoch": 567.0,
+ "learning_rate": 0.003732,
+ "loss": 2.0051,
+ "step": 7371
+ },
+ {
+ "epoch": 568.0,
+ "learning_rate": 0.0037280000000000004,
+ "loss": 2.2096,
+ "step": 7384
+ },
+ {
+ "epoch": 569.0,
+ "learning_rate": 0.0037240000000000003,
+ "loss": 2.2197,
+ "step": 7397
+ },
+ {
+ "epoch": 570.0,
+ "learning_rate": 0.00372,
+ "loss": 2.1259,
+ "step": 7410
+ },
+ {
+ "epoch": 571.0,
+ "learning_rate": 0.003716,
+ "loss": 2.1098,
+ "step": 7423
+ },
+ {
+ "epoch": 572.0,
+ "learning_rate": 0.0037120000000000005,
+ "loss": 2.0734,
+ "step": 7436
+ },
+ {
+ "epoch": 573.0,
+ "learning_rate": 0.0037080000000000004,
+ "loss": 2.0822,
+ "step": 7449
+ },
+ {
+ "epoch": 574.0,
+ "learning_rate": 0.0037040000000000003,
+ "loss": 2.0771,
+ "step": 7462
+ },
+ {
+ "epoch": 575.0,
+ "learning_rate": 0.0037,
+ "loss": 2.0217,
+ "step": 7475
+ },
+ {
+ "epoch": 576.0,
+ "learning_rate": 0.003696,
+ "loss": 1.9762,
+ "step": 7488
+ },
+ {
+ "epoch": 577.0,
+ "learning_rate": 0.0036920000000000004,
+ "loss": 1.9341,
+ "step": 7501
+ },
+ {
+ "epoch": 578.0,
+ "learning_rate": 0.0036880000000000003,
+ "loss": 1.9837,
+ "step": 7514
+ },
+ {
+ "epoch": 579.0,
+ "learning_rate": 0.003684,
+ "loss": 1.9337,
+ "step": 7527
+ },
+ {
+ "epoch": 580.0,
+ "learning_rate": 0.00368,
+ "loss": 1.8968,
+ "step": 7540
+ },
+ {
+ "epoch": 581.0,
+ "learning_rate": 0.0036760000000000004,
+ "loss": 1.8705,
+ "step": 7553
+ },
+ {
+ "epoch": 582.0,
+ "learning_rate": 0.0036720000000000004,
+ "loss": 1.8261,
+ "step": 7566
+ },
+ {
+ "epoch": 583.0,
+ "learning_rate": 0.0036680000000000003,
+ "loss": 1.9411,
+ "step": 7579
+ },
+ {
+ "epoch": 584.0,
+ "learning_rate": 0.003664,
+ "loss": 1.9961,
+ "step": 7592
+ },
+ {
+ "epoch": 585.0,
+ "learning_rate": 0.00366,
+ "loss": 1.8865,
+ "step": 7605
+ },
+ {
+ "epoch": 586.0,
+ "learning_rate": 0.0036560000000000004,
+ "loss": 1.829,
+ "step": 7618
+ },
+ {
+ "epoch": 587.0,
+ "learning_rate": 0.0036520000000000003,
+ "loss": 1.8424,
+ "step": 7631
+ },
+ {
+ "epoch": 588.0,
+ "learning_rate": 0.003648,
+ "loss": 1.8463,
+ "step": 7644
+ },
+ {
+ "epoch": 589.0,
+ "learning_rate": 0.003644,
+ "loss": 1.8452,
+ "step": 7657
+ },
+ {
+ "epoch": 590.0,
+ "learning_rate": 0.00364,
+ "loss": 1.7974,
+ "step": 7670
+ },
+ {
+ "epoch": 591.0,
+ "learning_rate": 0.0036360000000000003,
+ "loss": 1.7995,
+ "step": 7683
+ },
+ {
+ "epoch": 592.0,
+ "learning_rate": 0.0036320000000000002,
+ "loss": 1.7664,
+ "step": 7696
+ },
+ {
+ "epoch": 593.0,
+ "learning_rate": 0.003628,
+ "loss": 1.7451,
+ "step": 7709
+ },
+ {
+ "epoch": 594.0,
+ "learning_rate": 0.003624,
+ "loss": 1.7978,
+ "step": 7722
+ },
+ {
+ "epoch": 595.0,
+ "learning_rate": 0.0036200000000000004,
+ "loss": 1.9067,
+ "step": 7735
+ },
+ {
+ "epoch": 596.0,
+ "learning_rate": 0.0036160000000000003,
+ "loss": 1.8932,
+ "step": 7748
+ },
+ {
+ "epoch": 597.0,
+ "learning_rate": 0.003612,
+ "loss": 1.9407,
+ "step": 7761
+ },
+ {
+ "epoch": 598.0,
+ "learning_rate": 0.003608,
+ "loss": 1.8776,
+ "step": 7774
+ },
+ {
+ "epoch": 599.0,
+ "learning_rate": 0.003604,
+ "loss": 1.8223,
+ "step": 7787
+ },
+ {
+ "epoch": 600.0,
+ "learning_rate": 0.0036000000000000003,
+ "loss": 1.7761,
+ "step": 7800
+ },
+ {
+ "epoch": 601.0,
+ "learning_rate": 0.0035960000000000002,
+ "loss": 1.7768,
+ "step": 7813
+ },
+ {
+ "epoch": 602.0,
+ "learning_rate": 0.003592,
+ "loss": 1.8109,
+ "step": 7826
+ },
+ {
+ "epoch": 603.0,
+ "learning_rate": 0.003588,
+ "loss": 1.7787,
+ "step": 7839
+ },
+ {
+ "epoch": 604.0,
+ "learning_rate": 0.003584,
+ "loss": 1.9842,
+ "step": 7852
+ },
+ {
+ "epoch": 605.0,
+ "learning_rate": 0.0035800000000000003,
+ "loss": 1.9262,
+ "step": 7865
+ },
+ {
+ "epoch": 606.0,
+ "learning_rate": 0.003576,
+ "loss": 1.9124,
+ "step": 7878
+ },
+ {
+ "epoch": 607.0,
+ "learning_rate": 0.003572,
+ "loss": 1.8407,
+ "step": 7891
+ },
+ {
+ "epoch": 608.0,
+ "learning_rate": 0.003568,
+ "loss": 1.8722,
+ "step": 7904
+ },
+ {
+ "epoch": 609.0,
+ "learning_rate": 0.0035640000000000003,
+ "loss": 1.7409,
+ "step": 7917
+ },
+ {
+ "epoch": 610.0,
+ "learning_rate": 0.0035600000000000002,
+ "loss": 1.712,
+ "step": 7930
+ },
+ {
+ "epoch": 611.0,
+ "learning_rate": 0.003556,
+ "loss": 1.6115,
+ "step": 7943
+ },
+ {
+ "epoch": 612.0,
+ "learning_rate": 0.003552,
+ "loss": 1.6805,
+ "step": 7956
+ },
+ {
+ "epoch": 613.0,
+ "learning_rate": 0.003548,
+ "loss": 1.7829,
+ "step": 7969
+ },
+ {
+ "epoch": 614.0,
+ "learning_rate": 0.0035440000000000003,
+ "loss": 1.7498,
+ "step": 7982
+ },
+ {
+ "epoch": 615.0,
+ "learning_rate": 0.00354,
+ "loss": 1.7536,
+ "step": 7995
+ },
+ {
+ "epoch": 616.0,
+ "learning_rate": 0.003536,
+ "loss": 1.7015,
+ "step": 8008
+ },
+ {
+ "epoch": 617.0,
+ "learning_rate": 0.003532,
+ "loss": 1.6556,
+ "step": 8021
+ },
+ {
+ "epoch": 618.0,
+ "learning_rate": 0.003528,
+ "loss": 1.7314,
+ "step": 8034
+ },
+ {
+ "epoch": 619.0,
+ "learning_rate": 0.0035240000000000002,
+ "loss": 1.6996,
+ "step": 8047
+ },
+ {
+ "epoch": 620.0,
+ "learning_rate": 0.00352,
+ "loss": 1.6819,
+ "step": 8060
+ },
+ {
+ "epoch": 621.0,
+ "learning_rate": 0.003516,
+ "loss": 1.6994,
+ "step": 8073
+ },
+ {
+ "epoch": 622.0,
+ "learning_rate": 0.003512,
+ "loss": 1.6657,
+ "step": 8086
+ },
+ {
+ "epoch": 623.0,
+ "learning_rate": 0.0035080000000000003,
+ "loss": 1.6558,
+ "step": 8099
+ },
+ {
+ "epoch": 624.0,
+ "learning_rate": 0.003504,
+ "loss": 1.6822,
+ "step": 8112
+ },
+ {
+ "epoch": 625.0,
+ "learning_rate": 0.0035,
+ "loss": 1.7245,
+ "step": 8125
+ },
+ {
+ "epoch": 626.0,
+ "learning_rate": 0.003496,
+ "loss": 1.8045,
+ "step": 8138
+ },
+ {
+ "epoch": 627.0,
+ "learning_rate": 0.003492,
+ "loss": 1.7307,
+ "step": 8151
+ },
+ {
+ "epoch": 628.0,
+ "learning_rate": 0.003488,
+ "loss": 1.7469,
+ "step": 8164
+ },
+ {
+ "epoch": 629.0,
+ "learning_rate": 0.003484,
+ "loss": 1.7047,
+ "step": 8177
+ },
+ {
+ "epoch": 630.0,
+ "learning_rate": 0.00348,
+ "loss": 1.6359,
+ "step": 8190
+ },
+ {
+ "epoch": 631.0,
+ "learning_rate": 0.003476,
+ "loss": 1.7324,
+ "step": 8203
+ },
+ {
+ "epoch": 632.0,
+ "learning_rate": 0.0034720000000000003,
+ "loss": 1.6107,
+ "step": 8216
+ },
+ {
+ "epoch": 633.0,
+ "learning_rate": 0.003468,
+ "loss": 1.5336,
+ "step": 8229
+ },
+ {
+ "epoch": 634.0,
+ "learning_rate": 0.003464,
+ "loss": 1.5587,
+ "step": 8242
+ },
+ {
+ "epoch": 635.0,
+ "learning_rate": 0.00346,
+ "loss": 1.581,
+ "step": 8255
+ },
+ {
+ "epoch": 636.0,
+ "learning_rate": 0.003456,
+ "loss": 1.5281,
+ "step": 8268
+ },
+ {
+ "epoch": 637.0,
+ "learning_rate": 0.003452,
+ "loss": 1.5198,
+ "step": 8281
+ },
+ {
+ "epoch": 638.0,
+ "learning_rate": 0.003448,
+ "loss": 1.5671,
+ "step": 8294
+ },
+ {
+ "epoch": 639.0,
+ "learning_rate": 0.003444,
+ "loss": 1.5257,
+ "step": 8307
+ },
+ {
+ "epoch": 640.0,
+ "learning_rate": 0.00344,
+ "loss": 1.5525,
+ "step": 8320
+ },
+ {
+ "epoch": 641.0,
+ "learning_rate": 0.003436,
+ "loss": 1.5005,
+ "step": 8333
+ },
+ {
+ "epoch": 642.0,
+ "learning_rate": 0.003432,
+ "loss": 1.4971,
+ "step": 8346
+ },
+ {
+ "epoch": 643.0,
+ "learning_rate": 0.003428,
+ "loss": 1.4738,
+ "step": 8359
+ },
+ {
+ "epoch": 644.0,
+ "learning_rate": 0.003424,
+ "loss": 1.5397,
+ "step": 8372
+ },
+ {
+ "epoch": 645.0,
+ "learning_rate": 0.00342,
+ "loss": 1.5092,
+ "step": 8385
+ },
+ {
+ "epoch": 646.0,
+ "learning_rate": 0.003416,
+ "loss": 1.5638,
+ "step": 8398
+ },
+ {
+ "epoch": 647.0,
+ "learning_rate": 0.003412,
+ "loss": 1.4813,
+ "step": 8411
+ },
+ {
+ "epoch": 648.0,
+ "learning_rate": 0.003408,
+ "loss": 1.4827,
+ "step": 8424
+ },
+ {
+ "epoch": 649.0,
+ "learning_rate": 0.003404,
+ "loss": 1.5285,
+ "step": 8437
+ },
+ {
+ "epoch": 650.0,
+ "learning_rate": 0.0034,
+ "loss": 1.5059,
+ "step": 8450
+ },
+ {
+ "epoch": 651.0,
+ "learning_rate": 0.003396,
+ "loss": 1.5452,
+ "step": 8463
+ },
+ {
+ "epoch": 652.0,
+ "learning_rate": 0.003392,
+ "loss": 1.6823,
+ "step": 8476
+ },
+ {
+ "epoch": 653.0,
+ "learning_rate": 0.003388,
+ "loss": 1.6004,
+ "step": 8489
+ },
+ {
+ "epoch": 654.0,
+ "learning_rate": 0.003384,
+ "loss": 1.648,
+ "step": 8502
+ },
+ {
+ "epoch": 655.0,
+ "learning_rate": 0.0033799999999999998,
+ "loss": 1.6614,
+ "step": 8515
+ },
+ {
+ "epoch": 656.0,
+ "learning_rate": 0.003376,
+ "loss": 1.6969,
+ "step": 8528
+ },
+ {
+ "epoch": 657.0,
+ "learning_rate": 0.003372,
+ "loss": 1.5948,
+ "step": 8541
+ },
+ {
+ "epoch": 658.0,
+ "learning_rate": 0.003368,
+ "loss": 1.5743,
+ "step": 8554
+ },
+ {
+ "epoch": 659.0,
+ "learning_rate": 0.003364,
+ "loss": 1.5521,
+ "step": 8567
+ },
+ {
+ "epoch": 660.0,
+ "learning_rate": 0.00336,
+ "loss": 1.5639,
+ "step": 8580
+ },
+ {
+ "epoch": 661.0,
+ "learning_rate": 0.003356,
+ "loss": 1.5155,
+ "step": 8593
+ },
+ {
+ "epoch": 662.0,
+ "learning_rate": 0.003352,
+ "loss": 1.4494,
+ "step": 8606
+ },
+ {
+ "epoch": 663.0,
+ "learning_rate": 0.003348,
+ "loss": 1.5803,
+ "step": 8619
+ },
+ {
+ "epoch": 664.0,
+ "learning_rate": 0.0033439999999999998,
+ "loss": 1.5896,
+ "step": 8632
+ },
+ {
+ "epoch": 665.0,
+ "learning_rate": 0.00334,
+ "loss": 1.6208,
+ "step": 8645
+ },
+ {
+ "epoch": 666.0,
+ "learning_rate": 0.003336,
+ "loss": 1.6345,
+ "step": 8658
+ },
+ {
+ "epoch": 667.0,
+ "learning_rate": 0.003332,
+ "loss": 1.6087,
+ "step": 8671
+ },
+ {
+ "epoch": 668.0,
+ "learning_rate": 0.003328,
+ "loss": 1.5938,
+ "step": 8684
+ },
+ {
+ "epoch": 669.0,
+ "learning_rate": 0.0033239999999999997,
+ "loss": 1.5391,
+ "step": 8697
+ },
+ {
+ "epoch": 670.0,
+ "learning_rate": 0.00332,
+ "loss": 1.5938,
+ "step": 8710
+ },
+ {
+ "epoch": 671.0,
+ "learning_rate": 0.003316,
+ "loss": 1.6235,
+ "step": 8723
+ },
+ {
+ "epoch": 672.0,
+ "learning_rate": 0.003312,
+ "loss": 1.6882,
+ "step": 8736
+ },
+ {
+ "epoch": 673.0,
+ "learning_rate": 0.0033079999999999997,
+ "loss": 1.5822,
+ "step": 8749
+ },
+ {
+ "epoch": 674.0,
+ "learning_rate": 0.003304,
+ "loss": 1.622,
+ "step": 8762
+ },
+ {
+ "epoch": 675.0,
+ "learning_rate": 0.0033,
+ "loss": 1.587,
+ "step": 8775
+ },
+ {
+ "epoch": 676.0,
+ "learning_rate": 0.003296,
+ "loss": 1.5119,
+ "step": 8788
+ },
+ {
+ "epoch": 677.0,
+ "learning_rate": 0.003292,
+ "loss": 1.467,
+ "step": 8801
+ },
+ {
+ "epoch": 678.0,
+ "learning_rate": 0.0032879999999999997,
+ "loss": 1.438,
+ "step": 8814
+ },
+ {
+ "epoch": 679.0,
+ "learning_rate": 0.003284,
+ "loss": 1.4497,
+ "step": 8827
+ },
+ {
+ "epoch": 680.0,
+ "learning_rate": 0.00328,
+ "loss": 1.4349,
+ "step": 8840
+ },
+ {
+ "epoch": 681.0,
+ "learning_rate": 0.003276,
+ "loss": 1.378,
+ "step": 8853
+ },
+ {
+ "epoch": 682.0,
+ "learning_rate": 0.0032719999999999997,
+ "loss": 1.3847,
+ "step": 8866
+ },
+ {
+ "epoch": 683.0,
+ "learning_rate": 0.003268,
+ "loss": 1.4556,
+ "step": 8879
+ },
+ {
+ "epoch": 684.0,
+ "learning_rate": 0.003264,
+ "loss": 1.4435,
+ "step": 8892
+ },
+ {
+ "epoch": 685.0,
+ "learning_rate": 0.00326,
+ "loss": 1.4494,
+ "step": 8905
+ },
+ {
+ "epoch": 686.0,
+ "learning_rate": 0.0032559999999999998,
+ "loss": 1.4065,
+ "step": 8918
+ },
+ {
+ "epoch": 687.0,
+ "learning_rate": 0.0032519999999999997,
+ "loss": 1.4323,
+ "step": 8931
+ },
+ {
+ "epoch": 688.0,
+ "learning_rate": 0.0032480000000000005,
+ "loss": 1.4698,
+ "step": 8944
+ },
+ {
+ "epoch": 689.0,
+ "learning_rate": 0.0032440000000000004,
+ "loss": 1.4293,
+ "step": 8957
+ },
+ {
+ "epoch": 690.0,
+ "learning_rate": 0.0032400000000000003,
+ "loss": 1.5205,
+ "step": 8970
+ },
+ {
+ "epoch": 691.0,
+ "learning_rate": 0.003236,
+ "loss": 1.5039,
+ "step": 8983
+ },
+ {
+ "epoch": 692.0,
+ "learning_rate": 0.003232,
+ "loss": 1.4715,
+ "step": 8996
+ },
+ {
+ "epoch": 693.0,
+ "learning_rate": 0.0032280000000000004,
+ "loss": 1.4448,
+ "step": 9009
+ },
+ {
+ "epoch": 694.0,
+ "learning_rate": 0.0032240000000000003,
+ "loss": 1.5031,
+ "step": 9022
+ },
+ {
+ "epoch": 695.0,
+ "learning_rate": 0.00322,
+ "loss": 1.5261,
+ "step": 9035
+ },
+ {
+ "epoch": 696.0,
+ "learning_rate": 0.003216,
+ "loss": 1.4389,
+ "step": 9048
+ },
+ {
+ "epoch": 697.0,
+ "learning_rate": 0.0032120000000000004,
+ "loss": 1.4792,
+ "step": 9061
+ },
+ {
+ "epoch": 698.0,
+ "learning_rate": 0.0032080000000000003,
+ "loss": 1.4317,
+ "step": 9074
+ },
+ {
+ "epoch": 699.0,
+ "learning_rate": 0.0032040000000000003,
+ "loss": 1.5776,
+ "step": 9087
+ },
+ {
+ "epoch": 700.0,
+ "learning_rate": 0.0032,
+ "loss": 1.5326,
+ "step": 9100
+ },
4229
+ {
4230
+ "epoch": 701.0,
4231
+ "learning_rate": 0.003196,
4232
+ "loss": 1.5345,
4233
+ "step": 9113
4234
+ },
4235
+ {
4236
+ "epoch": 702.0,
4237
+ "learning_rate": 0.0031920000000000004,
4238
+ "loss": 1.5269,
4239
+ "step": 9126
4240
+ },
4241
+ {
4242
+ "epoch": 703.0,
4243
+ "learning_rate": 0.0031880000000000003,
4244
+ "loss": 1.4819,
4245
+ "step": 9139
4246
+ },
4247
+ {
4248
+ "epoch": 704.0,
4249
+ "learning_rate": 0.003184,
4250
+ "loss": 1.5326,
4251
+ "step": 9152
4252
+ },
4253
+ {
4254
+ "epoch": 705.0,
4255
+ "learning_rate": 0.00318,
4256
+ "loss": 1.4257,
4257
+ "step": 9165
4258
+ },
4259
+ {
4260
+ "epoch": 706.0,
4261
+ "learning_rate": 0.0031760000000000004,
4262
+ "loss": 1.4306,
4263
+ "step": 9178
4264
+ },
4265
+ {
4266
+ "epoch": 707.0,
4267
+ "learning_rate": 0.0031720000000000003,
4268
+ "loss": 1.3884,
4269
+ "step": 9191
4270
+ },
4271
+ {
4272
+ "epoch": 708.0,
4273
+ "learning_rate": 0.0031680000000000002,
4274
+ "loss": 1.3421,
4275
+ "step": 9204
4276
+ },
4277
+ {
4278
+ "epoch": 709.0,
4279
+ "learning_rate": 0.003164,
4280
+ "loss": 1.394,
4281
+ "step": 9217
4282
+ },
4283
+ {
4284
+ "epoch": 710.0,
4285
+ "learning_rate": 0.00316,
4286
+ "loss": 1.3892,
4287
+ "step": 9230
4288
+ },
4289
+ {
4290
+ "epoch": 711.0,
4291
+ "learning_rate": 0.0031560000000000004,
4292
+ "loss": 1.4832,
4293
+ "step": 9243
4294
+ },
4295
+ {
4296
+ "epoch": 712.0,
4297
+ "learning_rate": 0.0031520000000000003,
4298
+ "loss": 1.4088,
4299
+ "step": 9256
4300
+ },
4301
+ {
4302
+ "epoch": 713.0,
4303
+ "learning_rate": 0.003148,
4304
+ "loss": 1.386,
4305
+ "step": 9269
4306
+ },
4307
+ {
4308
+ "epoch": 714.0,
4309
+ "learning_rate": 0.003144,
4310
+ "loss": 1.3992,
4311
+ "step": 9282
4312
+ },
4313
+ {
4314
+ "epoch": 715.0,
4315
+ "learning_rate": 0.00314,
4316
+ "loss": 1.381,
4317
+ "step": 9295
4318
+ },
4319
+ {
4320
+ "epoch": 716.0,
4321
+ "learning_rate": 0.0031360000000000003,
4322
+ "loss": 1.394,
4323
+ "step": 9308
4324
+ },
4325
+ {
4326
+ "epoch": 717.0,
4327
+ "learning_rate": 0.0031320000000000002,
4328
+ "loss": 1.4024,
4329
+ "step": 9321
4330
+ },
4331
+ {
4332
+ "epoch": 718.0,
4333
+ "learning_rate": 0.003128,
4334
+ "loss": 1.3334,
4335
+ "step": 9334
4336
+ },
4337
+ {
4338
+ "epoch": 719.0,
4339
+ "learning_rate": 0.003124,
4340
+ "loss": 1.3467,
4341
+ "step": 9347
4342
+ },
4343
+ {
4344
+ "epoch": 720.0,
4345
+ "learning_rate": 0.0031200000000000004,
4346
+ "loss": 1.284,
4347
+ "step": 9360
4348
+ },
4349
+ {
4350
+ "epoch": 721.0,
4351
+ "learning_rate": 0.0031160000000000003,
4352
+ "loss": 1.2705,
4353
+ "step": 9373
4354
+ },
4355
+ {
4356
+ "epoch": 722.0,
4357
+ "learning_rate": 0.003112,
4358
+ "loss": 1.2919,
4359
+ "step": 9386
4360
+ },
4361
+ {
4362
+ "epoch": 723.0,
4363
+ "learning_rate": 0.003108,
4364
+ "loss": 1.3071,
4365
+ "step": 9399
4366
+ },
4367
+ {
4368
+ "epoch": 724.0,
4369
+ "learning_rate": 0.003104,
4370
+ "loss": 1.3584,
4371
+ "step": 9412
4372
+ },
4373
+ {
4374
+ "epoch": 725.0,
4375
+ "learning_rate": 0.0031000000000000003,
4376
+ "loss": 1.4149,
4377
+ "step": 9425
4378
+ },
4379
+ {
4380
+ "epoch": 726.0,
4381
+ "learning_rate": 0.0030960000000000002,
4382
+ "loss": 1.3721,
4383
+ "step": 9438
4384
+ },
4385
+ {
4386
+ "epoch": 727.0,
4387
+ "learning_rate": 0.003092,
4388
+ "loss": 1.3719,
4389
+ "step": 9451
4390
+ },
4391
+ {
4392
+ "epoch": 728.0,
4393
+ "learning_rate": 0.003088,
4394
+ "loss": 1.3559,
4395
+ "step": 9464
4396
+ },
4397
+ {
4398
+ "epoch": 729.0,
4399
+ "learning_rate": 0.003084,
4400
+ "loss": 1.3108,
4401
+ "step": 9477
4402
+ },
4403
+ {
4404
+ "epoch": 730.0,
4405
+ "learning_rate": 0.0030800000000000003,
4406
+ "loss": 1.316,
4407
+ "step": 9490
4408
+ },
4409
+ {
4410
+ "epoch": 731.0,
4411
+ "learning_rate": 0.003076,
4412
+ "loss": 1.3154,
4413
+ "step": 9503
4414
+ },
4415
+ {
4416
+ "epoch": 732.0,
4417
+ "learning_rate": 0.003072,
4418
+ "loss": 1.327,
4419
+ "step": 9516
4420
+ },
4421
+ {
4422
+ "epoch": 733.0,
4423
+ "learning_rate": 0.003068,
4424
+ "loss": 1.2914,
4425
+ "step": 9529
4426
+ },
4427
+ {
4428
+ "epoch": 734.0,
4429
+ "learning_rate": 0.0030640000000000003,
4430
+ "loss": 1.2891,
4431
+ "step": 9542
4432
+ },
4433
+ {
4434
+ "epoch": 735.0,
4435
+ "learning_rate": 0.0030600000000000002,
4436
+ "loss": 1.2923,
4437
+ "step": 9555
4438
+ },
4439
+ {
4440
+ "epoch": 736.0,
4441
+ "learning_rate": 0.003056,
4442
+ "loss": 1.3608,
4443
+ "step": 9568
4444
+ },
4445
+ {
4446
+ "epoch": 737.0,
4447
+ "learning_rate": 0.003052,
4448
+ "loss": 1.3126,
4449
+ "step": 9581
4450
+ },
4451
+ {
4452
+ "epoch": 738.0,
4453
+ "learning_rate": 0.003048,
4454
+ "loss": 1.3673,
4455
+ "step": 9594
4456
+ },
4457
+ {
4458
+ "epoch": 739.0,
4459
+ "learning_rate": 0.0030440000000000003,
4460
+ "loss": 1.3951,
4461
+ "step": 9607
4462
+ },
4463
+ {
4464
+ "epoch": 740.0,
4465
+ "learning_rate": 0.00304,
4466
+ "loss": 1.3128,
4467
+ "step": 9620
4468
+ },
4469
+ {
4470
+ "epoch": 741.0,
4471
+ "learning_rate": 0.003036,
4472
+ "loss": 1.3117,
4473
+ "step": 9633
4474
+ },
4475
+ {
4476
+ "epoch": 742.0,
4477
+ "learning_rate": 0.003032,
4478
+ "loss": 1.2828,
4479
+ "step": 9646
4480
+ },
4481
+ {
4482
+ "epoch": 743.0,
4483
+ "learning_rate": 0.003028,
4484
+ "loss": 1.3054,
4485
+ "step": 9659
4486
+ },
4487
+ {
4488
+ "epoch": 744.0,
4489
+ "learning_rate": 0.003024,
4490
+ "loss": 1.289,
4491
+ "step": 9672
4492
+ },
4493
+ {
4494
+ "epoch": 745.0,
4495
+ "learning_rate": 0.00302,
4496
+ "loss": 1.3023,
4497
+ "step": 9685
4498
+ },
4499
+ {
4500
+ "epoch": 746.0,
4501
+ "learning_rate": 0.003016,
4502
+ "loss": 1.2972,
4503
+ "step": 9698
4504
+ },
4505
+ {
4506
+ "epoch": 747.0,
4507
+ "learning_rate": 0.003012,
4508
+ "loss": 1.281,
4509
+ "step": 9711
4510
+ },
4511
+ {
4512
+ "epoch": 748.0,
4513
+ "learning_rate": 0.0030080000000000003,
4514
+ "loss": 1.2475,
4515
+ "step": 9724
4516
+ },
4517
+ {
4518
+ "epoch": 749.0,
4519
+ "learning_rate": 0.003004,
4520
+ "loss": 1.2721,
4521
+ "step": 9737
4522
+ },
4523
+ {
4524
+ "epoch": 750.0,
4525
+ "learning_rate": 0.003,
4526
+ "loss": 1.3066,
4527
+ "step": 9750
4528
+ },
4529
+ {
4530
+ "epoch": 751.0,
4531
+ "learning_rate": 0.002996,
4532
+ "loss": 1.3229,
4533
+ "step": 9763
4534
+ },
4535
+ {
4536
+ "epoch": 752.0,
4537
+ "learning_rate": 0.002992,
4538
+ "loss": 1.2095,
4539
+ "step": 9776
4540
+ },
4541
+ {
4542
+ "epoch": 753.0,
4543
+ "learning_rate": 0.002988,
4544
+ "loss": 1.2389,
4545
+ "step": 9789
4546
+ },
4547
+ {
4548
+ "epoch": 754.0,
4549
+ "learning_rate": 0.002984,
4550
+ "loss": 1.2046,
4551
+ "step": 9802
4552
+ },
4553
+ {
4554
+ "epoch": 755.0,
4555
+ "learning_rate": 0.00298,
4556
+ "loss": 1.1953,
4557
+ "step": 9815
4558
+ },
4559
+ {
4560
+ "epoch": 756.0,
4561
+ "learning_rate": 0.002976,
4562
+ "loss": 1.1359,
4563
+ "step": 9828
4564
+ },
4565
+ {
4566
+ "epoch": 757.0,
4567
+ "learning_rate": 0.0029720000000000002,
4568
+ "loss": 1.13,
4569
+ "step": 9841
4570
+ },
4571
+ {
4572
+ "epoch": 758.0,
4573
+ "learning_rate": 0.002968,
4574
+ "loss": 1.1946,
4575
+ "step": 9854
4576
+ },
4577
+ {
4578
+ "epoch": 759.0,
4579
+ "learning_rate": 0.002964,
4580
+ "loss": 1.2325,
4581
+ "step": 9867
4582
+ },
4583
+ {
4584
+ "epoch": 760.0,
4585
+ "learning_rate": 0.00296,
4586
+ "loss": 1.2435,
4587
+ "step": 9880
4588
+ },
4589
+ {
4590
+ "epoch": 761.0,
4591
+ "learning_rate": 0.002956,
4592
+ "loss": 1.2878,
4593
+ "step": 9893
4594
+ },
4595
+ {
4596
+ "epoch": 762.0,
4597
+ "learning_rate": 0.002952,
4598
+ "loss": 1.2123,
4599
+ "step": 9906
4600
+ },
4601
+ {
4602
+ "epoch": 763.0,
4603
+ "learning_rate": 0.002948,
4604
+ "loss": 1.1953,
4605
+ "step": 9919
4606
+ },
4607
+ {
4608
+ "epoch": 764.0,
4609
+ "learning_rate": 0.002944,
4610
+ "loss": 1.2623,
4611
+ "step": 9932
4612
+ },
4613
+ {
4614
+ "epoch": 765.0,
4615
+ "learning_rate": 0.00294,
4616
+ "loss": 1.2676,
4617
+ "step": 9945
4618
+ },
4619
+ {
4620
+ "epoch": 766.0,
4621
+ "learning_rate": 0.002936,
4622
+ "loss": 1.1999,
4623
+ "step": 9958
4624
+ },
4625
+ {
4626
+ "epoch": 767.0,
4627
+ "learning_rate": 0.002932,
4628
+ "loss": 1.2521,
4629
+ "step": 9971
4630
+ },
4631
+ {
4632
+ "epoch": 768.0,
4633
+ "learning_rate": 0.002928,
4634
+ "loss": 1.4043,
4635
+ "step": 9984
4636
+ },
4637
+ {
4638
+ "epoch": 769.0,
4639
+ "learning_rate": 0.002924,
4640
+ "loss": 1.3043,
4641
+ "step": 9997
4642
+ },
4643
+ {
4644
+ "epoch": 770.0,
4645
+ "learning_rate": 0.00292,
4646
+ "loss": 1.1831,
4647
+ "step": 10010
4648
+ },
4649
+ {
4650
+ "epoch": 771.0,
4651
+ "learning_rate": 0.002916,
4652
+ "loss": 1.1813,
4653
+ "step": 10023
4654
+ },
4655
+ {
4656
+ "epoch": 772.0,
4657
+ "learning_rate": 0.002912,
4658
+ "loss": 1.1946,
4659
+ "step": 10036
4660
+ },
4661
+ {
4662
+ "epoch": 773.0,
4663
+ "learning_rate": 0.002908,
4664
+ "loss": 1.2182,
4665
+ "step": 10049
4666
+ },
4667
+ {
4668
+ "epoch": 774.0,
4669
+ "learning_rate": 0.002904,
4670
+ "loss": 1.2491,
4671
+ "step": 10062
4672
+ },
4673
+ {
4674
+ "epoch": 775.0,
4675
+ "learning_rate": 0.0029,
4676
+ "loss": 1.2422,
4677
+ "step": 10075
4678
+ },
4679
+ {
4680
+ "epoch": 776.0,
4681
+ "learning_rate": 0.002896,
4682
+ "loss": 1.2784,
4683
+ "step": 10088
4684
+ },
4685
+ {
4686
+ "epoch": 777.0,
4687
+ "learning_rate": 0.002892,
4688
+ "loss": 1.1924,
4689
+ "step": 10101
4690
+ },
4691
+ {
4692
+ "epoch": 778.0,
4693
+ "learning_rate": 0.002888,
4694
+ "loss": 1.1739,
4695
+ "step": 10114
4696
+ },
4697
+ {
4698
+ "epoch": 779.0,
4699
+ "learning_rate": 0.002884,
4700
+ "loss": 1.2357,
4701
+ "step": 10127
4702
+ },
4703
+ {
4704
+ "epoch": 780.0,
4705
+ "learning_rate": 0.0028799999999999997,
4706
+ "loss": 1.1845,
4707
+ "step": 10140
4708
+ },
4709
+ {
4710
+ "epoch": 781.0,
4711
+ "learning_rate": 0.002876,
4712
+ "loss": 1.1947,
4713
+ "step": 10153
4714
+ },
4715
+ {
4716
+ "epoch": 782.0,
4717
+ "learning_rate": 0.002872,
4718
+ "loss": 1.2135,
4719
+ "step": 10166
4720
+ },
4721
+ {
4722
+ "epoch": 783.0,
4723
+ "learning_rate": 0.002868,
4724
+ "loss": 1.1923,
4725
+ "step": 10179
4726
+ },
4727
+ {
4728
+ "epoch": 784.0,
4729
+ "learning_rate": 0.002864,
4730
+ "loss": 1.1954,
4731
+ "step": 10192
4732
+ },
4733
+ {
4734
+ "epoch": 785.0,
4735
+ "learning_rate": 0.00286,
4736
+ "loss": 1.2599,
4737
+ "step": 10205
4738
+ },
4739
+ {
4740
+ "epoch": 786.0,
4741
+ "learning_rate": 0.002856,
4742
+ "loss": 1.1905,
4743
+ "step": 10218
4744
+ },
4745
+ {
4746
+ "epoch": 787.0,
4747
+ "learning_rate": 0.002852,
4748
+ "loss": 1.1463,
4749
+ "step": 10231
4750
+ },
4751
+ {
4752
+ "epoch": 788.0,
4753
+ "learning_rate": 0.002848,
4754
+ "loss": 1.1417,
4755
+ "step": 10244
4756
+ },
4757
+ {
4758
+ "epoch": 789.0,
4759
+ "learning_rate": 0.0028439999999999997,
4760
+ "loss": 1.1696,
4761
+ "step": 10257
4762
+ },
4763
+ {
4764
+ "epoch": 790.0,
4765
+ "learning_rate": 0.00284,
4766
+ "loss": 1.1227,
4767
+ "step": 10270
4768
+ },
4769
+ {
4770
+ "epoch": 791.0,
4771
+ "learning_rate": 0.002836,
4772
+ "loss": 1.1858,
4773
+ "step": 10283
4774
+ },
4775
+ {
4776
+ "epoch": 792.0,
4777
+ "learning_rate": 0.002832,
4778
+ "loss": 1.1615,
4779
+ "step": 10296
4780
+ },
4781
+ {
4782
+ "epoch": 793.0,
4783
+ "learning_rate": 0.002828,
4784
+ "loss": 1.2113,
4785
+ "step": 10309
4786
+ },
4787
+ {
4788
+ "epoch": 794.0,
4789
+ "learning_rate": 0.0028239999999999997,
4790
+ "loss": 1.1995,
4791
+ "step": 10322
4792
+ },
4793
+ {
4794
+ "epoch": 795.0,
4795
+ "learning_rate": 0.00282,
4796
+ "loss": 1.2497,
4797
+ "step": 10335
4798
+ },
4799
+ {
4800
+ "epoch": 796.0,
4801
+ "learning_rate": 0.002816,
4802
+ "loss": 1.2255,
4803
+ "step": 10348
4804
+ },
4805
+ {
4806
+ "epoch": 797.0,
4807
+ "learning_rate": 0.002812,
4808
+ "loss": 1.2728,
4809
+ "step": 10361
4810
+ },
4811
+ {
4812
+ "epoch": 798.0,
4813
+ "learning_rate": 0.0028079999999999997,
4814
+ "loss": 1.2053,
4815
+ "step": 10374
4816
+ },
4817
+ {
4818
+ "epoch": 799.0,
4819
+ "learning_rate": 0.002804,
4820
+ "loss": 1.1941,
4821
+ "step": 10387
4822
+ },
4823
+ {
4824
+ "epoch": 800.0,
4825
+ "learning_rate": 0.0028,
4826
+ "loss": 1.184,
4827
+ "step": 10400
4828
+ },
4829
+ {
4830
+ "epoch": 801.0,
4831
+ "learning_rate": 0.002796,
4832
+ "loss": 1.1575,
4833
+ "step": 10413
4834
+ },
4835
+ {
4836
+ "epoch": 802.0,
4837
+ "learning_rate": 0.0027919999999999998,
4838
+ "loss": 1.1571,
4839
+ "step": 10426
4840
+ },
4841
+ {
4842
+ "epoch": 803.0,
4843
+ "learning_rate": 0.0027879999999999997,
4844
+ "loss": 1.1619,
4845
+ "step": 10439
4846
+ },
4847
+ {
4848
+ "epoch": 804.0,
4849
+ "learning_rate": 0.002784,
4850
+ "loss": 1.175,
4851
+ "step": 10452
4852
+ },
4853
+ {
4854
+ "epoch": 805.0,
4855
+ "learning_rate": 0.00278,
4856
+ "loss": 1.2,
4857
+ "step": 10465
4858
+ },
4859
+ {
4860
+ "epoch": 806.0,
4861
+ "learning_rate": 0.002776,
4862
+ "loss": 1.1761,
4863
+ "step": 10478
4864
+ },
4865
+ {
4866
+ "epoch": 807.0,
4867
+ "learning_rate": 0.0027719999999999997,
4868
+ "loss": 1.0651,
4869
+ "step": 10491
4870
+ },
4871
+ {
4872
+ "epoch": 808.0,
4873
+ "learning_rate": 0.002768,
4874
+ "loss": 1.0923,
4875
+ "step": 10504
4876
+ },
4877
+ {
4878
+ "epoch": 809.0,
4879
+ "learning_rate": 0.002764,
4880
+ "loss": 1.1537,
4881
+ "step": 10517
4882
+ },
4883
+ {
4884
+ "epoch": 810.0,
4885
+ "learning_rate": 0.00276,
4886
+ "loss": 1.1618,
4887
+ "step": 10530
4888
+ },
4889
+ {
4890
+ "epoch": 811.0,
4891
+ "learning_rate": 0.0027559999999999998,
4892
+ "loss": 1.2048,
4893
+ "step": 10543
4894
+ },
4895
+ {
4896
+ "epoch": 812.0,
4897
+ "learning_rate": 0.0027519999999999997,
4898
+ "loss": 1.1584,
4899
+ "step": 10556
4900
+ },
4901
+ {
4902
+ "epoch": 813.0,
4903
+ "learning_rate": 0.0027480000000000004,
4904
+ "loss": 1.1815,
4905
+ "step": 10569
4906
+ },
4907
+ {
4908
+ "epoch": 814.0,
4909
+ "learning_rate": 0.0027440000000000003,
4910
+ "loss": 1.1204,
4911
+ "step": 10582
4912
+ },
4913
+ {
4914
+ "epoch": 815.0,
4915
+ "learning_rate": 0.0027400000000000002,
4916
+ "loss": 1.1662,
4917
+ "step": 10595
4918
+ },
4919
+ {
4920
+ "epoch": 816.0,
4921
+ "learning_rate": 0.002736,
4922
+ "loss": 1.1275,
4923
+ "step": 10608
4924
+ },
4925
+ {
4926
+ "epoch": 817.0,
4927
+ "learning_rate": 0.002732,
4928
+ "loss": 1.1124,
4929
+ "step": 10621
4930
+ },
4931
+ {
4932
+ "epoch": 818.0,
4933
+ "learning_rate": 0.0027280000000000004,
4934
+ "loss": 1.0765,
4935
+ "step": 10634
4936
+ },
4937
+ {
4938
+ "epoch": 819.0,
4939
+ "learning_rate": 0.0027240000000000003,
4940
+ "loss": 1.1159,
4941
+ "step": 10647
4942
+ },
4943
+ {
4944
+ "epoch": 820.0,
4945
+ "learning_rate": 0.00272,
4946
+ "loss": 1.1124,
4947
+ "step": 10660
4948
+ },
4949
+ {
4950
+ "epoch": 821.0,
4951
+ "learning_rate": 0.002716,
4952
+ "loss": 1.1045,
4953
+ "step": 10673
4954
+ },
4955
+ {
4956
+ "epoch": 822.0,
4957
+ "learning_rate": 0.0027120000000000004,
4958
+ "loss": 1.1152,
4959
+ "step": 10686
4960
+ },
4961
+ {
4962
+ "epoch": 823.0,
4963
+ "learning_rate": 0.0027080000000000003,
4964
+ "loss": 1.0664,
4965
+ "step": 10699
4966
+ },
4967
+ {
4968
+ "epoch": 824.0,
4969
+ "learning_rate": 0.0027040000000000002,
4970
+ "loss": 1.0165,
4971
+ "step": 10712
4972
+ },
4973
+ {
4974
+ "epoch": 825.0,
4975
+ "learning_rate": 0.0027,
4976
+ "loss": 1.004,
4977
+ "step": 10725
4978
+ },
4979
+ {
4980
+ "epoch": 826.0,
4981
+ "learning_rate": 0.002696,
4982
+ "loss": 1.0194,
4983
+ "step": 10738
4984
+ },
4985
+ {
4986
+ "epoch": 827.0,
4987
+ "learning_rate": 0.0026920000000000004,
4988
+ "loss": 1.04,
4989
+ "step": 10751
4990
+ },
4991
+ {
4992
+ "epoch": 828.0,
4993
+ "learning_rate": 0.0026880000000000003,
4994
+ "loss": 1.0393,
4995
+ "step": 10764
4996
+ },
4997
+ {
4998
+ "epoch": 829.0,
4999
+ "learning_rate": 0.002684,
5000
+ "loss": 0.9552,
5001
+ "step": 10777
5002
+ },
5003
+ {
5004
+ "epoch": 830.0,
5005
+ "learning_rate": 0.00268,
5006
+ "loss": 0.9634,
5007
+ "step": 10790
5008
+ },
5009
+ {
5010
+ "epoch": 831.0,
5011
+ "learning_rate": 0.0026760000000000004,
5012
+ "loss": 0.9967,
5013
+ "step": 10803
5014
+ },
5015
+ {
5016
+ "epoch": 832.0,
5017
+ "learning_rate": 0.0026720000000000003,
5018
+ "loss": 0.994,
5019
+ "step": 10816
5020
+ },
5021
+ {
5022
+ "epoch": 833.0,
5023
+ "learning_rate": 0.0026680000000000002,
5024
+ "loss": 0.9906,
5025
+ "step": 10829
5026
+ },
5027
+ {
5028
+ "epoch": 834.0,
5029
+ "learning_rate": 0.002664,
5030
+ "loss": 1.0132,
5031
+ "step": 10842
5032
+ },
5033
+ {
5034
+ "epoch": 835.0,
5035
+ "learning_rate": 0.00266,
5036
+ "loss": 1.0088,
5037
+ "step": 10855
5038
+ },
5039
+ {
5040
+ "epoch": 836.0,
5041
+ "learning_rate": 0.0026560000000000004,
5042
+ "loss": 0.9914,
5043
+ "step": 10868
5044
+ },
5045
+ {
5046
+ "epoch": 837.0,
5047
+ "learning_rate": 0.0026520000000000003,
5048
+ "loss": 0.9947,
5049
+ "step": 10881
5050
+ },
5051
+ {
5052
+ "epoch": 838.0,
5053
+ "learning_rate": 0.002648,
5054
+ "loss": 1.0053,
5055
+ "step": 10894
5056
+ },
5057
+ {
5058
+ "epoch": 839.0,
5059
+ "learning_rate": 0.002644,
5060
+ "loss": 1.0138,
5061
+ "step": 10907
5062
+ },
5063
+ {
5064
+ "epoch": 840.0,
5065
+ "learning_rate": 0.00264,
5066
+ "loss": 0.9785,
5067
+ "step": 10920
5068
+ },
5069
+ {
5070
+ "epoch": 841.0,
5071
+ "learning_rate": 0.0026360000000000003,
5072
+ "loss": 1.0174,
5073
+ "step": 10933
5074
+ },
5075
+ {
5076
+ "epoch": 842.0,
5077
+ "learning_rate": 0.0026320000000000002,
5078
+ "loss": 0.9855,
5079
+ "step": 10946
5080
+ },
5081
+ {
5082
+ "epoch": 843.0,
5083
+ "learning_rate": 0.002628,
5084
+ "loss": 1.0005,
5085
+ "step": 10959
5086
+ },
5087
+ {
5088
+ "epoch": 844.0,
5089
+ "learning_rate": 0.002624,
5090
+ "loss": 0.9966,
5091
+ "step": 10972
5092
+ },
5093
+ {
5094
+ "epoch": 845.0,
5095
+ "learning_rate": 0.0026200000000000004,
5096
+ "loss": 0.9952,
5097
+ "step": 10985
5098
+ },
5099
+ {
5100
+ "epoch": 846.0,
5101
+ "learning_rate": 0.0026160000000000003,
5102
+ "loss": 1.001,
5103
+ "step": 10998
5104
+ },
5105
+ {
5106
+ "epoch": 847.0,
5107
+ "learning_rate": 0.002612,
5108
+ "loss": 1.0235,
5109
+ "step": 11011
5110
+ },
5111
+ {
5112
+ "epoch": 848.0,
5113
+ "learning_rate": 0.002608,
5114
+ "loss": 0.9835,
5115
+ "step": 11024
5116
+ },
5117
+ {
5118
+ "epoch": 849.0,
5119
+ "learning_rate": 0.002604,
5120
+ "loss": 0.9951,
5121
+ "step": 11037
5122
+ },
5123
+ {
5124
+ "epoch": 850.0,
5125
+ "learning_rate": 0.0026000000000000003,
5126
+ "loss": 1.0329,
5127
+ "step": 11050
5128
+ },
5129
+ {
5130
+ "epoch": 851.0,
5131
+ "learning_rate": 0.0025960000000000002,
5132
+ "loss": 1.0021,
5133
+ "step": 11063
5134
+ },
5135
+ {
5136
+ "epoch": 852.0,
5137
+ "learning_rate": 0.002592,
5138
+ "loss": 1.0391,
5139
+ "step": 11076
5140
+ },
5141
+ {
5142
+ "epoch": 853.0,
5143
+ "learning_rate": 0.002588,
5144
+ "loss": 1.0249,
5145
+ "step": 11089
5146
+ },
5147
+ {
5148
+ "epoch": 854.0,
5149
+ "learning_rate": 0.002584,
5150
+ "loss": 0.9974,
5151
+ "step": 11102
5152
+ },
5153
+ {
5154
+ "epoch": 855.0,
5155
+ "learning_rate": 0.0025800000000000003,
5156
+ "loss": 1.0149,
5157
+ "step": 11115
5158
+ },
5159
+ {
5160
+ "epoch": 856.0,
5161
+ "learning_rate": 0.002576,
5162
+ "loss": 1.0002,
5163
+ "step": 11128
5164
+ },
5165
+ {
5166
+ "epoch": 857.0,
5167
+ "learning_rate": 0.002572,
5168
+ "loss": 1.0379,
5169
+ "step": 11141
5170
+ },
5171
+ {
5172
+ "epoch": 858.0,
5173
+ "learning_rate": 0.002568,
5174
+ "loss": 1.0381,
5175
+ "step": 11154
5176
+ },
5177
+ {
5178
+ "epoch": 859.0,
5179
+ "learning_rate": 0.0025640000000000003,
5180
+ "loss": 0.9772,
5181
+ "step": 11167
5182
+ },
5183
+ {
5184
+ "epoch": 860.0,
5185
+ "learning_rate": 0.00256,
5186
+ "loss": 1.0263,
5187
+ "step": 11180
5188
+ },
5189
+ {
5190
+ "epoch": 861.0,
5191
+ "learning_rate": 0.002556,
5192
+ "loss": 0.982,
5193
+ "step": 11193
5194
+ },
5195
+ {
5196
+ "epoch": 862.0,
5197
+ "learning_rate": 0.002552,
5198
+ "loss": 0.9892,
5199
+ "step": 11206
5200
+ },
5201
+ {
5202
+ "epoch": 863.0,
5203
+ "learning_rate": 0.002548,
5204
+ "loss": 0.9708,
5205
+ "step": 11219
5206
+ },
5207
+ {
5208
+ "epoch": 864.0,
5209
+ "learning_rate": 0.0025440000000000003,
5210
+ "loss": 0.9883,
5211
+ "step": 11232
5212
+ },
5213
+ {
5214
+ "epoch": 865.0,
5215
+ "learning_rate": 0.00254,
5216
+ "loss": 0.9446,
5217
+ "step": 11245
5218
+ },
5219
+ {
5220
+ "epoch": 866.0,
5221
+ "learning_rate": 0.002536,
5222
+ "loss": 0.9686,
5223
+ "step": 11258
5224
+ },
5225
+ {
5226
+ "epoch": 867.0,
5227
+ "learning_rate": 0.002532,
5228
+ "loss": 1.0044,
5229
+ "step": 11271
5230
+ },
5231
+ {
5232
+ "epoch": 868.0,
5233
+ "learning_rate": 0.002528,
5234
+ "loss": 1.0128,
5235
+ "step": 11284
5236
+ },
5237
+ {
5238
+ "epoch": 869.0,
5239
+ "learning_rate": 0.002524,
5240
+ "loss": 0.9876,
5241
+ "step": 11297
5242
+ },
5243
+ {
5244
+ "epoch": 870.0,
5245
+ "learning_rate": 0.00252,
5246
+ "loss": 0.9992,
5247
+ "step": 11310
5248
+ },
5249
+ {
5250
+ "epoch": 871.0,
5251
+ "learning_rate": 0.002516,
5252
+ "loss": 1.1017,
5253
+ "step": 11323
5254
+ },
5255
+ {
5256
+ "epoch": 872.0,
5257
+ "learning_rate": 0.002512,
5258
+ "loss": 0.9853,
5259
+ "step": 11336
5260
+ },
5261
+ {
5262
+ "epoch": 873.0,
5263
+ "learning_rate": 0.0025080000000000002,
5264
+ "loss": 0.9495,
5265
+ "step": 11349
5266
+ },
5267
+ {
5268
+ "epoch": 874.0,
5269
+ "learning_rate": 0.002504,
5270
+ "loss": 0.9292,
5271
+ "step": 11362
5272
+ },
5273
+ {
5274
+ "epoch": 875.0,
5275
+ "learning_rate": 0.0025,
5276
+ "loss": 0.9339,
5277
+ "step": 11375
5278
+ },
5279
+ {
5280
+ "epoch": 876.0,
5281
+ "learning_rate": 0.002496,
5282
+ "loss": 0.9309,
5283
+ "step": 11388
5284
+ },
5285
+ {
5286
+ "epoch": 877.0,
5287
+ "learning_rate": 0.002492,
5288
+ "loss": 0.9303,
5289
+ "step": 11401
5290
+ },
5291
+ {
5292
+ "epoch": 878.0,
5293
+ "learning_rate": 0.002488,
5294
+ "loss": 0.8827,
5295
+ "step": 11414
5296
+ },
5297
+ {
5298
+ "epoch": 879.0,
5299
+ "learning_rate": 0.002484,
5300
+ "loss": 0.8898,
5301
+ "step": 11427
5302
+ },
5303
+ {
5304
+ "epoch": 880.0,
5305
+ "learning_rate": 0.00248,
5306
+ "loss": 0.8748,
5307
+ "step": 11440
5308
+ },
5309
+ {
5310
+ "epoch": 881.0,
5311
+ "learning_rate": 0.002476,
5312
+ "loss": 0.921,
5313
+ "step": 11453
5314
+ },
5315
+ {
5316
+ "epoch": 882.0,
5317
+ "learning_rate": 0.0024720000000000002,
5318
+ "loss": 0.912,
5319
+ "step": 11466
5320
+ },
5321
+ {
5322
+ "epoch": 883.0,
5323
+ "learning_rate": 0.002468,
5324
+ "loss": 0.9684,
5325
+ "step": 11479
5326
+ },
5327
+ {
5328
+ "epoch": 884.0,
5329
+ "learning_rate": 0.002464,
5330
+ "loss": 1.0113,
5331
+ "step": 11492
5332
+ },
5333
+ {
5334
+ "epoch": 885.0,
5335
+ "learning_rate": 0.00246,
5336
+ "loss": 1.0043,
5337
+ "step": 11505
5338
+ },
5339
+ {
5340
+ "epoch": 886.0,
5341
+ "learning_rate": 0.002456,
5342
+ "loss": 0.94,
5343
+ "step": 11518
5344
+ },
5345
+ {
5346
+ "epoch": 887.0,
5347
+ "learning_rate": 0.002452,
5348
+ "loss": 0.9166,
5349
+ "step": 11531
5350
+ },
5351
+ {
5352
+ "epoch": 888.0,
5353
+ "learning_rate": 0.002448,
5354
+ "loss": 0.9202,
5355
+ "step": 11544
5356
+ },
5357
+ {
5358
+ "epoch": 889.0,
5359
+ "learning_rate": 0.002444,
5360
+ "loss": 0.9179,
5361
+ "step": 11557
5362
+ },
5363
+ {
5364
+ "epoch": 890.0,
5365
+ "learning_rate": 0.00244,
5366
+ "loss": 0.8928,
5367
+ "step": 11570
5368
+ },
5369
+ {
5370
+ "epoch": 891.0,
5371
+ "learning_rate": 0.002436,
5372
+ "loss": 0.9021,
5373
+ "step": 11583
5374
+ },
5375
+ {
5376
+ "epoch": 892.0,
5377
+ "learning_rate": 0.002432,
5378
+ "loss": 0.9038,
5379
+ "step": 11596
5380
+ },
5381
+ {
5382
+ "epoch": 893.0,
5383
+ "learning_rate": 0.002428,
5384
+ "loss": 0.8446,
5385
+ "step": 11609
5386
+ },
5387
+ {
5388
+ "epoch": 894.0,
5389
+ "learning_rate": 0.002424,
5390
+ "loss": 0.9167,
5391
+ "step": 11622
5392
+ },
5393
+ {
5394
+ "epoch": 895.0,
5395
+ "learning_rate": 0.00242,
5396
+ "loss": 0.8897,
5397
+ "step": 11635
5398
+ },
5399
+ {
5400
+ "epoch": 896.0,
5401
+ "learning_rate": 0.002416,
5402
+ "loss": 0.9227,
5403
+ "step": 11648
5404
+ },
5405
+ {
5406
+ "epoch": 897.0,
5407
+ "learning_rate": 0.002412,
5408
+ "loss": 0.8956,
5409
+ "step": 11661
5410
+ },
5411
+ {
5412
+ "epoch": 898.0,
5413
+ "learning_rate": 0.002408,
5414
+ "loss": 0.8768,
5415
+ "step": 11674
5416
+ },
5417
+ {
5418
+ "epoch": 899.0,
5419
+ "learning_rate": 0.002404,
5420
+ "loss": 0.9134,
5421
+ "step": 11687
5422
+ },
5423
+ {
5424
+ "epoch": 900.0,
5425
+ "learning_rate": 0.0024,
5426
+ "loss": 0.8484,
5427
+ "step": 11700
5428
+ },
5429
+ {
5430
+ "epoch": 901.0,
5431
+ "learning_rate": 0.002396,
5432
+ "loss": 0.8616,
5433
+ "step": 11713
5434
+ },
5435
+ {
5436
+ "epoch": 902.0,
5437
+ "learning_rate": 0.002392,
5438
+ "loss": 0.8669,
5439
+ "step": 11726
5440
+ },
5441
+ {
5442
+ "epoch": 903.0,
5443
+ "learning_rate": 0.002388,
5444
+ "loss": 0.8529,
5445
+ "step": 11739
5446
+ },
5447
+ {
5448
+ "epoch": 904.0,
5449
+ "learning_rate": 0.002384,
5450
+ "loss": 0.8488,
5451
+ "step": 11752
5452
+ },
5453
+ {
5454
+ "epoch": 905.0,
5455
+ "learning_rate": 0.0023799999999999997,
5456
+ "loss": 0.8505,
5457
+ "step": 11765
5458
+ },
5459
+ {
5460
+ "epoch": 906.0,
5461
+ "learning_rate": 0.002376,
5462
+ "loss": 0.8264,
5463
+ "step": 11778
5464
+ },
5465
+ {
5466
+ "epoch": 907.0,
5467
+ "learning_rate": 0.002372,
5468
+ "loss": 0.8382,
5469
+ "step": 11791
5470
+ },
5471
+ {
5472
+ "epoch": 908.0,
5473
+ "learning_rate": 0.002368,
5474
+ "loss": 0.8176,
5475
+ "step": 11804
5476
+ },
5477
+ {
5478
+ "epoch": 909.0,
5479
+ "learning_rate": 0.0023639999999999998,
5480
+ "loss": 0.8122,
5481
+ "step": 11817
5482
+ },
5483
+ {
5484
+ "epoch": 910.0,
5485
+ "learning_rate": 0.00236,
5486
+ "loss": 0.8175,
5487
+ "step": 11830
5488
+ },
5489
+ {
5490
+ "epoch": 911.0,
5491
+ "learning_rate": 0.002356,
5492
+ "loss": 0.8345,
5493
+ "step": 11843
5494
+ },
5495
+ {
5496
+ "epoch": 912.0,
5497
+ "learning_rate": 0.002352,
5498
+ "loss": 0.8102,
5499
+ "step": 11856
5500
+ },
5501
+ {
5502
+ "epoch": 913.0,
5503
+ "learning_rate": 0.002348,
5504
+ "loss": 0.7818,
5505
+ "step": 11869
5506
+ },
5507
+ {
5508
+ "epoch": 914.0,
5509
+ "learning_rate": 0.0023439999999999997,
5510
+ "loss": 0.8027,
5511
+ "step": 11882
5512
+ },
5513
+ {
5514
+ "epoch": 915.0,
5515
+ "learning_rate": 0.00234,
5516
+ "loss": 0.7765,
5517
+ "step": 11895
5518
+ },
5519
+ {
5520
+ "epoch": 916.0,
5521
+ "learning_rate": 0.002336,
5522
+ "loss": 0.8225,
5523
+ "step": 11908
5524
+ },
5525
+ {
5526
+ "epoch": 917.0,
5527
+ "learning_rate": 0.002332,
5528
+ "loss": 0.7882,
5529
+ "step": 11921
5530
+ },
5531
+ {
5532
+ "epoch": 918.0,
5533
+ "learning_rate": 0.0023279999999999998,
5534
+ "loss": 0.7784,
5535
+ "step": 11934
5536
+ },
5537
+ {
5538
+ "epoch": 919.0,
5539
+ "learning_rate": 0.0023239999999999997,
5540
+ "loss": 0.7751,
5541
+ "step": 11947
5542
+ },
5543
+ {
5544
+ "epoch": 920.0,
5545
+ "learning_rate": 0.00232,
5546
+ "loss": 0.7837,
5547
+ "step": 11960
5548
+ },
5549
+ {
5550
+ "epoch": 921.0,
5551
+ "learning_rate": 0.002316,
5552
+ "loss": 0.7588,
5553
+ "step": 11973
5554
+ },
5555
+ {
5556
+ "epoch": 922.0,
5557
+ "learning_rate": 0.002312,
5558
+ "loss": 0.8106,
5559
+ "step": 11986
5560
+ },
5561
+ {
5562
+ "epoch": 923.0,
5563
+ "learning_rate": 0.0023079999999999997,
5564
+ "loss": 0.8359,
5565
+ "step": 11999
5566
+ },
5567
+ {
5568
+ "epoch": 924.0,
5569
+ "learning_rate": 0.002304,
5570
+ "loss": 0.7899,
5571
+ "step": 12012
5572
+ },
5573
+ {
5574
+ "epoch": 925.0,
5575
+ "learning_rate": 0.0023,
5576
+ "loss": 0.7766,
5577
+ "step": 12025
5578
+ },
5579
+ {
5580
+ "epoch": 926.0,
5581
+ "learning_rate": 0.002296,
5582
+ "loss": 0.7978,
5583
+ "step": 12038
5584
+ },
5585
+ {
5586
+ "epoch": 927.0,
5587
+ "learning_rate": 0.0022919999999999998,
5588
+ "loss": 0.8012,
5589
+ "step": 12051
5590
+ },
5591
+ {
5592
+ "epoch": 928.0,
5593
+ "learning_rate": 0.0022879999999999997,
5594
+ "loss": 0.8112,
5595
+ "step": 12064
5596
+ },
5597
+ {
5598
+ "epoch": 929.0,
5599
+ "learning_rate": 0.002284,
5600
+ "loss": 0.8725,
5601
+ "step": 12077
5602
+ },
5603
+ {
5604
+ "epoch": 930.0,
5605
+ "learning_rate": 0.00228,
5606
+ "loss": 0.8415,
5607
+ "step": 12090
5608
+ },
5609
+ {
5610
+ "epoch": 931.0,
5611
+ "learning_rate": 0.002276,
5612
+ "loss": 0.8444,
5613
+ "step": 12103
5614
+ },
5615
+ {
5616
+ "epoch": 932.0,
5617
+ "learning_rate": 0.0022719999999999997,
5618
+ "loss": 0.8459,
5619
+ "step": 12116
5620
+ },
5621
+ {
5622
+ "epoch": 933.0,
5623
+ "learning_rate": 0.002268,
5624
+ "loss": 0.7739,
5625
+ "step": 12129
5626
+ },
5627
+ {
5628
+ "epoch": 934.0,
5629
+ "learning_rate": 0.002264,
5630
+ "loss": 0.8236,
5631
+ "step": 12142
5632
+ },
5633
+ {
5634
+ "epoch": 935.0,
5635
+ "learning_rate": 0.00226,
5636
+ "loss": 0.7746,
5637
+ "step": 12155
5638
+ },
5639
+ {
5640
+ "epoch": 936.0,
5641
+ "learning_rate": 0.0022559999999999998,
5642
+ "loss": 0.807,
5643
+ "step": 12168
5644
+ },
5645
+ {
5646
+ "epoch": 937.0,
5647
+ "learning_rate": 0.0022519999999999997,
5648
+ "loss": 0.8016,
5649
+ "step": 12181
5650
+ },
5651
+ {
5652
+ "epoch": 938.0,
5653
+ "learning_rate": 0.0022480000000000004,
5654
+ "loss": 0.7812,
5655
+ "step": 12194
5656
+ },
5657
+ {
5658
+ "epoch": 939.0,
5659
+ "learning_rate": 0.0022440000000000003,
5660
+ "loss": 0.7796,
5661
+ "step": 12207
5662
+ },
5663
+ {
5664
+ "epoch": 940.0,
5665
+ "learning_rate": 0.0022400000000000002,
5666
+ "loss": 0.7743,
5667
+ "step": 12220
5668
+ },
5669
+ {
5670
+ "epoch": 941.0,
5671
+ "learning_rate": 0.002236,
5672
+ "loss": 0.8141,
5673
+ "step": 12233
5674
+ },
5675
+ {
5676
+ "epoch": 942.0,
5677
+ "learning_rate": 0.002232,
5678
+ "loss": 0.7666,
5679
+ "step": 12246
5680
+ },
5681
+ {
5682
+ "epoch": 943.0,
5683
+ "learning_rate": 0.0022280000000000004,
5684
+ "loss": 0.7668,
5685
+ "step": 12259
5686
+ },
5687
+ {
5688
+ "epoch": 944.0,
5689
+ "learning_rate": 0.0022240000000000003,
5690
+ "loss": 0.7469,
5691
+ "step": 12272
5692
+ },
5693
+ {
5694
+ "epoch": 945.0,
5695
+ "learning_rate": 0.00222,
5696
+ "loss": 0.8032,
5697
+ "step": 12285
5698
+ },
5699
+ {
5700
+ "epoch": 946.0,
5701
+ "learning_rate": 0.002216,
5702
+ "loss": 0.767,
5703
+ "step": 12298
5704
+ },
5705
+ {
5706
+ "epoch": 947.0,
5707
+ "learning_rate": 0.0022120000000000004,
5708
+ "loss": 0.7862,
5709
+ "step": 12311
5710
+ },
5711
+ {
5712
+ "epoch": 948.0,
5713
+ "learning_rate": 0.0022080000000000003,
5714
+ "loss": 0.762,
5715
+ "step": 12324
5716
+ },
5717
+ {
5718
+ "epoch": 949.0,
5719
+ "learning_rate": 0.0022040000000000002,
5720
+ "loss": 0.762,
5721
+ "step": 12337
5722
+ },
5723
+ {
5724
+ "epoch": 950.0,
+ "learning_rate": 0.0022,
+ "loss": 0.7546,
+ "step": 12350
+ },
+ {
+ "epoch": 951.0,
+ "learning_rate": 0.002196,
+ "loss": 0.721,
+ "step": 12363
+ },
+ {
+ "epoch": 952.0,
+ "learning_rate": 0.0021920000000000004,
+ "loss": 0.7442,
+ "step": 12376
+ },
+ {
+ "epoch": 953.0,
+ "learning_rate": 0.0021880000000000003,
+ "loss": 0.7331,
+ "step": 12389
+ },
+ {
+ "epoch": 954.0,
+ "learning_rate": 0.002184,
+ "loss": 0.7299,
+ "step": 12402
+ },
+ {
+ "epoch": 955.0,
+ "learning_rate": 0.00218,
+ "loss": 0.7114,
+ "step": 12415
+ },
+ {
+ "epoch": 956.0,
+ "learning_rate": 0.0021760000000000004,
+ "loss": 0.7443,
+ "step": 12428
+ },
+ {
+ "epoch": 957.0,
+ "learning_rate": 0.0021720000000000003,
+ "loss": 0.7247,
+ "step": 12441
+ },
+ {
+ "epoch": 958.0,
+ "learning_rate": 0.0021680000000000002,
+ "loss": 0.6941,
+ "step": 12454
+ },
+ {
+ "epoch": 959.0,
+ "learning_rate": 0.002164,
+ "loss": 0.6838,
+ "step": 12467
+ },
+ {
+ "epoch": 960.0,
+ "learning_rate": 0.00216,
+ "loss": 0.6838,
+ "step": 12480
+ },
+ {
+ "epoch": 961.0,
+ "learning_rate": 0.0021560000000000004,
+ "loss": 0.7048,
+ "step": 12493
+ },
+ {
+ "epoch": 962.0,
+ "learning_rate": 0.0021520000000000003,
+ "loss": 0.7083,
+ "step": 12506
+ },
+ {
+ "epoch": 963.0,
+ "learning_rate": 0.002148,
+ "loss": 0.7166,
+ "step": 12519
+ },
+ {
+ "epoch": 964.0,
+ "learning_rate": 0.002144,
+ "loss": 0.7128,
+ "step": 12532
+ },
+ {
+ "epoch": 965.0,
+ "learning_rate": 0.00214,
+ "loss": 0.7257,
+ "step": 12545
+ },
+ {
+ "epoch": 966.0,
+ "learning_rate": 0.0021360000000000003,
+ "loss": 0.7145,
+ "step": 12558
+ },
+ {
+ "epoch": 967.0,
+ "learning_rate": 0.002132,
+ "loss": 0.7173,
+ "step": 12571
+ },
+ {
+ "epoch": 968.0,
+ "learning_rate": 0.002128,
+ "loss": 0.7162,
+ "step": 12584
+ },
+ {
+ "epoch": 969.0,
+ "learning_rate": 0.002124,
+ "loss": 0.6849,
+ "step": 12597
+ },
+ {
+ "epoch": 970.0,
+ "learning_rate": 0.0021200000000000004,
+ "loss": 0.6859,
+ "step": 12610
+ },
+ {
+ "epoch": 971.0,
+ "learning_rate": 0.0021160000000000003,
+ "loss": 0.6802,
+ "step": 12623
+ },
+ {
+ "epoch": 972.0,
+ "learning_rate": 0.002112,
+ "loss": 0.6965,
+ "step": 12636
+ },
+ {
+ "epoch": 973.0,
+ "learning_rate": 0.002108,
+ "loss": 0.6941,
+ "step": 12649
+ },
+ {
+ "epoch": 974.0,
+ "learning_rate": 0.002104,
+ "loss": 0.6928,
+ "step": 12662
+ },
+ {
+ "epoch": 975.0,
+ "learning_rate": 0.0021000000000000003,
+ "loss": 0.6764,
+ "step": 12675
+ },
+ {
+ "epoch": 976.0,
+ "learning_rate": 0.002096,
+ "loss": 0.6569,
+ "step": 12688
+ },
+ {
+ "epoch": 977.0,
+ "learning_rate": 0.002092,
+ "loss": 0.6618,
+ "step": 12701
+ },
+ {
+ "epoch": 978.0,
+ "learning_rate": 0.002088,
+ "loss": 0.6719,
+ "step": 12714
+ },
+ {
+ "epoch": 979.0,
+ "learning_rate": 0.002084,
+ "loss": 0.6584,
+ "step": 12727
+ },
+ {
+ "epoch": 980.0,
+ "learning_rate": 0.0020800000000000003,
+ "loss": 0.6911,
+ "step": 12740
+ },
+ {
+ "epoch": 981.0,
+ "learning_rate": 0.002076,
+ "loss": 0.688,
+ "step": 12753
+ },
+ {
+ "epoch": 982.0,
+ "learning_rate": 0.002072,
+ "loss": 0.6741,
+ "step": 12766
+ },
+ {
+ "epoch": 983.0,
+ "learning_rate": 0.002068,
+ "loss": 0.6962,
+ "step": 12779
+ },
+ {
+ "epoch": 984.0,
+ "learning_rate": 0.0020640000000000003,
+ "loss": 0.6811,
+ "step": 12792
+ },
+ {
+ "epoch": 985.0,
+ "learning_rate": 0.00206,
+ "loss": 0.6717,
+ "step": 12805
+ },
+ {
+ "epoch": 986.0,
+ "learning_rate": 0.002056,
+ "loss": 0.6733,
+ "step": 12818
+ },
+ {
+ "epoch": 987.0,
+ "learning_rate": 0.002052,
+ "loss": 0.6813,
+ "step": 12831
+ },
+ {
+ "epoch": 988.0,
+ "learning_rate": 0.002048,
+ "loss": 0.6472,
+ "step": 12844
+ },
+ {
+ "epoch": 989.0,
+ "learning_rate": 0.0020440000000000002,
+ "loss": 0.6508,
+ "step": 12857
+ },
+ {
+ "epoch": 990.0,
+ "learning_rate": 0.00204,
+ "loss": 0.6576,
+ "step": 12870
+ },
+ {
+ "epoch": 991.0,
+ "learning_rate": 0.002036,
+ "loss": 0.6428,
+ "step": 12883
+ },
+ {
+ "epoch": 992.0,
+ "learning_rate": 0.002032,
+ "loss": 0.6505,
+ "step": 12896
+ },
+ {
+ "epoch": 993.0,
+ "learning_rate": 0.002028,
+ "loss": 0.6578,
+ "step": 12909
+ },
+ {
+ "epoch": 994.0,
+ "learning_rate": 0.002024,
+ "loss": 0.6689,
+ "step": 12922
+ },
+ {
+ "epoch": 995.0,
+ "learning_rate": 0.00202,
+ "loss": 0.6625,
+ "step": 12935
+ },
+ {
+ "epoch": 996.0,
+ "learning_rate": 0.002016,
+ "loss": 0.6894,
+ "step": 12948
+ },
+ {
+ "epoch": 997.0,
+ "learning_rate": 0.002012,
+ "loss": 0.6669,
+ "step": 12961
+ },
+ {
+ "epoch": 998.0,
+ "learning_rate": 0.0020080000000000002,
+ "loss": 0.6698,
+ "step": 12974
+ },
+ {
+ "epoch": 999.0,
+ "learning_rate": 0.002004,
+ "loss": 0.6861,
+ "step": 12987
+ },
+ {
+ "epoch": 1000.0,
+ "learning_rate": 0.002,
+ "loss": 0.7089,
+ "step": 13000
+ },
+ {
+ "epoch": 1000.0,
+ "step": 13000,
+ "total_flos": 569573692465152.0,
+ "train_loss": 0.655489089525663,
+ "train_runtime": 67792.7328,
+ "train_samples_per_second": 1.475,
+ "train_steps_per_second": 0.192
  }
  ],
  "logging_steps": 500,
+ "max_steps": 13000,
+ "num_train_epochs": 1000,
  "save_steps": 500,
+ "total_flos": 569573692465152.0,
  "trial_name": null,
  "trial_params": null
  }