metadata

license: mit
base_model: roberta-base
tags:
  - generated_from_keras_callback
model-index:
  - name: Ryukijano/masked-lm-tpu
    results: []

Ryukijano/masked-lm-tpu

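This model is a fine-tuned version of roberta-base on an unknown dataset. It achieves the following results on the evaluation set (values from the final epoch of training):

Train Loss: 7.3658
Train Accuracy: 0.0269
Validation Loss: 7.2941
Validation Accuracy: 0.0265
Epoch: 22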

Model description

More information needed

The following hyperparameters were used during training:

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 0.0001, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 0.0001, 'decay_steps': 111625, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 5875, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.001}
training_precision: float32
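
The optimizer configuration above describes a linear warmup followed by a linear (power 1.0) polynomial decay. The sketch below is not the training code; it is a minimal pure-Python illustration of the learning rate such a schedule would produce at a given step. The function name `learning_rate_at` is hypothetical, and the sketch assumes the decay schedule is evaluated on the step count minus the warmup steps, which is how this kind of warmup wrapper is commonly implemented.

```python
# Values copied from the optimizer config in this model card.
PEAK_LR = 1e-4          # initial_learning_rate
WARMUP_STEPS = 5875     # warmup_steps
DECAY_STEPS = 111625    # decay_steps
END_LR = 0.0            # end_learning_rate
POWER = 1.0             # power (1.0 means linear warmup and linear decay)


def learning_rate_at(step: int) -> float:
    """Return the learning rate at a given optimizer step (illustrative only)."""
    if step < WARMUP_STEPS:
        # Warmup: ramp from 0 up to the peak learning rate.
        return PEAK_LR * (step / WARMUP_STEPS) ** POWER
    # Assumption: decay is computed on steps elapsed since warmup ended,
    # and is held at END_LR once DECAY_STEPS have passed.
    decay_step = min(step - WARMUP_STEPS, DECAY_STEPS)
    return (PEAK_LR - END_LR) * (1 - decay_step / DECAY_STEPS) ** POWER + END_LR


if __name__ == "__main__":
    for s in (0, WARMUP_STEPS, WARMUP_STEPS + DECAY_STEPS // 2, WARMUP_STEPS + DECAY_STEPS):
        print(s, learning_rate_at(s))
```

Under these assumptions the schedule spans 5875 + 111625 = 117500 steps in total, so the warmup covers 5% of it.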

Train Loss	Train Accuracy	Validation Loss	Validation Accuracy	Epoch
10.2437	0.0000	10.1909	0.0000	0
10.1151	0.0001	9.9763	0.0016	1
9.8665	0.0107	9.6535	0.0215	2
9.5331	0.0230	9.2992	0.0223	3
9.2000	0.0231	8.9944	0.0222	4
8.9195	0.0229	8.7450	0.0224	5
8.6997	0.0231	8.6124	0.0219	6
8.5689	0.0229	8.4904	0.0222	7
8.4525	0.0230	8.3865	0.0223	8
8.3594	0.0230	8.3069	0.0221	9
8.2662	0.0231	8.2092	0.0224	10
8.1956	0.0231	8.1208	0.0222	11
8.1285	0.0229	8.0806	0.0219	12
8.0345	0.0234	8.0030	0.0220	13
7.9960	0.0228	7.9144	0.0224	14
7.9065	0.0231	7.8661	0.0221	15
7.8449	0.0229	7.7873	0.0219	16
7.7673	0.0232	7.6903	0.0229	17
7.6868	0.0242	7.6129	0.0243	18
7.6206	0.0250	7.5579	0.0246	19
7.5231	0.0258	7.4564	0.0254	20
7.4589	0.0262	7.4136	0.0255	21
7.3658	0.0269	7.2941	0.0265	22
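
The loss values in the table can be easier to interpret as perplexity. The following is a rough sketch, assuming the reported loss is a mean cross-entropy in nats over the masked tokens (the model card does not state this); the helper name `perplexity` is hypothetical.

```python
import math


def perplexity(cross_entropy_nats: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(cross_entropy_nats)


if __name__ == "__main__":
    # Validation losses at the first and last epochs, copied from the table.
    for epoch, val_loss in ((0, 10.1909), (22, 7.2941)):
        print(f"epoch {epoch}: validation perplexity ~ {perplexity(val_loss):.0f}")
```

If that assumption holds, the still-decreasing loss and the validation accuracy of about 2.65% at epoch 22 suggest the model was far from converged when training stopped.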