metadata

license: mit
base_model: roberta-base
tags:
  - generated_from_keras_callback
model-index:
  - name: Ryukijano/masked-lm-tpu
    results: []

Ryukijano/masked-lm-tpu

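This model is a fine-tuned version of roberta-base on an unknown dataset. At the final logged epoch (epoch 15) it reaches a train loss of 7.9065 and a train accuracy of 0.0231, with a validation loss of 7.8661 and a validation accuracy of 0.0221.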

Model description

More information needed

The following hyperparameters were used during training:

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 0.0001, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 0.0001, 'decay_steps': 111625, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 5875, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.001}
training_precision: float32
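
The optimizer config above describes a learning-rate schedule with a linear warmup over 5875 steps to a peak of 0.0001, followed by a polynomial decay (power 1.0, so linear) to 0.0 over 111625 steps. The sketch below reproduces that shape in plain Python to make the numbers easier to read. The function name `learning_rate` is hypothetical, and the code assumes that the decay phase counts steps from the end of warmup, which gives 117500 steps in total. It is an illustration of the serialized values, not the training code itself.

```python
# Sketch of the serialized schedule: linear warmup, then polynomial (power 1.0) decay.
# Constants are copied from the optimizer config; the function name is hypothetical.
PEAK_LR = 1e-4
WARMUP_STEPS = 5875
DECAY_STEPS = 111625
END_LR = 0.0
POWER = 1.0


def learning_rate(step: int) -> float:
    """Approximate learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Warmup: scale the peak rate by the fraction of warmup completed.
        return PEAK_LR * (step / WARMUP_STEPS) ** POWER
    # Assumption: the decay schedule counts steps from the end of warmup.
    decay_step = min(step - WARMUP_STEPS, DECAY_STEPS)
    remaining = 1.0 - decay_step / DECAY_STEPS
    return (PEAK_LR - END_LR) * remaining ** POWER + END_LR


if __name__ == "__main__":
    # Print the rate at the start, mid-warmup, peak, mid-decay, and end.
    for step in (0, 2350, 5875, 61687, 117500):
        print(step, learning_rate(step))
```

Under these assumptions the rate peaks at step 5875 and reaches 0.0 at step 117500. Later steps stay at 0.0 because `cycle` is False.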

Train Loss	Train Accuracy	Validation Loss	Validation Accuracy	Epoch
10.2437	0.0000	10.1909	0.0000	0
10.1151	0.0001	9.9763	0.0016	1
9.8665	0.0107	9.6535	0.0215	2
9.5331	0.0230	9.2992	0.0223	3
9.2000	0.0231	8.9944	0.0222	4
8.9195	0.0229	8.7450	0.0224	5
8.6997	0.0231	8.6124	0.0219	6
8.5689	0.0229	8.4904	0.0222	7
8.4525	0.0230	8.3865	0.0223	8
8.3594	0.0230	8.3069	0.0221	9
8.2662	0.0231	8.2092	0.0224	10
8.1956	0.0231	8.1208	0.0222	11
8.1285	0.0229	8.0806	0.0219	12
8.0345	0.0234	8.0030	0.0220	13
7.9960	0.0228	7.9144	0.0224	14
7.9065	0.0231	7.8661	0.0221	15
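
The losses in the table are easier to interpret as perplexity. The sketch below converts a loss value, assuming the logged loss is a mean per-token cross-entropy in nats (the card does not state this explicitly). The helper name `perplexity` is hypothetical.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Convert a mean cross-entropy loss (assumed to be in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


if __name__ == "__main__":
    # Validation losses copied from the first and last rows of the table.
    for epoch, val_loss in ((0, 10.1909), (15, 7.8661)):
        print(f"epoch {epoch}: validation perplexity ~ {perplexity(val_loss):.0f}")
```

Read this way, the validation perplexity is still in the thousands at epoch 15. This is consistent with the validation accuracy, which stays near 0.022 from epoch 2 onward.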