---
license: mit
tags:
  - generated_from_keras_callback
model-index:
  - name: Zemulax/masked-lm-tpu
    results: []
---

# Zemulax/masked-lm-tpu

This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on an unknown dataset. It achieves the following results at the end of training (epoch 28):

- Train Loss: 8.9625
- Train Accuracy: 0.0229
- Validation Loss: 8.8969
- Validation Accuracy: 0.0221
- Epoch: 28
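
Since this is a standard Transformers masked-LM checkpoint trained with TensorFlow, it can presumably be loaded for fill-mask inference like any other TF model. The sketch below assumes the hosted repo id `Zemulax/masked-lm-tpu`; note that, given the reported validation accuracy of roughly 2%, predictions will be close to random.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("Zemulax/masked-lm-tpu")
model = TFAutoModelForMaskedLM.from_pretrained("Zemulax/masked-lm-tpu")

# RoBERTa-style tokenizers use "<mask>" as the mask token
text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors="tf")
logits = model(**inputs).logits

# Locate the mask position and take the highest-scoring vocabulary id
mask_index = int(tf.where(inputs["input_ids"][0] == tokenizer.mask_token_id)[0, 0])
predicted_id = int(tf.argmax(logits[0, mask_index]))
print(tokenizer.decode([predicted_id]))
```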

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how an equivalent optimizer can be rebuilt):

- optimizer: AdamWeightDecay
  - learning_rate: WarmUp over 11,750 steps up to an initial rate of 1e-4, then PolynomialDecay (power 1.0, no cycling) down to an end rate of 0.0 over 223,250 decay steps
  - beta_1: 0.9, beta_2: 0.999, epsilon: 1e-08, amsgrad: False
  - weight_decay_rate: 0.001
- training_precision: float32
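
The exact call used for this run is not recorded in the card, but the serialized optimizer config matches what `transformers.create_optimizer` emits: an AdamWeightDecay optimizer whose learning rate is a WarmUp schedule wrapping a PolynomialDecay. A minimal sketch of rebuilding an equivalent optimizer:

```python
from transformers import create_optimizer

# 235,000 total steps = 11,750 warmup steps + 223,250 decay steps,
# so the decay schedule reaches 0.0 exactly as in the config above.
optimizer, lr_schedule = create_optimizer(
    init_lr=1e-4,
    num_train_steps=235_000,
    num_warmup_steps=11_750,
    weight_decay_rate=0.001,  # matches weight_decay_rate in the config
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```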

### Training results

| Train Loss | Train Accuracy | Validation Loss | Validation Accuracy | Epoch |
|:----------:|:--------------:|:---------------:|:-------------------:|:-----:|
| 10.2868    | 0.0            | 10.2891         | 0.0                 | 0     |
| 10.2817    | 0.0000         | 10.2764         | 0.0                 | 1     |
| 10.2772    | 0.0000         | 10.2667         | 0.0000              | 2     |
| 10.2604    | 0.0000         | 10.2521         | 0.0                 | 3     |
| 10.2421    | 0.0000         | 10.2282         | 0.0000              | 4     |
| 10.2219    | 0.0            | 10.2010         | 0.0                 | 5     |
| 10.1957    | 0.0            | 10.1669         | 0.0                 | 6     |
| 10.1667    | 0.0000         | 10.1388         | 0.0000              | 7     |
| 10.1278    | 0.0000         | 10.0908         | 0.0000              | 8     |
| 10.0848    | 0.0000         | 10.0405         | 0.0001              | 9     |
| 10.0496    | 0.0002         | 9.9921          | 0.0007              | 10    |
| 9.9940     | 0.0010         | 9.9422          | 0.0039              | 11    |
| 9.9424     | 0.0035         | 9.8765          | 0.0110              | 12    |
| 9.8826     | 0.0092         | 9.8156          | 0.0182              | 13    |
| 9.8225     | 0.0155         | 9.7461          | 0.0209              | 14    |
| 9.7670     | 0.0201         | 9.6768          | 0.0222              | 15    |
| 9.7065     | 0.0219         | 9.6127          | 0.0222              | 16    |
| 9.6352     | 0.0227         | 9.5445          | 0.0220              | 17    |
| 9.5757     | 0.0226         | 9.4795          | 0.0219              | 18    |
| 9.4894     | 0.0232         | 9.3985          | 0.0222              | 19    |
| 9.4277     | 0.0234         | 9.3386          | 0.0222              | 20    |
| 9.3676     | 0.0229         | 9.2753          | 0.0220              | 21    |
| 9.2980     | 0.0229         | 9.2170          | 0.0219              | 22    |
| 9.2361     | 0.0233         | 9.1518          | 0.0219              | 23    |
| 9.1515     | 0.0236         | 9.0827          | 0.0223              | 24    |
| 9.1171     | 0.0228         | 9.0406          | 0.0218              | 25    |
| 9.0447     | 0.0234         | 8.9867          | 0.0218              | 26    |
| 9.0119     | 0.0229         | 8.9307          | 0.0221              | 27    |
| 8.9625     | 0.0229         | 8.8969          | 0.0221              | 28    |

### Framework versions

- Transformers 4.30.1
- TensorFlow 2.12.0
- Tokenizers 0.13.3