metadata

license: mit
tags:
  - generated_from_keras_callback
model-index:
  - name: tf-tpu/roberta-base-epochs-500-no-wd
    results: []

tf-tpu/roberta-base-epochs-500-no-wd

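This model is a fine-tuned version of roberta-base on an unknown dataset. At the last logged epoch (29), it achieves the following results:

Train Loss: 2.0789
Train Accuracy: 0.0907
Validation Loss: 1.8604
Validation Accuracy: 0.0958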

Model description

More information needed

The following hyperparameters were used during training:

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 0.0001, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 0.0001, 'decay_steps': 278825, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 14675, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.001}
training_precision: mixed_bfloat16
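
The optimizer config above describes a linear warmup followed by a linear (power 1.0) polynomial decay to zero. The sketch below reproduces that schedule in plain Python so the learning rate at a given step can be checked by hand. The constants are copied from the config; the function name `learning_rate_at` is hypothetical, and the sketch assumes the warmup wrapper passes `step - warmup_steps` to the decay schedule, which is how the `WarmUp` class in the Transformers TensorFlow optimization utilities appears to behave. It is an approximation for inspection, not the training code.

```python
# Values copied from the optimizer config above.
INITIAL_LR = 1e-4
END_LR = 0.0
WARMUP_STEPS = 14675
DECAY_STEPS = 278825
POWER = 1.0  # both warmup and decay use power 1.0, i.e. they are linear


def learning_rate_at(step: int) -> float:
    """Approximate learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Warmup: ramp from 0 up to INITIAL_LR.
        return INITIAL_LR * (step / WARMUP_STEPS) ** POWER
    # Assumption: the decay schedule sees the step count minus the warmup steps.
    decay_step = min(step - WARMUP_STEPS, DECAY_STEPS)
    remaining = 1.0 - decay_step / DECAY_STEPS
    return (INITIAL_LR - END_LR) * remaining ** POWER + END_LR


if __name__ == "__main__":
    for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, WARMUP_STEPS + DECAY_STEPS):
        print(step, learning_rate_at(step))
```

Under these assumptions the schedule peaks at 0.0001 at step 14675 and reaches 0.0 after a further 278825 steps.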

Train Loss	Train Accuracy	Validation Loss	Validation Accuracy	Epoch
8.3284	0.0211	7.1523	0.0266	0
6.3670	0.0318	5.7812	0.0342	1
5.6051	0.0380	5.4414	0.0420	2
5.3602	0.0433	5.2734	0.0432	3
5.2285	0.0444	5.1562	0.0442	4
5.1371	0.0446	5.1133	0.0436	5
5.0673	0.0446	5.0703	0.0442	6
5.0132	0.0447	4.9883	0.0442	7
4.9642	0.0448	4.9219	0.0441	8
4.9217	0.0448	4.9258	0.0440	9
4.8871	0.0448	4.8867	0.0439	10
4.8548	0.0449	4.8672	0.0439	11
4.8277	0.0449	4.8047	0.0445	12
4.8033	0.0449	4.8477	0.0437	13
4.7807	0.0449	4.7617	0.0439	14
4.7592	0.0449	4.7773	0.0437	15
4.7388	0.0449	4.7539	0.0441	16
4.7225	0.0449	4.7266	0.0439	17
4.7052	0.0449	4.6914	0.0450	18
4.6917	0.0449	4.7188	0.0444	19
4.6789	0.0449	4.6914	0.0444	20
4.6689	0.0449	4.7031	0.0439	21
4.6570	0.0449	4.7031	0.0437	22
4.6486	0.0450	4.6758	0.0446	23
4.6393	0.0449	4.6914	0.0441	24
4.5898	0.0449	4.4688	0.0452	25
4.3024	0.0472	3.8730	0.0551	26
3.1689	0.0693	2.4375	0.0835	27
2.3780	0.0844	2.0498	0.0922	28
2.0789	0.0907	1.8604	0.0958	29