metadata

license: mit
tags:
  - generated_from_keras_callback
model-index:
  - name: tf-tpu/roberta-base-epochs-500-no-wd
    results: []

tf-tpu/roberta-base-epochs-500-no-wd

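This model is a fine-tuned version of roberta-base on an unknown dataset. After the last logged epoch (epoch 23), it reports a training loss of 4.6486, a training accuracy of 0.0450, a validation loss of 4.6758, and a validation accuracy of 0.0446 on the evaluation set.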

Model description

More information needed


The following hyperparameters were used during training:

optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 0.0001, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 0.0001, 'decay_steps': 278825, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 14675, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.001}
training_precision: mixed_bfloat16
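
The optimizer config above describes a linear warmup followed by a linear (power 1.0) polynomial decay to zero. The sketch below restates that schedule in plain Python so the learning rate at a given step can be inspected. The function name `learning_rate` is hypothetical, and the assumption that the decay phase counts steps from the end of warmup reflects how the `WarmUp` wrapper is understood to behave, not something the card states.

```python
def learning_rate(step,
                  peak_lr=1e-4,         # 'initial_learning_rate' in the config
                  warmup_steps=14675,   # 'warmup_steps'
                  decay_steps=278825,   # 'decay_steps'
                  end_lr=0.0,           # 'end_learning_rate'
                  power=1.0):           # 'power' (1.0 means linear)
    """Approximate learning rate at a given optimizer step.

    Assumption: the decay schedule is evaluated at (step - warmup_steps),
    i.e. decay starts counting once warmup has finished.
    """
    if step < warmup_steps:
        # Warmup: ramp from 0 up to peak_lr.
        return peak_lr * (step / warmup_steps) ** power
    # Decay: polynomial decay from peak_lr down to end_lr, then hold.
    decay_step = min(step - warmup_steps, decay_steps)
    return (peak_lr - end_lr) * (1 - decay_step / decay_steps) ** power + end_lr


if __name__ == "__main__":
    # Inspect the schedule at the start, the end of warmup, and the end of decay.
    for s in (0, 14675, 14675 + 278825):
        print(s, learning_rate(s))
```

Under these assumptions the rate peaks at 0.0001 when warmup ends at step 14675 and reaches 0.0 after a further 278825 steps.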

Train Loss	Train Accuracy	Validation Loss	Validation Accuracy	Epoch
8.3284	0.0211	7.1523	0.0266	0
6.3670	0.0318	5.7812	0.0342	1
5.6051	0.0380	5.4414	0.0420	2
5.3602	0.0433	5.2734	0.0432	3
5.2285	0.0444	5.1562	0.0442	4
5.1371	0.0446	5.1133	0.0436	5
5.0673	0.0446	5.0703	0.0442	6
5.0132	0.0447	4.9883	0.0442	7
4.9642	0.0448	4.9219	0.0441	8
4.9217	0.0448	4.9258	0.0440	9
4.8871	0.0448	4.8867	0.0439	10
4.8548	0.0449	4.8672	0.0439	11
4.8277	0.0449	4.8047	0.0445	12
4.8033	0.0449	4.8477	0.0437	13
4.7807	0.0449	4.7617	0.0439	14
4.7592	0.0449	4.7773	0.0437	15
4.7388	0.0449	4.7539	0.0441	16
4.7225	0.0449	4.7266	0.0439	17
4.7052	0.0449	4.6914	0.0450	18
4.6917	0.0449	4.7188	0.0444	19
4.6789	0.0449	4.6914	0.0444	20
4.6689	0.0449	4.7031	0.0439	21
4.6570	0.0449	4.7031	0.0437	22
4.6486	0.0450	4.6758	0.0446	23
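
To compare epochs without reading the table by eye, the rows can be parsed and ranked. The following is a minimal sketch: it embeds only the last six rows of the table above, and the names `LOG` and `parse_log` are hypothetical helpers introduced for illustration.

```python
# Last six rows of the training log, copied from the table above.
# Columns: train loss, train accuracy, validation loss, validation accuracy, epoch.
LOG = """
4.7052 0.0449 4.6914 0.0450 18
4.6917 0.0449 4.7188 0.0444 19
4.6789 0.0449 4.6914 0.0444 20
4.6689 0.0449 4.7031 0.0439 21
4.6570 0.0449 4.7031 0.0437 22
4.6486 0.0450 4.6758 0.0446 23
"""


def parse_log(text):
    """Turn whitespace- or tab-separated log rows into a list of dicts."""
    records = []
    for line in text.strip().splitlines():
        train_loss, train_acc, val_loss, val_acc, epoch = line.split()
        records.append({
            "train_loss": float(train_loss),
            "train_acc": float(train_acc),
            "val_loss": float(val_loss),
            "val_acc": float(val_acc),
            "epoch": int(epoch),
        })
    return records


records = parse_log(LOG)
best_by_loss = min(records, key=lambda r: r["val_loss"])
best_by_acc = max(records, key=lambda r: r["val_acc"])

if __name__ == "__main__":
    print("lowest validation loss at epoch", best_by_loss["epoch"])
    print("highest validation accuracy at epoch", best_by_acc["epoch"])
```

Because `line.split()` splits on any whitespace, the same parser should also accept the tab-separated rows exactly as they appear in the table.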