---
license: mit
tags:
  - generated_from_keras_callback
model-index:
  - name: tf-tpu/roberta-base-epochs-500-no-wd
    results: []
---

# tf-tpu/roberta-base-epochs-500-no-wd

This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on an unknown dataset. It achieves the following results on the evaluation set:

- Train Loss: 1.0984
- Train Accuracy: 0.1121
- Validation Loss: 1.0366
- Validation Accuracy: 0.1139
- Epoch: 67
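
As a usage sketch: loading the checkpoint for masked-token prediction. This is a minimal, hedged example; it assumes the repo id from the metadata above resolves on the Hub and that the saved weights are the TensorFlow masked-LM head (the usual output of this Keras-callback setup). Note the checkpoint is from epoch 67 of a 500-epoch run, so it is a training-in-progress snapshot.

```python
# Minimal sketch, assuming a TF masked-LM checkpoint under the model-index name above.
from transformers import AutoTokenizer, TFAutoModelForMaskedLM

repo_id = "tf-tpu/roberta-base-epochs-500-no-wd"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = TFAutoModelForMaskedLM.from_pretrained(repo_id)

# RoBERTa uses <mask> as its mask token.
inputs = tokenizer("The goal of life is <mask>.", return_tensors="tf")
logits = model(**inputs).logits  # per-token vocab logits; argmax at the mask position fills it
```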

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a hedged sketch of rebuilding the optimizer follows this list):

- optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 0.0001, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 0.0001, 'decay_steps': 278825, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, 'passive_serialization': True}, 'warmup_steps': 14675, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.001}
- training_precision: mixed_bfloat16
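
The serialized config above is AdamWeightDecay with a linear warmup over 14675 steps into a polynomial (power 1.0, i.e. linear) decay over 278825 steps, ending at 0.0. A minimal sketch of recreating it with the `create_optimizer` helper from `transformers` follows; `num_train_steps = 293500` is an assumption derived as warmup_steps + decay_steps, since `create_optimizer` passes the post-warmup step count to `PolynomialDecay`.

```python
# Hedged sketch: rebuilds the optimizer/schedule described above, not the original training script.
import tensorflow as tf
from transformers import create_optimizer

# training_precision: mixed_bfloat16
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

optimizer, lr_schedule = create_optimizer(
    init_lr=1e-4,             # initial_learning_rate
    num_train_steps=293_500,  # assumption: 14675 warmup + 278825 decay steps
    num_warmup_steps=14_675,  # warmup_steps
    weight_decay_rate=0.001,  # weight_decay_rate; beta_1/beta_2/epsilon defaults match the config
    power=1.0,                # PolynomialDecay power; end_learning_rate defaults to 0.0
)
```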

### Training results

| Train Loss | Train Accuracy | Validation Loss | Validation Accuracy | Epoch |
|:----------:|:--------------:|:---------------:|:-------------------:|:-----:|
| 8.3284     | 0.0211         | 7.1523          | 0.0266              | 0     |
| 6.3670     | 0.0318         | 5.7812          | 0.0342              | 1     |
| 5.6051     | 0.0380         | 5.4414          | 0.0420              | 2     |
| 5.3602     | 0.0433         | 5.2734          | 0.0432              | 3     |
| 5.2285     | 0.0444         | 5.1562          | 0.0442              | 4     |
| 5.1371     | 0.0446         | 5.1133          | 0.0436              | 5     |
| 5.0673     | 0.0446         | 5.0703          | 0.0442              | 6     |
| 5.0132     | 0.0447         | 4.9883          | 0.0442              | 7     |
| 4.9642     | 0.0448         | 4.9219          | 0.0441              | 8     |
| 4.9217     | 0.0448         | 4.9258          | 0.0440              | 9     |
| 4.8871     | 0.0448         | 4.8867          | 0.0439              | 10    |
| 4.8548     | 0.0449         | 4.8672          | 0.0439              | 11    |
| 4.8277     | 0.0449         | 4.8047          | 0.0445              | 12    |
| 4.8033     | 0.0449         | 4.8477          | 0.0437              | 13    |
| 4.7807     | 0.0449         | 4.7617          | 0.0439              | 14    |
| 4.7592     | 0.0449         | 4.7773          | 0.0437              | 15    |
| 4.7388     | 0.0449         | 4.7539          | 0.0441              | 16    |
| 4.7225     | 0.0449         | 4.7266          | 0.0439              | 17    |
| 4.7052     | 0.0449         | 4.6914          | 0.0450              | 18    |
| 4.6917     | 0.0449         | 4.7188          | 0.0444              | 19    |
| 4.6789     | 0.0449         | 4.6914          | 0.0444              | 20    |
| 4.6689     | 0.0449         | 4.7031          | 0.0439              | 21    |
| 4.6570     | 0.0449         | 4.7031          | 0.0437              | 22    |
| 4.6486     | 0.0450         | 4.6758          | 0.0446              | 23    |
| 4.6393     | 0.0449         | 4.6914          | 0.0441              | 24    |
| 4.5898     | 0.0449         | 4.4688          | 0.0452              | 25    |
| 4.3024     | 0.0472         | 3.8730          | 0.0551              | 26    |
| 3.1689     | 0.0693         | 2.4375          | 0.0835              | 27    |
| 2.3780     | 0.0844         | 2.0498          | 0.0922              | 28    |
| 2.0789     | 0.0907         | 1.8604          | 0.0958              | 29    |
| 1.9204     | 0.0940         | 1.7549          | 0.0982              | 30    |
| 1.8162     | 0.0961         | 1.6836          | 0.0983              | 31    |
| 1.7370     | 0.0978         | 1.5869          | 0.1014              | 32    |
| 1.6723     | 0.0991         | 1.5381          | 0.1029              | 33    |
| 1.6215     | 0.1002         | 1.5283          | 0.1015              | 34    |
| 1.5753     | 0.1012         | 1.4736          | 0.1037              | 35    |
| 1.5295     | 0.1022         | 1.4238          | 0.1052              | 36    |
| 1.4944     | 0.1030         | 1.4141          | 0.1059              | 37    |
| 1.4631     | 0.1037         | 1.3721          | 0.1053              | 38    |
| 1.4363     | 0.1043         | 1.3467          | 0.1060              | 39    |
| 1.4098     | 0.1049         | 1.3213          | 0.1076              | 40    |
| 1.3867     | 0.1054         | 1.3018          | 0.1071              | 41    |
| 1.3658     | 0.1058         | 1.2832          | 0.1083              | 42    |
| 1.3469     | 0.1063         | 1.2637          | 0.1081              | 43    |
| 1.3288     | 0.1067         | 1.2598          | 0.1082              | 44    |
| 1.3111     | 0.1071         | 1.2334          | 0.1096              | 45    |
| 1.2962     | 0.1075         | 1.2490          | 0.1084              | 46    |
| 1.2816     | 0.1078         | 1.2168          | 0.1093              | 47    |
| 1.2672     | 0.1081         | 1.2070          | 0.1090              | 48    |
| 1.2537     | 0.1084         | 1.1680          | 0.1106              | 49    |
| 1.2411     | 0.1087         | 1.1904          | 0.1094              | 50    |
| 1.2285     | 0.1090         | 1.1709          | 0.1103              | 51    |
| 1.2180     | 0.1093         | 1.1602          | 0.1122              | 52    |
| 1.2075     | 0.1095         | 1.1396          | 0.1117              | 53    |
| 1.1973     | 0.1098         | 1.1191          | 0.1124              | 54    |
| 1.1876     | 0.1100         | 1.1260          | 0.1123              | 55    |
| 1.1782     | 0.1102         | 1.1289          | 0.1111              | 56    |
| 1.1698     | 0.1104         | 1.1211          | 0.1117              | 57    |
| 1.1596     | 0.1106         | 1.0977          | 0.1125              | 58    |
| 1.1530     | 0.1108         | 1.1172          | 0.1118              | 59    |
| 1.1462     | 0.1110         | 1.0703          | 0.1126              | 60    |
| 1.1370     | 0.1112         | 1.0830          | 0.1140              | 61    |
| 1.1309     | 0.1113         | 1.0762          | 0.1119              | 62    |
| 1.1234     | 0.1115         | 1.0625          | 0.1137              | 63    |
| 1.1162     | 0.1117         | 1.0781          | 0.1127              | 64    |
| 1.1114     | 0.1118         | 1.0474          | 0.1138              | 65    |
| 1.1036     | 0.1120         | 1.0703          | 0.1134              | 66    |
| 1.0984     | 0.1121         | 1.0366          | 0.1139              | 67    |

### Framework versions

- Transformers 4.27.0.dev0
- TensorFlow 2.9.1
- Tokenizers 0.13.2