distilbert_add_pre-training-complete

This model is a fine-tuned version of distilbert-base-uncased on the wikitext dataset (wikitext-103-raw-v1 configuration). It achieves the following results on the evaluation set (a brief usage sketch follows the results):

  • Loss: 5.0239
  • Accuracy: 0.2307
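
The accuracy above is a masked-language-modeling metric, so the checkpoint can be used like any other DistilBERT MLM. Below is a minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub as gokuls/distilbert_add_pre-training-complete and keeps the standard masked-language-modeling head; adjust the model id if you use a local path.

```python
# Minimal sketch: load the checkpoint and run a fill-mask prediction.
# Assumes the Hub id below and the standard DistilBERT MLM head.
from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline

model_id = "gokuls/distilbert_add_pre-training-complete"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("The capital of France is [MASK]."))
```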

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative Trainer setup sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 64
  • eval_batch_size: 64
  • seed: 10
  • distributed_type: multi-GPU
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • training_steps: 300000
  • mixed_precision_training: Native AMP
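
As an illustration only, these hyperparameters roughly correspond to a 🤗 Trainer configuration like the sketch below. The actual training script, data pipeline, and multi-GPU launch command for this run are not part of this card, so every name here is an assumption rather than the author's code.

```python
# Illustrative only: approximate TrainingArguments matching the list above
# (Transformers 4.26). Multi-GPU distribution is handled by the launcher
# (e.g. torchrun/accelerate), not by these arguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilbert_add_pre-training-complete",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    seed=10,
    lr_scheduler_type="linear",
    warmup_steps=100,
    max_steps=300_000,            # training_steps above
    fp16=True,                    # "Native AMP" mixed precision
    evaluation_strategy="epoch",
    logging_strategy="epoch",
)
# Adam with betas=(0.9, 0.999) and epsilon=1e-08 matches the Trainer's
# default optimizer settings (adam_beta1, adam_beta2, adam_epsilon).
```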

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 6.295 | 1.0 | 3573 | 6.0701 | 0.1522 |
| 6.0482 | 2.0 | 7146 | 5.9533 | 0.1565 |
| 5.9799 | 3.0 | 10719 | 5.9008 | 0.1584 |
| 5.9378 | 4.0 | 14292 | 5.8997 | 0.1545 |
| 5.9057 | 5.0 | 17865 | 5.8905 | 0.1536 |
| 5.8811 | 6.0 | 21438 | 5.8646 | 0.1550 |
| 5.8617 | 7.0 | 25011 | 5.8322 | 0.1534 |
| 5.844 | 8.0 | 28584 | 5.8563 | 0.1523 |
| 5.8297 | 9.0 | 32157 | 5.8352 | 0.1548 |
| 5.8175 | 10.0 | 35730 | 5.8136 | 0.1558 |
| 5.8056 | 11.0 | 39303 | 5.8147 | 0.1526 |
| 5.7921 | 12.0 | 42876 | 5.8020 | 0.1548 |
| 5.7777 | 13.0 | 46449 | 5.7891 | 0.1545 |
| 5.7596 | 14.0 | 50022 | 5.7370 | 0.1587 |
| 5.7414 | 15.0 | 53595 | 5.7396 | 0.1604 |
| 5.7243 | 16.0 | 57168 | 5.7490 | 0.1564 |
| 5.6997 | 17.0 | 60741 | 5.7135 | 0.1561 |
| 5.6698 | 18.0 | 64314 | 5.6858 | 0.1620 |
| 5.6398 | 19.0 | 67887 | 5.6735 | 0.1644 |
| 5.6135 | 20.0 | 71460 | 5.6174 | 0.1681 |
| 5.5899 | 21.0 | 75033 | 5.6191 | 0.1684 |
| 5.5699 | 22.0 | 78606 | 5.5977 | 0.1669 |
| 5.5487 | 23.0 | 82179 | 5.6139 | 0.1669 |
| 5.529 | 24.0 | 85752 | 5.5272 | 0.1741 |
| 5.512 | 25.0 | 89325 | 5.5271 | 0.1727 |
| 5.4939 | 26.0 | 92898 | 5.5190 | 0.1721 |
| 5.4765 | 27.0 | 96471 | 5.4824 | 0.1770 |
| 5.4604 | 28.0 | 100044 | 5.5159 | 0.1747 |
| 5.4422 | 29.0 | 103617 | 5.4577 | 0.1807 |
| 5.4243 | 30.0 | 107190 | 5.4546 | 0.1772 |
| 5.408 | 31.0 | 110763 | 5.4297 | 0.1837 |
| 5.3915 | 32.0 | 114336 | 5.4089 | 0.1866 |
| 5.3766 | 33.0 | 117909 | 5.3996 | 0.1848 |
| 5.3594 | 34.0 | 121482 | 5.3974 | 0.1841 |
| 5.3451 | 35.0 | 125055 | 5.3718 | 0.1908 |
| 5.3294 | 36.0 | 128628 | 5.3706 | 0.1878 |
| 5.3155 | 37.0 | 132201 | 5.3677 | 0.1903 |
| 5.2996 | 38.0 | 135774 | 5.2970 | 0.1994 |
| 5.287 | 39.0 | 139347 | 5.3127 | 0.1977 |
| 5.2735 | 40.0 | 142920 | 5.3145 | 0.1955 |
| 5.26 | 41.0 | 146493 | 5.2985 | 0.2017 |
| 5.2487 | 42.0 | 150066 | 5.2661 | 0.2025 |
| 5.2362 | 43.0 | 153639 | 5.2712 | 0.2031 |
| 5.2248 | 44.0 | 157212 | 5.2452 | 0.2049 |
| 5.2115 | 45.0 | 160785 | 5.2325 | 0.2054 |
| 5.1998 | 46.0 | 164358 | 5.2233 | 0.2075 |
| 5.188 | 47.0 | 167931 | 5.1994 | 0.2118 |
| 5.1779 | 48.0 | 171504 | 5.2436 | 0.2069 |
| 5.1664 | 49.0 | 175077 | 5.2203 | 0.2129 |
| 5.1546 | 50.0 | 178650 | 5.1820 | 0.2134 |
| 5.1431 | 51.0 | 182223 | 5.2029 | 0.2122 |
| 5.133 | 52.0 | 185796 | 5.1458 | 0.2140 |
| 5.1226 | 53.0 | 189369 | 5.1757 | 0.2163 |
| 5.1138 | 54.0 | 192942 | 5.1380 | 0.2193 |
| 5.1046 | 55.0 | 196515 | 5.1498 | 0.2178 |
| 5.0984 | 56.0 | 200088 | 5.1094 | 0.2194 |
| 5.0907 | 57.0 | 203661 | 5.1354 | 0.2202 |
| 5.0812 | 58.0 | 207234 | 5.0662 | 0.2256 |
| 5.0748 | 59.0 | 210807 | 5.1163 | 0.2181 |
| 5.067 | 60.0 | 214380 | 5.1193 | 0.2199 |
| 5.0609 | 61.0 | 217953 | 5.0919 | 0.2224 |
| 5.0536 | 62.0 | 221526 | 5.0899 | 0.2239 |
| 5.0491 | 63.0 | 225099 | 5.1125 | 0.2224 |
| 5.0433 | 64.0 | 228672 | 5.0892 | 0.2226 |
| 5.0373 | 65.0 | 232245 | 5.0644 | 0.2260 |
| 5.032 | 66.0 | 235818 | 5.0623 | 0.2253 |
| 5.0283 | 67.0 | 239391 | 5.1004 | 0.2213 |
| 5.0223 | 68.0 | 242964 | 5.0573 | 0.2279 |
| 5.0184 | 69.0 | 246537 | 5.0488 | 0.2271 |
| 5.014 | 70.0 | 250110 | 5.0482 | 0.2280 |
| 5.0102 | 71.0 | 253683 | 5.0600 | 0.2269 |
| 5.0079 | 72.0 | 257256 | 5.0271 | 0.2279 |
| 5.0029 | 73.0 | 260829 | 5.0629 | 0.2267 |
| 4.9994 | 74.0 | 264402 | 5.0304 | 0.2297 |
| 4.9978 | 75.0 | 267975 | 5.0485 | 0.2269 |
| 4.9945 | 76.0 | 271548 | 5.0380 | 0.2306 |
| 4.9917 | 77.0 | 275121 | 5.0590 | 0.2265 |
| 4.9913 | 78.0 | 278694 | 5.0585 | 0.2262 |
| 4.987 | 79.0 | 282267 | 5.0339 | 0.2278 |
| 4.9862 | 80.0 | 285840 | 5.0214 | 0.2305 |
| 4.9841 | 81.0 | 289413 | 5.0393 | 0.2271 |
| 4.983 | 82.0 | 292986 | 5.0200 | 0.2298 |
| 4.9816 | 83.0 | 296559 | 5.0289 | 0.2300 |
| 4.9801 | 83.96 | 300000 | 4.9972 | 0.2332 |
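
Because the reported losses are masked-language-modeling cross-entropy values, they can also be read as (pseudo-)perplexities over the masked tokens. A small sketch using the numbers from this card:

```python
# Perplexity is exp(cross-entropy loss); values below are copied from this
# card (final table row and the evaluation-set loss at the top).
import math

final_validation_loss = 4.9972   # last row of the table above
evaluation_set_loss = 5.0239     # "Loss" reported at the top of the card

print(f"validation perplexity ~ {math.exp(final_validation_loss):.1f}")  # ~ 148.0
print(f"evaluation perplexity ~ {math.exp(evaluation_set_loss):.1f}")    # ~ 152.0
```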

Framework versions

  • Transformers 4.26.0
  • Pytorch 1.14.0a0+410ce96
  • Datasets 2.9.0
  • Tokenizers 0.13.2