ViT pre-trained from scratch on CIFAR10

This model is a ViT (with the same architecture as Google's vit-base-patch16-224) pre-trained from scratch on the CIFAR10 dataset for masked image modeling.
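Masked image modeling hides a subset of image patches and trains the model to reconstruct them. The plain-Python sketch below shows the patch geometry implied by the vit-base-patch16-224 architecture: a 224x224 input split into 16x16 patches gives 14 x 14 = 196 patches. It then draws a random per-patch mask of the kind that, for example, the `bool_masked_pos` argument of `transformers`' `ViTForMaskedImageModeling` expects. The sketch assumes inputs are resized to 224x224 (CIFAR10 images are natively 32x32). The 0.6 masking ratio is a hypothetical value chosen only for illustration. The card does not state the input resolution, masking strategy, or ratio actually used.

```python
import random

# Geometry implied by the name vit-base-patch16-224: 224x224 input, 16x16 patches.
IMAGE_SIZE = 224
PATCH_SIZE = 16
# Hypothetical masking ratio for illustration only; the card does not state one.
MASK_RATIO = 0.6


def random_patch_mask(image_size=IMAGE_SIZE, patch_size=PATCH_SIZE,
                      mask_ratio=MASK_RATIO, seed=None):
    """Return a flat list of 0/1 flags, one per patch; 1 means the patch is masked."""
    rng = random.Random(seed)
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    num_patches = patches_per_side ** 2          # 14 * 14 = 196
    num_masked = int(num_patches * mask_ratio)   # 117 of 196 at ratio 0.6
    masked = set(rng.sample(range(num_patches), num_masked))
    return [1 if i in masked else 0 for i in range(num_patches)]


mask = random_patch_mask(seed=1337)
print(len(mask), sum(mask))  # 196 117
```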

It achieves the following results on the evaluation set:

Loss: 0.0891
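The card does not say what this loss measures. Assume training used `transformers`' `ViTForMaskedImageModeling`; this is not confirmed here. Under that assumption, the reported value would be its SimMIM-style reconstruction loss. That loss is an L1 (mean absolute) error summed over masked pixels only, divided by the number of masked pixels and then by the number of channels. A minimal plain-Python sketch of that computation, with each pixel given as a tuple of channel values:

```python
def masked_l1_loss(pixels, reconstruction, pixel_mask, eps=1e-5):
    """SimMIM-style reconstruction loss (a sketch, not the library's code).

    pixels, reconstruction: lists of per-pixel tuples of channel values.
    pixel_mask: list of 0/1 flags, one per pixel; 1 means the pixel was masked.
    Unmasked pixels contribute nothing to the loss.
    """
    num_channels = len(pixels[0])
    total = 0.0
    for x, y, m in zip(pixels, reconstruction, pixel_mask):
        if m:
            # Absolute error summed over all channels of a masked pixel.
            total += sum(abs(a - b) for a, b in zip(x, y))
    # Average over masked pixels (eps avoids division by zero), then over channels.
    return total / (sum(pixel_mask) + eps) / num_channels


pixels = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
reconstruction = [(0.5, 0.5, 0.5), (0.0, 0.0, 0.0)]
loss = masked_l1_loss(pixels, reconstruction, pixel_mask=[0, 1])
print(round(loss, 4))  # 1.0: only the second (masked) pixel counts
```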

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 1337
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 100.0
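With `lr_scheduler_type: linear` and no warmup listed, the learning rate would decay linearly from 2e-05 to 0 over the whole run. The sketch below mirrors the linear-with-warmup rule used by `transformers`' `get_linear_schedule_with_warmup`. The zero warmup is an assumption. The total of 265,700 steps comes from the training-results table: 2657 steps per epoch x 100 epochs.

```python
BASE_LR = 2e-05
STEPS_PER_EPOCH = 2657                      # from the training-results table
NUM_EPOCHS = 100
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 265700
WARMUP_STEPS = 0                            # assumption: no warmup is listed


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_steps=WARMUP_STEPS):
    """Learning rate at a given optimizer step under linear warmup then linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0, clamped at 0 past the last step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, linear_lr(step))
```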

Training results

Training Loss	Epoch	Step	Validation Loss
0.289	1.0	2657	0.2941
0.2858	2.0	5314	0.2809
0.2693	3.0	7971	0.2738
0.2578	4.0	10628	0.2546
0.2211	5.0	13285	0.2153
0.1799	6.0	15942	0.1795
0.158	7.0	18599	0.1623
0.1481	8.0	21256	0.1453
0.1391	9.0	23913	0.1368
0.1348	10.0	26570	0.1354
0.129	11.0	29227	0.1249
0.126	12.0	31884	0.1229
0.1216	13.0	34541	0.1184
0.1175	14.0	37198	0.1185
0.1137	15.0	39855	0.1146
0.1125	16.0	42512	0.1117
0.1112	17.0	45169	0.1100
0.1108	18.0	47826	0.1089
0.1061	19.0	50483	0.1070
0.1073	20.0	53140	0.1076
0.1066	21.0	55797	0.1061
0.1065	22.0	58454	0.1056
0.1045	23.0	61111	0.1037
0.1052	24.0	63768	0.1055
0.102	25.0	66425	0.1028
0.1025	26.0	69082	0.1034
0.1037	27.0	71739	0.1025
0.1022	28.0	74396	0.1014
0.1026	29.0	77053	0.1011
0.1022	30.0	79710	0.1001
0.0997	31.0	82367	0.1007
0.0998	32.0	85024	0.1016
0.1019	33.0	87681	0.1008
0.0999	34.0	90338	0.1000
0.0998	35.0	92995	0.0993
0.0994	36.0	95652	0.0992
0.0966	37.0	98309	0.0991
0.0997	38.0	100966	0.0970
0.0991	39.0	103623	0.0979
0.099	40.0	106280	0.0983
0.0974	41.0	108937	0.0980
0.0974	42.0	111594	0.0971
0.0972	43.0	114251	0.0970
0.0991	44.0	116908	0.0970
0.0979	45.0	119565	0.0972
0.097	46.0	122222	0.0970
0.0936	47.0	124879	0.0967
0.0948	48.0	127536	0.0967
0.0974	49.0	130193	0.0954
0.0958	50.0	132850	0.0954
0.0948	51.0	135507	0.0955
0.095	52.0	138164	0.0953
0.0939	53.0	140821	0.0945
0.0961	54.0	143478	0.0948
0.0964	55.0	146135	0.0955
0.0934	56.0	148792	0.0948
0.0965	57.0	151449	0.0943
0.0966	58.0	154106	0.0941
0.0926	59.0	156763	0.0938
0.0928	60.0	159420	0.0942
0.093	61.0	162077	0.0936
0.0939	62.0	164734	0.0939
0.0936	63.0	167391	0.0936
0.093	64.0	170048	0.0929
0.0929	65.0	172705	0.0930
0.0917	66.0	175362	0.0925
0.0948	67.0	178019	0.0932
0.0931	68.0	180676	0.0927
0.0911	69.0	183333	0.0922
0.0923	70.0	185990	0.0924
0.0923	71.0	188647	0.0923
0.0929	72.0	191304	0.0919
0.0916	73.0	193961	0.0923
0.0927	74.0	196618	0.0921
0.0907	75.0	199275	0.0922
0.0927	76.0	201932	0.0919
0.0925	77.0	204589	0.0913
0.0921	78.0	207246	0.0917
0.0895	79.0	209903	0.0912
0.0916	80.0	212560	0.0914
0.09	81.0	215217	0.0909
0.0916	82.0	217874	0.0908
0.0902	83.0	220531	0.0907
0.0911	84.0	223188	0.0910
0.091	85.0	225845	0.0903
0.0903	86.0	228502	0.0905
0.0907	87.0	231159	0.0901
0.0908	88.0	233816	0.0907
0.0911	89.0	236473	0.0902
0.0905	90.0	239130	0.0906
0.089	91.0	241787	0.0901
0.0908	92.0	244444	0.0896
0.0894	93.0	247101	0.0892
0.0899	94.0	249758	0.0893
0.0899	95.0	252415	0.0897
0.0904	96.0	255072	0.0898
0.0906	97.0	257729	0.0894
0.0892	98.0	260386	0.0894
0.0881	99.0	263043	0.0892
0.09	100.0	265700	0.0894
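The Step column grows by 2657 per epoch. The card does not say whether training ran on one device or used gradient accumulation. Assuming a single device with no accumulation, 2657 steps at train_batch_size 16 means 42,497 to 42,512 training images per epoch. That range is consistent with holding out 15% of CIFAR10's 50,000 training images, which leaves 42,500, but the split is not confirmed. The sketch below parses a few rows copied verbatim from the tab-separated table. It then checks the step arithmetic and picks the lowest validation loss among those rows.

```python
import math

# Rows copied verbatim from the table: training loss, epoch, step, validation loss.
ROWS = """\
0.289\t1.0\t2657\t0.2941
0.1022\t30.0\t79710\t0.1001
0.0894\t93.0\t247101\t0.0892
0.0881\t99.0\t263043\t0.0892
0.09\t100.0\t265700\t0.0894"""


def parse_rows(text):
    """Parse tab-separated rows into (train_loss, epoch, step, val_loss) tuples."""
    rows = []
    for line in text.splitlines():
        train_loss, epoch, step, val_loss = line.split("\t")
        rows.append((float(train_loss), float(epoch), int(step), float(val_loss)))
    return rows


def steps_per_epoch(num_examples, batch_size):
    # A final partial batch still counts as one optimizer step.
    return math.ceil(num_examples / batch_size)


rows = parse_rows(ROWS)
# Every logged step is a whole number of 2657-step epochs.
assert all(step == int(epoch) * 2657 for _, epoch, step, _ in rows)
# min() keeps the first row on ties, so epoch 93 wins over epoch 99.
best = min(rows, key=lambda r: r[3])
print(best[1], best[3])             # 93.0 0.0892
print(steps_per_epoch(42_500, 16))  # 2657
```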

Framework versions

Transformers 4.19.0.dev0
Pytorch 1.10.0+cu111
Datasets 2.0.0
Tokenizers 0.11.6
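The following standard-library sketch checks a local environment against the versions listed above. The PyPI distribution name for PyTorch is `torch`. The check compares version strings exactly, so a local build tag such as `+cu111` must match as well.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card, keyed by PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.19.0.dev0",
    "torch": "1.10.0+cu111",
    "datasets": "2.0.0",
    "tokenizers": "0.11.6",
}


def compare_versions(expected):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None  # package is not installed in this environment
        report[pkg] = (want, have, have == want)
    return report


for pkg, (want, have, ok) in compare_versions(CARD_VERSIONS).items():
    print(f"{pkg}: card={want} installed={have} match={ok}")
```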

Model repository: mrm8488/vit-base-patch16-224-pretrained-cifar10