kennethge123/sst5-gpt2-kd


Plainly Optimized Network

Dataset: BIGBENCH

Trainer Hyperparameters:

lr = 5e-05
per_device_batch_size = 1
gradient_accumulation_steps = 4
weight_decay = 1e-09
seed = 42
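
With a per-device batch size of 1 and 4 gradient accumulation steps, each optimizer update sees 4 examples per device. The sketch below works through that arithmetic. The dict keys simply mirror the names listed above, the helper `effective_batch_size` is hypothetical, and the number of devices is not stated on this card, so a single device is assumed.

```python
# Hyperparameters copied from the list above; key names mirror the card.
hyperparameters = {
    "lr": 5e-05,
    "per_device_batch_size": 1,
    "gradient_accumulation_steps": 4,
    "weight_decay": 1e-09,
    "seed": 42,
}


def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples contributing to one optimizer update.

    num_devices is an assumption: the card does not say how many devices were used.
    """
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Single-device case (assumed): 1 example per step, accumulated over 4 steps.
per_update = effective_batch_size(
    hyperparameters["per_device_batch_size"],
    hyperparameters["gradient_accumulation_steps"],
)
print(f"examples per optimizer update: {per_update}")
```

If training used more than one device, the figure scales linearly with `num_devices`.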

eval_loss	eval_mse	epoch
58.741	0.055	1.0
60.624	0.058	2.0
60.765	0.057	3.0
55.858	0.051	4.0
57.271	0.053	5.0
56.004	0.051	6.0
60.246	0.056	7.0
55.218	0.049	8.0
55.261	0.049	9.0
54.730	0.049	10.0
58.137	0.052	11.0
53.927	0.048	12.0
56.143	0.051	13.0
54.604	0.049	14.0
53.596	0.048	15.0
54.241	0.049	16.0
55.500	0.050	17.0
53.256	0.047	18.0
53.139	0.047	19.0
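
The evaluation loss in the table above does not fall monotonically, so it can help to locate the best epoch programmatically. A minimal sketch, with the rows copied from the table and the variable names chosen only for illustration:

```python
# Rows copied from the table above: (eval_loss, eval_mse, epoch).
results = [
    (58.741, 0.055, 1.0),
    (60.624, 0.058, 2.0),
    (60.765, 0.057, 3.0),
    (55.858, 0.051, 4.0),
    (57.271, 0.053, 5.0),
    (56.004, 0.051, 6.0),
    (60.246, 0.056, 7.0),
    (55.218, 0.049, 8.0),
    (55.261, 0.049, 9.0),
    (54.730, 0.049, 10.0),
    (58.137, 0.052, 11.0),
    (53.927, 0.048, 12.0),
    (56.143, 0.051, 13.0),
    (54.604, 0.049, 14.0),
    (53.596, 0.048, 15.0),
    (54.241, 0.049, 16.0),
    (55.500, 0.050, 17.0),
    (53.256, 0.047, 18.0),
    (53.139, 0.047, 19.0),
]

# Epoch with the lowest evaluation loss.
best_loss, best_mse, best_epoch = min(results, key=lambda row: row[0])

# Epochs that tie for the lowest (rounded) evaluation MSE.
lowest_mse = min(row[1] for row in results)
lowest_mse_epochs = [row[2] for row in results if row[1] == lowest_mse]

print(f"lowest eval_loss {best_loss} at epoch {best_epoch}")
print(f"lowest eval_mse {lowest_mse} at epochs {lowest_mse_epochs}")
```

On these numbers the lowest eval_loss is at the final listed epoch (19.0), and epochs 18.0 and 19.0 tie on eval_mse at the three-decimal precision shown.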

Downloads last month: 7
