gpt2-geez

This model is a fine-tuned version of gpt2 on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 8.7806
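
For a quick check of the checkpoint, it can be loaded like any other GPT-2 model with the transformers library. The snippet below is a minimal sketch, assuming the repository id Mequanent/gpt2-geez and a placeholder Ge'ez-script prompt; adjust the prompt and generation settings to your use case.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id taken from this model card; adjust if the model is hosted elsewhere.
model_id = "Mequanent/gpt2-geez"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder prompt in Ethiopic (Ge'ez) script; replace with your own text.
prompt = "ሰላም"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample a short continuation; tune max_new_tokens / sampling parameters as needed.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```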

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reproducing them follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 100
  • mixed_precision_training: Native AMP
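
For reference, the sketch below maps these settings onto transformers.TrainingArguments. It is a minimal sketch, not the exact training script: the output directory is a placeholder, and only the hyperparameters listed above are assumed.

```python
from transformers import TrainingArguments

# Minimal sketch of the hyperparameters listed above; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="gpt2-geez",        # placeholder output path
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=500,
    num_train_epochs=100,
    fp16=True,                     # Native AMP mixed-precision training
)
```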

Training results

| Training Loss | Epoch | Step   | Validation Loss |
|:-------------:|:-----:|:------:|:---------------:|
| 7.805         | 1.0   | 2869   | 8.1560          |
| 7.6668        | 2.0   | 5738   | 8.0964          |
| 7.5902        | 3.0   | 8607   | 8.0223          |
| 7.5213        | 4.0   | 11476  | 7.9265          |
| 7.4008        | 5.0   | 14345  | 7.8397          |
| 7.301         | 6.0   | 17214  | 7.7674          |
| 7.1974        | 7.0   | 20083  | 7.7010          |
| 7.1083        | 8.0   | 22952  | 7.6304          |
| 6.9829        | 9.0   | 25821  | 7.5783          |
| 6.8634        | 10.0  | 28690  | 7.5059          |
| 6.7617        | 11.0  | 31559  | 7.4591          |
| 6.699         | 12.0  | 34428  | 7.4385          |
| 6.6222        | 13.0  | 37297  | 7.4152          |
| 6.4996        | 14.0  | 40166  | 7.3716          |
| 6.4138        | 15.0  | 43035  | 7.3621          |
| 6.3134        | 16.0  | 45904  | 7.3350          |
| 6.2517        | 17.0  | 48773  | 7.3317          |
| 6.1405        | 18.0  | 51642  | 7.3333          |
| 6.0658        | 19.0  | 54511  | 7.3313          |
| 5.9379        | 20.0  | 57380  | 7.3308          |
| 5.8857        | 21.0  | 60249  | 7.3176          |
| 5.8123        | 22.0  | 63118  | 7.3555          |
| 5.7219        | 23.0  | 65987  | 7.3272          |
| 5.6109        | 24.0  | 68856  | 7.3490          |
| 5.5721        | 25.0  | 71725  | 7.3804          |
| 5.4767        | 26.0  | 74594  | 7.3616          |
| 5.3536        | 27.0  | 77463  | 7.4173          |
| 5.3088        | 28.0  | 80332  | 7.4068          |
| 5.2084        | 29.0  | 83201  | 7.4598          |
| 5.1875        | 30.0  | 86070  | 7.4445          |
| 5.1105        | 31.0  | 88939  | 7.4917          |
| 5.0036        | 32.0  | 91808  | 7.5289          |
| 4.9554        | 33.0  | 94677  | 7.5701          |
| 4.8937        | 34.0  | 97546  | 7.6252          |
| 4.8128        | 35.0  | 100415 | 7.5901          |
| 4.7318        | 36.0  | 103284 | 7.6583          |
| 4.6531        | 37.0  | 106153 | 7.6874          |
| 4.6181        | 38.0  | 109022 | 7.7548          |
| 4.5611        | 39.0  | 111891 | 7.7664          |
| 4.4673        | 40.0  | 114760 | 7.8109          |
| 4.4184        | 41.0  | 117629 | 7.7604          |
| 4.3436        | 42.0  | 120498 | 7.8470          |
| 4.329         | 43.0  | 123367 | 7.9043          |
| 4.2249        | 44.0  | 126236 | 7.9154          |
| 4.1761        | 45.0  | 129105 | 7.9494          |
| 4.153         | 46.0  | 131974 | 7.9806          |
| 4.09          | 47.0  | 134843 | 7.9693          |
| 4.0814        | 48.0  | 137712 | 8.0332          |
| 3.9889        | 49.0  | 140581 | 8.0437          |
| 3.8982        | 50.0  | 143450 | 8.1102          |
| 3.8621        | 51.0  | 146319 | 8.1181          |
| 3.8337        | 52.0  | 149188 | 8.1632          |
| 3.797         | 53.0  | 152057 | 8.1996          |
| 3.7656        | 54.0  | 154926 | 8.2277          |
| 3.7031        | 55.0  | 157795 | 8.2382          |
| 3.6823        | 56.0  | 160664 | 8.2876          |
| 3.621         | 57.0  | 163533 | 8.3095          |
| 3.5373        | 58.0  | 166402 | 8.3176          |
| 3.5675        | 59.0  | 169271 | 8.3374          |
| 3.5522        | 60.0  | 172140 | 8.3418          |
| 3.4695        | 61.0  | 175009 | 8.3852          |
| 3.4313        | 62.0  | 177878 | 8.3725          |
| 3.3989        | 63.0  | 180747 | 8.4252          |
| 3.3297        | 64.0  | 183616 | 8.4471          |
| 3.331         | 65.0  | 186485 | 8.4471          |
| 3.2577        | 66.0  | 189354 | 8.4660          |
| 3.2561        | 67.0  | 192223 | 8.4727          |
| 3.257         | 68.0  | 195092 | 8.5081          |
| 3.2167        | 69.0  | 197961 | 8.5476          |
| 3.1696        | 70.0  | 200830 | 8.5399          |
| 3.0959        | 71.0  | 203699 | 8.5425          |
| 3.0822        | 72.0  | 206568 | 8.5941          |
| 3.0605        | 73.0  | 209437 | 8.6037          |
| 3.092         | 74.0  | 212306 | 8.6128          |
| 3.0725        | 75.0  | 215175 | 8.5998          |
| 3.0599        | 76.0  | 218044 | 8.6316          |
| 2.9968        | 77.0  | 220913 | 8.6512          |
| 2.9697        | 78.0  | 223782 | 8.6503          |
| 2.9571        | 79.0  | 226651 | 8.6605          |
| 2.9867        | 80.0  | 229520 | 8.6775          |
| 2.89          | 81.0  | 232389 | 8.6773          |
| 2.9005        | 82.0  | 235258 | 8.6927          |
| 2.9131        | 83.0  | 238127 | 8.6921          |
| 2.8856        | 84.0  | 240996 | 8.7090          |
| 2.8438        | 85.0  | 243865 | 8.7086          |
| 2.8588        | 86.0  | 246734 | 8.7205          |
| 2.8226        | 87.0  | 249603 | 8.7406          |
| 2.8125        | 88.0  | 252472 | 8.7360          |
| 2.7896        | 89.0  | 255341 | 8.7401          |
| 2.8169        | 90.0  | 258210 | 8.7440          |
| 2.7947        | 91.0  | 261079 | 8.7519          |
| 2.7763        | 92.0  | 263948 | 8.7605          |
| 2.7666        | 93.0  | 266817 | 8.7577          |
| 2.8084        | 94.0  | 269686 | 8.7659          |
| 2.7636        | 95.0  | 272555 | 8.7705          |
| 2.7361        | 96.0  | 275424 | 8.7794          |
| 2.7511        | 97.0  | 278293 | 8.7810          |
| 2.7264        | 98.0  | 281162 | 8.7782          |
| 2.7505        | 99.0  | 284031 | 8.7818          |
| 2.7111        | 100.0 | 286900 | 8.7806          |
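
The validation loss reaches its minimum (7.3176) at epoch 21 and rises afterwards, ending at the 8.7806 reported above. If these values are mean per-token cross-entropy losses in nats (the usual Trainer convention, assumed here), they convert to perplexity as in the sketch below.

```python
import math

# Convert validation losses (assumed to be mean cross-entropy in nats) to perplexity.
best_val_loss = 7.3176    # lowest validation loss in the table (epoch 21)
final_val_loss = 8.7806   # reported evaluation loss (epoch 100)

print(f"best perplexity:  {math.exp(best_val_loss):.0f}")   # ≈ 1.5e3
print(f"final perplexity: {math.exp(final_val_loss):.0f}")  # ≈ 6.5e3
```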

Framework versions

  • Transformers 4.48.3
  • PyTorch 2.6.0+cu126
  • Datasets 3.2.0
  • Tokenizers 0.21.0