LLM_Teached_Bart_From_Scratch

This model is a fine-tuned version of facebook/bart-large on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.6350
  • Rouge1: 0.4471
  • Rouge2: 0.2259
  • RougeL: 0.3846
  • RougeLsum: 0.3845
  • Gen Len: 19.9087
  • Precision: 0.9156
  • Recall: 0.8915
  • F1: 0.9033
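
The ROUGE scores come from the standard ROUGE metric; the precision/recall/F1 triplet is consistent with BERTScore, though the card does not name the metric that produced it. Below is a minimal sketch of how such numbers could be recomputed with the evaluate library, assuming BERTScore for the last three values:

```python
# Minimal evaluation sketch. Assumes the precision/recall/F1 values above
# come from BERTScore; the card does not state which metric produced them.
import evaluate

rouge = evaluate.load("rouge")
bertscore = evaluate.load("bertscore")

predictions = ["a generated summary from the model"]  # placeholder output
references = ["the corresponding reference summary"]  # placeholder target

print(rouge.compute(predictions=predictions, references=references))
print(bertscore.compute(predictions=predictions, references=references, lang="en"))
```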

Model description

More information needed

Intended uses & limitations

More information needed
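
Given the ROUGE-based evaluation and the BART backbone, the model is presumably intended for abstractive summarization. A minimal usage sketch under that assumption, with a placeholder repository id (the actual id is not given in the card):

```python
# Hedged usage sketch: the repo id below is a placeholder, and the
# summarization task is inferred from the ROUGE metrics, not stated.
from transformers import pipeline

summarizer = pipeline("summarization", model="your-username/LLM_Teached_Bart_From_Scratch")
article = "Paste a long input document here ..."
print(summarizer(article, max_length=64, min_length=10)[0]["summary_text"])
```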

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 24
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 96
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 30
  • mixed_precision_training: Native AMP
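
A minimal sketch of Seq2SeqTrainingArguments mirroring the hyperparameters above; the dataset, tokenizer, and Trainer wiring are not documented in the card, so everything beyond the listed values is an assumption:

```python
# Sketch of training arguments matching the listed hyperparameters.
# output_dir is assumed; Adam betas/epsilon match the transformers defaults.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="LLM_Teached_Bart_From_Scratch",  # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=4,   # 24 * 4 = 96 total train batch size
    lr_scheduler_type="linear",
    num_train_epochs=30,
    fp16=True,                       # "Native AMP" mixed precision
    evaluation_strategy="epoch",     # inferred from the per-epoch results table
    predict_with_generate=True,      # generate summaries during evaluation
)
```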

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Gen Len | Precision | Recall | F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.836 | 1.0 | 521 | 1.5560 | 0.4155 | 0.2028 | 0.3561 | 0.3559 | 19.9745 | 0.9105 | 0.8843 | 0.8971 |
| 1.5951 | 2.0 | 1042 | 1.5004 | 0.4333 | 0.2136 | 0.3695 | 0.3694 | 19.9353 | 0.9115 | 0.8886 | 0.8997 |
| 1.469 | 3.0 | 1563 | 1.4691 | 0.4355 | 0.2176 | 0.3729 | 0.3728 | 19.9385 | 0.912 | 0.8888 | 0.9001 |
| 1.373 | 4.0 | 2084 | 1.4658 | 0.4311 | 0.2164 | 0.3706 | 0.3704 | 19.9647 | 0.9137 | 0.8877 | 0.9003 |
| 1.2902 | 5.0 | 2605 | 1.4542 | 0.4368 | 0.2218 | 0.3762 | 0.376 | 19.9498 | 0.9136 | 0.8887 | 0.9008 |
| 1.222 | 6.0 | 3126 | 1.4584 | 0.4407 | 0.223 | 0.3802 | 0.3798 | 19.9425 | 0.914 | 0.8902 | 0.9018 |
| 1.1655 | 7.0 | 3647 | 1.4709 | 0.4404 | 0.2246 | 0.3806 | 0.3803 | 19.9327 | 0.9145 | 0.89 | 0.9019 |
| 1.11 | 8.0 | 4168 | 1.4724 | 0.4435 | 0.2269 | 0.383 | 0.3828 | 19.9084 | 0.9153 | 0.8906 | 0.9026 |
| 1.0629 | 9.0 | 4689 | 1.4853 | 0.4431 | 0.2273 | 0.3832 | 0.383 | 19.928 | 0.9155 | 0.8908 | 0.9028 |
| 1.023 | 10.0 | 5210 | 1.5033 | 0.4409 | 0.2247 | 0.3819 | 0.3818 | 19.944 | 0.9152 | 0.8897 | 0.9021 |
| 0.9862 | 11.0 | 5731 | 1.5074 | 0.4479 | 0.2278 | 0.3862 | 0.386 | 19.9124 | 0.9158 | 0.8916 | 0.9034 |
| 0.957 | 12.0 | 6252 | 1.5184 | 0.4461 | 0.2264 | 0.3846 | 0.3847 | 19.9033 | 0.9159 | 0.8909 | 0.903 |
| 0.9315 | 13.0 | 6773 | 1.5269 | 0.4473 | 0.2284 | 0.386 | 0.3858 | 19.9084 | 0.9156 | 0.8912 | 0.9031 |
| 0.9093 | 14.0 | 7294 | 1.5311 | 0.4453 | 0.2273 | 0.3846 | 0.3843 | 19.9135 | 0.9155 | 0.8909 | 0.9029 |
| 0.8927 | 15.0 | 7815 | 1.5351 | 0.4457 | 0.2267 | 0.3842 | 0.384 | 19.9065 | 0.9156 | 0.8909 | 0.9029 |
| 0.8773 | 16.0 | 8336 | 1.5440 | 0.4427 | 0.225 | 0.382 | 0.382 | 19.9425 | 0.9151 | 0.8905 | 0.9025 |
| 0.8806 | 17.0 | 8857 | 1.5510 | 0.4495 | 0.2279 | 0.3868 | 0.3869 | 19.8851 | 0.9159 | 0.8919 | 0.9036 |
| 0.8683 | 18.0 | 9378 | 1.5679 | 0.4473 | 0.2282 | 0.3856 | 0.3857 | 19.8829 | 0.9161 | 0.8921 | 0.9038 |
| 0.8413 | 19.0 | 9899 | 1.5745 | 0.4492 | 0.2282 | 0.3861 | 0.3864 | 19.9135 | 0.9159 | 0.8918 | 0.9035 |
| 0.8257 | 20.0 | 10420 | 1.5835 | 0.4471 | 0.2266 | 0.3852 | 0.3853 | 19.8996 | 0.9153 | 0.8915 | 0.9031 |
| 0.8097 | 21.0 | 10941 | 1.5957 | 0.4472 | 0.2271 | 0.3856 | 0.3856 | 19.9073 | 0.9156 | 0.8919 | 0.9034 |
| 0.7926 | 22.0 | 11462 | 1.5956 | 0.4479 | 0.2282 | 0.3855 | 0.3857 | 19.892 | 0.9159 | 0.8916 | 0.9034 |
| 0.7841 | 23.0 | 11983 | 1.5990 | 0.4444 | 0.2261 | 0.3833 | 0.3834 | 19.912 | 0.9155 | 0.8908 | 0.9028 |
| 0.7669 | 24.0 | 12504 | 1.6097 | 0.4491 | 0.2284 | 0.3872 | 0.387 | 19.9007 | 0.9162 | 0.892 | 0.9037 |
| 0.7733 | 25.0 | 13025 | 1.6060 | 0.4442 | 0.2257 | 0.3827 | 0.3828 | 19.9178 | 0.9154 | 0.8906 | 0.9027 |
| 0.7631 | 26.0 | 13546 | 1.6187 | 0.4472 | 0.2276 | 0.3861 | 0.3861 | 19.9175 | 0.9154 | 0.8915 | 0.9031 |
| 0.7505 | 27.0 | 14067 | 1.6208 | 0.4463 | 0.227 | 0.3852 | 0.3851 | 19.8967 | 0.9155 | 0.8914 | 0.9031 |
| 0.7413 | 28.0 | 14588 | 1.6237 | 0.4468 | 0.2273 | 0.3854 | 0.3853 | 19.9153 | 0.9159 | 0.8912 | 0.9032 |
| 0.7348 | 29.0 | 15109 | 1.6312 | 0.4482 | 0.2268 | 0.3858 | 0.3858 | 19.8938 | 0.9158 | 0.8918 | 0.9035 |
| 0.7286 | 30.0 | 15630 | 1.6350 | 0.4471 | 0.2259 | 0.3846 | 0.3845 | 19.9087 | 0.9156 | 0.8915 | 0.9033 |

Framework versions

  • Transformers 4.36.0
  • Pytorch 2.0.1+cu117
  • Datasets 2.14.5
  • Tokenizers 0.15.0
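
To reproduce results, the exact versions above can be asserted at runtime; a trivial check, assuming all four packages are installed:

```python
# Version pin check matching the framework versions listed above.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__ == "4.36.0"
assert torch.__version__ == "2.0.1+cu117"
assert datasets.__version__ == "2.14.5"
assert tokenizers.__version__ == "0.15.0"
```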

Model size

  • 406M params (F32 tensors, Safetensors format)