led-large-16384-govreport

This model is a fine-tuned version of allenai/led-base-16384 on the govreport-summarization dataset. It achieves the following results on the evaluation set:

Loss: 2.1142
Rouge1: 0.5445
Rouge2: 0.2225
Rougel: 0.2578
Rougelsum: 0.2579

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 64
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 100
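The arithmetic behind these settings can be sketched in a few lines. This is a minimal illustration, not the training script used for this model. It assumes a single training device and no warmup, since neither a device count nor warmup steps are listed above. The dictionary keys follow the Hugging Face TrainingArguments field names, and that mapping is also an assumption. The total_steps value in the usage lines is a hypothetical round number.

```python
# Hyperparameters copied from the list above. The key names mirror
# Hugging Face TrainingArguments fields; that mapping is an assumption.
hparams = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 1,
    "per_device_eval_batch_size": 1,
    "seed": 42,
    "gradient_accumulation_steps": 64,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 100,
}


def effective_batch_size(h, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return (
        h["per_device_train_batch_size"]
        * h["gradient_accumulation_steps"]
        * num_devices
    )


def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Linear warmup followed by linear decay to zero.

    warmup_steps=0 is an assumption: the card lists no warmup.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    print(effective_batch_size(hparams))  # matches total_train_batch_size
    # total_steps=1000 is a hypothetical value chosen for illustration.
    print(linear_lr(500, 1000, hparams["learning_rate"]))
```

With a per-device batch of 1 and 64 accumulation steps, each optimizer update sees 64 examples, which is the total_train_batch_size reported above.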

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum
1.8152	1.83	500	1.7956	0.5095	0.2040	0.2382	0.2381
1.6981	3.66	1000	1.7624	0.5194	0.2107	0.2437	0.2437
1.7048	5.49	1500	1.7448	0.5253	0.2149	0.2467	0.2467
1.6469	7.32	2000	1.7416	0.5299	0.2177	0.2499	0.2500
1.6465	9.15	2500	1.7318	0.5299	0.2160	0.2476	0.2478
1.578	10.98	3000	1.7254	0.5321	0.2192	0.2529	0.2530
1.5631	12.81	3500	1.7189	0.5309	0.2170	0.2520	0.2520
1.5641	14.63	4000	1.7152	0.5343	0.2198	0.2550	0.2550
1.4753	16.48	4500	1.7181	0.5305	0.2179	0.2539	0.2542
1.4792	18.3	5000	1.7152	0.5375	0.2258	0.2586	0.2588
1.4206	20.13	5500	1.7142	0.5366	0.2216	0.2555	0.2556
1.4273	21.96	6000	1.7128	0.5364	0.2232	0.2573	0.2573
1.4078	23.78	6500	1.7114	0.5344	0.2200	0.2562	0.2563
1.355	25.61	7000	1.7153	0.5354	0.2212	0.2564	0.2564
1.409	27.44	7500	1.7119	0.5363	0.2217	0.2568	0.2570
1.3817	29.26	8000	1.7166	0.5369	0.2229	0.2582	0.2582
1.3072	31.13	8500	1.7302	0.5379	0.2249	0.2604	0.2603
1.3172	32.96	9000	1.7121	0.5377	0.2236	0.2588	0.2587
1.277	34.78	9500	1.7255	0.5368	0.2221	0.2584	0.2583
1.1849	36.61	10000	1.7438	0.5382	0.2244	0.2611	0.2612
1.1565	38.44	10500	1.7540	0.5414	0.2258	0.2612	0.2612
1.1415	40.26	11000	1.7707	0.5401	0.2251	0.2618	0.2618
1.085	42.09	11500	1.7791	0.5401	0.2235	0.2595	0.2595
1.088	43.92	12000	1.7869	0.5422	0.2265	0.2616	0.2615
1.0678	45.74	12500	1.8058	0.5420	0.2253	0.2607	0.2607
1.0815	47.57	13000	1.8186	0.5405	0.2248	0.2615	0.2615
1.0456	49.4	13500	1.8346	0.5430	0.2262	0.2619	0.2618
0.9553	51.22	14000	1.8449	0.5387	0.2239	0.2614	0.2613
0.958	53.05	14500	1.8716	0.5438	0.2274	0.2618	0.2618
0.9213	54.88	15000	1.8780	0.5438	0.2249	0.2612	0.2612
0.876	56.77	15500	1.8904	0.5439	0.2253	0.2621	0.2621
0.8967	58.6	16000	1.9085	0.5439	0.2264	0.2634	0.2633
0.9138	60.43	16500	1.9089	0.5428	0.2242	0.2597	0.2597
0.848	62.25	17000	1.9153	0.5441	0.2242	0.2600	0.2599
0.7804	64.08	17500	1.9311	0.5422	0.2241	0.2603	0.2604
0.8326	65.91	18000	1.9391	0.5446	0.2242	0.2604	0.2602
0.8164	67.73	18500	1.9607	0.5430	0.2245	0.2607	0.2607
0.8129	69.56	19000	1.9731	0.5456	0.2277	0.2633	0.2633
0.8049	71.39	19500	1.9804	0.5433	0.2248	0.2618	0.2619
0.7605	73.21	20000	2.0060	0.5449	0.2256	0.2607	0.2606
0.7595	75.04	20500	2.0085	0.5425	0.2227	0.2590	0.2590
0.7837	76.87	21000	2.0073	0.5441	0.2243	0.2608	0.2609
0.7458	78.69	21500	2.0210	0.5447	0.2260	0.2619	0.2621
0.7235	80.52	22000	2.0273	0.5445	0.2253	0.2610	0.2611
0.7405	82.35	22500	2.0405	0.5438	0.2243	0.2600	0.2599
0.7323	84.17	23000	2.0385	0.5466	0.2256	0.2607	0.2608
0.7333	86.0	23500	2.0386	0.5447	0.2248	0.2608	0.2609
0.7067	87.83	24000	2.0582	0.5449	0.2243	0.2601	0.2600
0.7073	89.65	24500	2.0615	0.5455	0.2253	0.2604	0.2603
0.6903	91.48	25000	2.0657	0.5482	0.2273	0.2627	0.2626
0.7203	93.31	25500	2.0574	0.5452	0.2241	0.2596	0.2597
0.6765	95.13	26000	2.0692	0.5437	0.2249	0.2608	0.2608
0.6959	96.96	26500	2.0696	0.5442	0.2246	0.2614	0.2614
0.6918	98.79	27000	2.0701	0.5444	0.2252	0.2615	0.2615
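In the log above, validation loss reaches its minimum early and then rises while training loss keeps falling, whereas Rouge1 continues to improve slightly until late in training. The following sketch shows one way to pick a checkpoint under each criterion. The rows are a small subset copied from the table, and the tuple layout and variable names are illustrative assumptions.

```python
# (step, validation_loss, rouge1) rows copied from the table above.
# Only a subset is included to keep the example short.
rows = [
    (500, 1.7956, 0.5095),
    (6000, 1.7128, 0.5364),
    (6500, 1.7114, 0.5344),
    (7000, 1.7153, 0.5354),
    (13500, 1.8346, 0.5430),
    (19000, 1.9731, 0.5456),
    (23000, 2.0385, 0.5466),
    (25000, 2.0657, 0.5482),
    (27000, 2.0701, 0.5444),
]

# Checkpoint with the lowest validation loss.
best_by_loss = min(rows, key=lambda r: r[1])

# Checkpoint with the highest Rouge1.
best_by_rouge1 = max(rows, key=lambda r: r[2])

if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss)
    print("highest Rouge1:", best_by_rouge1)
```

The two criteria select different checkpoints (step 6500 by validation loss, step 25000 by Rouge1), and the same holds over the full table. Which one is preferable depends on whether loss or summary overlap matters more for the intended use.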

Framework versions

Transformers 4.30.2
PyTorch 1.10.0+cu102
Datasets 2.13.1
Tokenizers 0.13.3
