esuriddick/led-base-16384-finetuned-govreport

	---
	license: apache-2.0
	tags:
	- generated_from_trainer
	datasets:
	- pszemraj/govreport-summarization-8192
	model-index:
	- name: led-base-16384-finetuned-govreport
	  results: []
	language:
	- en
	pipeline_tag: summarization
	---

	# led-base-16384-finetuned-govreport

	This model is a fine-tuned version of [allenai/led-base-16384](https://huggingface.co/allenai/led-base-16384) on the [pszemraj/govreport-summarization-8192](https://huggingface.co/datasets/pszemraj/govreport-summarization-8192) dataset.
	It achieves the following results on the evaluation set:
	- Loss: 1.2887

	ROUGE metrics on the validation and test sets are not reported: computing them required more processing time and memory than Kaggle provided at the time of training.
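For readers unfamiliar with the metric, the following is a simplified sketch of ROUGE-1 F1 (clipped unigram overlap between a generated summary and a reference). It assumes lowercased whitespace tokenization with no stemming, so it is not a substitute for an official ROUGE implementation and its scores would not be comparable to published numbers. The function name `rouge1_f1` and the example strings are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical example strings, not taken from the dataset.
score = rouge1_f1("the agency report", "the agency issued a report")
print(round(score, 2))  # 0.75
```

Scoring every generated summary also requires running generation over the full validation and test sets, which is typically the expensive part for long inputs.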

	## Model description

	As described in [Longformer: The Long-Document Transformer](https://arxiv.org/pdf/2004.05150.pdf) by Iz Beltagy, Matthew E. Peters, and Arman Cohan, [AllenAI's Longformer Encoder-Decoder (LED)](https://github.com/allenai/longformer#longformer) was initialized from [bart-base](https://huggingface.co/facebook/bart-base) because both models share the same architecture. To process 16K tokens, bart-base's position embedding matrix was copied 16 times.

	This model is especially interesting for long-range summarization and question answering.
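The position-embedding copy described above can be illustrated with a toy sketch. It uses plain Python lists and a 4-position matrix as a stand-in for bart-base's 1024-position matrix (1024 × 16 = 16384); the function name `extend_position_embeddings` is hypothetical and this is not the conversion script used by the LED authors.

```python
def extend_position_embeddings(position_embeddings, copies):
    """Tile a position-embedding matrix `copies` times along the position axis."""
    extended = []
    for _ in range(copies):
        # Copy each row so the extended matrix does not alias the original rows.
        extended.extend(list(row) for row in position_embeddings)
    return extended


# Toy stand-in: 4 positions x 3 dimensions (bart-base has 1024 positions).
toy_embeddings = [[float(pos), pos + 0.1, pos + 0.2] for pos in range(4)]
extended = extend_position_embeddings(toy_embeddings, copies=16)

print(len(toy_embeddings), "->", len(extended))  # 4 -> 64
```

After tiling, position 4 starts with the same vector as position 0, so the extended model begins from embeddings it already knows and adapts them during further training.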

	## Intended uses & limitations

	[pszemraj/govreport-summarization-8192](https://huggingface.co/datasets/pszemraj/govreport-summarization-8192) is a pre-processed version of the dataset [ccdv/govreport-summarization](https://huggingface.co/datasets/ccdv/govreport-summarization), which is a dataset for summarization of long documents adapted from this [repository](https://github.com/luyang-huang96/LongDocSum) and this [paper](https://arxiv.org/pdf/2104.02112.pdf).

	AllenAI's LED model was fine-tuned on this dataset, allowing the summarization of documents of up to 16384 tokens.
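A minimal usage sketch follows, assuming the checkpoint is available on the Hugging Face Hub as `esuriddick/led-base-16384-finetuned-govreport` and that `transformers` and `torch` are installed. Placing global attention on the first token only follows the usual LED recommendation for summarization. The helper names and the values `max_summary_tokens=512` and `num_beams=4` are illustrative assumptions, not settings taken from this card.

```python
def first_token_global_attention(seq_len):
    """Return a 0/1 list that puts global attention on the first token only."""
    mask = [0] * seq_len
    if seq_len > 0:
        mask[0] = 1
    return mask


def summarize(text,
              model_id="esuriddick/led-base-16384-finetuned-govreport",
              max_input_tokens=16384,
              max_summary_tokens=512):  # assumed value, not from the card
    # Heavy dependencies are imported lazily so the helper above stays usable alone.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Truncate inputs that exceed the model's 16384-token limit.
    inputs = tokenizer(text, max_length=max_input_tokens,
                       truncation=True, return_tensors="pt")
    seq_len = inputs["input_ids"].shape[1]
    global_attention_mask = torch.tensor([first_token_global_attention(seq_len)])

    with torch.no_grad():
        summary_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            global_attention_mask=global_attention_mask,
            max_new_tokens=max_summary_tokens,
            num_beams=4,  # assumed value, not from the card
        )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


# Example call (hypothetical file name):
# print(summarize(open("report.txt").read()))
```

Memory use grows with input length, so very long reports may need a GPU or a smaller `max_input_tokens`.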

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 1
	- eval_batch_size: 1
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 8
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 2
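The hyperparameters above could be expressed as `Seq2SeqTrainingArguments` keyword arguments roughly as sketched below. This assumes single-device training, so that `train_batch_size` maps to `per_device_train_batch_size`; the choice of `Seq2SeqTrainingArguments`, the `output_dir` value, and the helper names are assumptions and not the author's training script.

```python
# Card hyperparameters mapped onto transformers TrainingArguments field names.
HYPERPARAMETERS = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 1,  # assumes a single device
    "per_device_eval_batch_size": 1,
    "seed": 42,
    "gradient_accumulation_steps": 8,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 2,
}


def effective_batch_size(hp, num_devices=1):
    """Examples contributing to each optimizer step."""
    return (hp["per_device_train_batch_size"]
            * hp["gradient_accumulation_steps"]
            * num_devices)


def build_training_arguments(output_dir="led-govreport-output"):
    # output_dir is a hypothetical path; transformers is imported lazily.
    from transformers import Seq2SeqTrainingArguments
    return Seq2SeqTrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)


print(effective_batch_size(HYPERPARAMETERS))  # 8
```

The computed value matches `total_train_batch_size: 8`: one example per forward pass, accumulated over eight steps before each optimizer update.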

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 1.1492 | 0.24 | 250 | 1.4233 |
	| 1.0077 | 0.49 | 500 | 1.3813 |
	| 1.0069 | 0.73 | 750 | 1.3499 |
	| 0.9639 | 0.98 | 1000 | 1.3216 |
	| 0.7996 | 1.22 | 1250 | 1.3172 |
	| 0.9395 | 1.46 | 1500 | 1.3003 |
	| 0.913 | 1.71 | 1750 | 1.2919 |
	| 0.8843 | 1.95 | 2000 | 1.2887 |
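As a quick way to read the training log, the sketch below checks that validation loss fell at every evaluation and finds the best checkpoint. The rows are copied from the table; the steps-per-epoch figure is only a rough estimate, because the epoch column is rounded to two decimals.

```python
# (training_loss, epoch, step, validation_loss), copied from the results table.
TRAINING_LOG = [
    (1.1492, 0.24, 250, 1.4233),
    (1.0077, 0.49, 500, 1.3813),
    (1.0069, 0.73, 750, 1.3499),
    (0.9639, 0.98, 1000, 1.3216),
    (0.7996, 1.22, 1250, 1.3172),
    (0.9395, 1.46, 1500, 1.3003),
    (0.913, 1.71, 1750, 1.2919),
    (0.8843, 1.95, 2000, 1.2887),
]

validation_losses = [row[3] for row in TRAINING_LOG]

# True if every evaluation improved on the previous one.
is_monotonic = all(
    later < earlier
    for earlier, later in zip(validation_losses, validation_losses[1:])
)

# Step with the lowest validation loss.
best_step, best_loss = min(
    ((row[2], row[3]) for row in TRAINING_LOG), key=lambda pair: pair[1]
)

# Rough estimate from the last row; epoch values are rounded in the table.
_, last_epoch, last_step, _ = TRAINING_LOG[-1]
approx_steps_per_epoch = last_step / last_epoch

print(is_monotonic, best_step, best_loss)  # True 2000 1.2887
print(round(approx_steps_per_epoch))
```

The lowest validation loss, 1.2887 at step 2000, is the evaluation loss reported at the top of this card.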


	### Framework versions

	- Transformers 4.30.2
	- PyTorch 2.0.0
	- Datasets 2.1.0
	- Tokenizers 0.13.3