---
language:
- en
license:
- apache-2.0
- bsd-3-clause
tags:
- summarization
- extractive
- summary
- abstractive
- multi-task
- document summary
datasets:
- jordiclive/scored_summarization_datasets
metrics:
- rouge
---

# Multi-purpose Summarizer (Fine-tuned 3B google/flan-t5-xl on several Summarization datasets)

 <a href="https://colab.research.google.com/drive/1EYfnIoG-r5lL2-3oiO_YdYEVKB0pAa9h">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

A fine-tuned version of [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl) on various summarization datasets (xsum, wikihow, cnn_dailymail/3.0.0, samsum, scitldr/AIC, billsum, TLDR).

Goal: a model that can be used as a general-purpose summarizer for academic and general usage. The type of summary can be controlled by varying the instruction prepended to the source document. The model works well on a wide range of text, although it was trained with a maximum source length of 512 tokens and a maximum summary length of 150 tokens.

---

## Usage
See the Colab notebook above for the intended usage.

**The model expects a prompt prepended to the source document to indicate the type of summary.** The prompts used to train the model are listed below.
Prompts should end with a colon, so that the input to the model is formatted as, e.g., "Summarize the following: {input_text}". Note that this model was trained with far fewer prompts than models like `jordiclive/flan-t5-11b-summarizer-filtered`, so new prompts might not generalize as well.
```python
prompts = {
    "article": "Produce an article summary of the following news article:",
    "one_sentence": "Given the following news article, summarize the article in one sentence:",
    "conversation": "Briefly summarize in third person the following conversation:",
    "scitldr": "Given the following scientific article, provide a TL;DR summary:",
    "bill": "Summarize the following proposed legislation (bill):",
    "outlines": "Produce an article summary including outlines of each paragraph of the following article:",
}
```
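Combining one of these prompts with a source document is plain string concatenation with a single space (`document` below is a placeholder for your own text):

```python
document = "Your source text here..."  # placeholder for the text to summarize
model_input = f"{prompts['scitldr']} {document}"
```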
After `pip install transformers`, run the following code:

Note that this pipeline runs more slowly than the Colab notebook and does not expose some of its tokenization parameters.
```python
import torch
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    "jordiclive/flan-t5-3b-summarizer",
    torch_dtype=torch.bfloat16,
)

raw_document = 'You must be 18 years old to live or work in New York State...'
prompt = "Produce an article summary of the following news article:"
results = summarizer(
    f"{prompt} {raw_document}",
    num_beams=5,
    min_length=5,
    no_repeat_ngram_size=3,
    truncation=True,
    max_length=512,
)
```
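If you need the tokenization controls the pipeline hides (e.g. the 512-token source limit), a minimal sketch of calling the model directly is below. The generation settings mirror the pipeline example above and are assumptions, not fixed requirements:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "jordiclive/flan-t5-3b-summarizer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

prompt = "Given the following scientific article, provide a TL;DR summary:"
document = "Your source text here..."  # placeholder for the text to summarize

inputs = tokenizer(
    f"{prompt} {document}",
    max_length=512,  # the model was trained with a 512-token source limit
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        num_beams=5,
        min_length=5,
        max_length=150,  # training used a 150-token summary limit
        no_repeat_ngram_size=3,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```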

---

## Training procedure

- Training was done in BF16 with DeepSpeed ZeRO stage 2 for 6 epochs, monitoring ROUGE-2 on the validation set.

## Hardware
- GPU count: 8 × NVIDIA A100-SXM4-40GB
- CPU count: 48
### Training hyperparameters


The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 5
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- effective_train_batch_size: 80 (5 per-device batch size × 8 GPUs × 2 gradient accumulation steps)
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- warmup_steps: 2000
- num_epochs: 10
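
A hypothetical sketch of how these settings could map onto the `pytorch-lightning` + DeepSpeed stack listed under framework versions below; the actual training script is not published, so treat this as illustrative only:

```python
# Illustrative reconstruction, not the author's training code.
import pytorch_lightning as pl
from pytorch_lightning.strategies import DeepSpeedStrategy

trainer = pl.Trainer(
    accelerator="gpu",
    devices=8,                            # 8 × A100-SXM4-40GB
    precision="bf16",                     # BF16 training
    strategy=DeepSpeedStrategy(stage=2),  # DeepSpeed ZeRO stage 2
    accumulate_grad_batches=2,            # 5 per GPU × 8 GPUs × 2 = 80 effective
    max_epochs=10,
)
```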


### Framework versions

- Transformers 4.24.0
- Pytorch 1.9.1+cu111
- Deepspeed 0.7.4
- Pytorch-lightning 1.8.1


### Citation
```
@misc{jordiclive_flan_t5_3b_summarizer_2023,
  title={{Multi-purpose Summarizer (Fine-tuned google/flan-t5-xl on several Summarization datasets)}},
  author={{Jordan Clive}},
  howpublished={\url{https://huggingface.co/jordiclive/flan-t5-3b-summarizer}},
  year={2023},
  note={Apache 2.0 and BSD-3-Clause License. Fine-tuned on various summarization datasets including xsum, wikihow, cnn_dailymail/3.0.0, samsum, scitldr/AIC, billsum, TLDR. Designed for academic and general usage with control over summary type by varying the instruction prepended to the source document.},
  url={https://huggingface.co/jordiclive/flan-t5-3b-summarizer},
}
```