---
language:
- en
license:
- apache-2.0
- bsd-3-clause
tags:
- summarization
- extractive
- summary
- abstractive
- multi-task
- document summary
datasets:
- jordiclive/scored_summarization_datasets
- jordiclive/wikipedia-summary-dataset
metrics:
- rouge
---

# Multi-purpose Summarizer (Fine-tuned 3B google/flan-t5-xl on several Summarization datasets)

<a href="https://colab.research.google.com/drive/1fNOfy7oHYETI_KzJSz8JrhYohFBBl0HY">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

A fine-tuned version of [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl) on various summarization datasets (xsum, wikihow, cnn_dailymail/3.0.0, samsum, scitldr/AIC, billsum, TLDR, wikipedia-summary).

70% of the data was additionally filtered with the [contriever](https://github.com/facebookresearch/contriever), keeping only examples whose cosine similarity between text and summary exceeded a threshold of 0.6.

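As a rough sketch of that filtering step (assuming the `facebook/contriever` checkpoint and the mean-pooling recipe from its model card; the actual filtering pipeline may differ):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Sketch of the similarity filter, assuming the facebook/contriever checkpoint
# and its documented mean-pooling recipe; the actual filtering code may differ.
tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
model = AutoModel.from_pretrained("facebook/contriever")

def embed(texts):
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = model(**inputs).last_hidden_state
    mask = inputs["attention_mask"].unsqueeze(-1)
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)  # mean pooling

document_text = "Full source document..."  # placeholder
summary_text = "Its reference summary..."  # placeholder
similarity = torch.nn.functional.cosine_similarity(embed([document_text]), embed([summary_text]))
keep_example = similarity.item() >= 0.6
```
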
Goal: a general-purpose summarizer for academic and everyday use. The type of summary can be controlled by varying the instruction prepended to the source document. The model works well on a wide range of text, although it was trained with a maximum source length of 512 tokens and a maximum summary length of 150 tokens.

---

## Usage
Check the Colab notebook for the intended usage.
**The model expects a prompt prepended to the source document to indicate the type of summary.** It was trained with a large variety (hundreds) of prompts, for example:

```python
example_prompts = {
    "social": "Produce a short summary of the following social media post:",
    "ten": "Summarize the following article in 10-20 words:",
    "5": "Summarize the following article in 0-5 words:",
    "100": "Summarize the following article in about 100 words:",
    "summary": "Write a ~ 100 word summary of the following text:",
    "short": "Provide a short summary of the following article:",
}
```

The model has also learned to follow a summary length specified in words, either as a range ("x-y words") or as e.g. "~/approximately/about x words."

Prompts should end with a colon, so that the input to the model is formatted as e.g. "Summarize the following: \n\n <input text>".
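
For illustration, a minimal (hypothetical) helper that assembles that input format:

```python
# Hypothetical helper, shown only to illustrate the "<prompt> \n\n <document>"
# input format described above; it is not part of the released code.
def build_input(prompt: str, document: str) -> str:
    return f"{prompt} \n\n {document}"

document = "You must be 18 years old to live or work in New York State..."
print(build_input("Summarize the following article in about 100 words:", document))
```
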
After `pip install transformers`, run the following code:

This pipeline will run slower than the Colab notebook and does not expose some of its tokenization parameters.
```python
import torch
from transformers import pipeline

summarizer = pipeline("summarization", "jordiclive/flan-t5-3b-summarizer", torch_dtype=torch.bfloat16)

raw_document = 'You must be 18 years old to live or work in New York State...'
prompt = "Summarize the following article in 10-20 words:"
results = summarizer(
    f"{prompt} \n\n {raw_document}",
    num_beams=5,
    min_length=5,
    no_repeat_ngram_size=3,
    truncation=True,
    max_length=512,
)
```
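
If you need tokenization or generation controls that the pipeline does not expose, a lower-level sketch along these lines should work; the generation settings simply mirror the pipeline call above and the 512/150 token lengths used in training, and are not necessarily the exact Colab settings.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "jordiclive/flan-t5-3b-summarizer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Assumes a CUDA GPU is available; drop .to("cuda") to run on CPU.
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to("cuda")

prompt = "Summarize the following article in 10-20 words:"
raw_document = "You must be 18 years old to live or work in New York State..."

# Truncate the source to the 512-token length used in training.
inputs = tokenizer(
    f"{prompt} \n\n {raw_document}",
    truncation=True,
    max_length=512,
    return_tensors="pt",
).to(model.device)

summary_ids = model.generate(
    **inputs,
    num_beams=5,
    min_length=5,
    no_repeat_ngram_size=3,
    max_length=150,  # matches the 150-token summary length used in training
)
print(tokenizer.batch_decode(summary_ids, skip_special_tokens=True)[0])
```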

---

## Training procedure

- Training was done in BF16 with DeepSpeed ZeRO stage 2 and CPU offload, for 1 epoch with validation loss monitored.
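
For reference, ZeRO stage 2 with CPU optimizer offload and BF16 corresponds roughly to a DeepSpeed configuration like the sketch below (illustrative only, not the exact configuration used for this model):

```python
# Illustrative DeepSpeed config (as a Python dict) for BF16 training with
# ZeRO stage 2 and CPU optimizer offload -- not the exact config used here.
deepspeed_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "gradient_accumulation_steps": 2,     # matches the hyperparameters below
    "train_micro_batch_size_per_gpu": 4,  # matches train_batch_size below
}
```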

## Hardware
- GPU count: 8 × NVIDIA A100-SXM4-80GB
- CPU count: 48

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 2
- effective_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- warmup_steps: 2000
- num_epochs: 4
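
As a rough sketch of how these optimizer and scheduler settings translate into code (placeholders stand in for the real model and total step count; this is not the actual training script):

```python
import torch
import torch.nn as nn
from transformers import get_linear_schedule_with_warmup

model = nn.Linear(8, 8)      # placeholder; in practice the flan-t5-xl model
num_training_steps = 10_000  # placeholder; depends on dataset size and epochs

optimizer = torch.optim.Adam(model.parameters(), lr=3e-5, betas=(0.9, 0.999), eps=1e-8)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=2000,
    num_training_steps=num_training_steps,
)
```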

### Framework versions

- Transformers 4.24.0
- Pytorch 1.9.1+cu111
- Deepspeed 0.7.4
- Pytorch-lightning 1.8.1