---
tags:
- summarization
- summary
- booksum
- long-document
- long-form
license:
- apache-2.0
- bsd-3-clause
datasets:
- kmfoda/booksum
metrics:
- rouge
inference: false
---

# long-t5-tglobal-xl + BookSum

- Summarize long text and get a SparkNotes-esque summary of arbitrary topics!
- Generalizes reasonably well to academic & narrative text. This is the XL checkpoint, which, **from a human-evaluation perspective, produces even better summaries**.
- A simple example/use case with the `base` model on ASR is [here](https://longt5-booksum-example.netlify.app/).

## Model description

A fine-tuned version of [google/long-t5-tglobal-xl](https://huggingface.co/google/long-t5-tglobal-xl) on the `kmfoda/booksum` dataset.

Read the paper by Guo et al. here: [LongT5: Efficient Text-To-Text Transformer for Long Sequences](https://arxiv.org/pdf/2112.07916.pdf)

## How-To in Python

> `LLM.int8()` appears to be compatible with summarization and does not degrade output quality; this is a crucial enabler for running this model on standard GPUs. A PR for this is in progress [here](https://github.com/huggingface/transformers/pull/20341), and this model card will be updated with instructions once it is merged :)

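In the meantime, the sketch below shows roughly what 8-bit loading is expected to look like. The `load_in_8bit` and `device_map` arguments assume working `bitsandbytes` and `accelerate` installs; treat this as an assumption rather than official instructions for this checkpoint.

```python
# Hedged sketch: assumes int8 support for this model has landed in your
# transformers version and that bitsandbytes + accelerate are installed.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "pszemraj/long-t5-tglobal-xl-16384-book-summary"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # LLM.int8() quantization via bitsandbytes
    device_map="auto",   # dispatch layers across available GPU(s)
)
```
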
Install/update transformers: `pip install -U transformers`

Summarize text with the pipeline:

```python
import torch
from transformers import pipeline

# load the checkpoint on GPU 0 if available, otherwise fall back to CPU
summarizer = pipeline(
    "summarization",
    "pszemraj/long-t5-tglobal-xl-16384-book-summary",
    device=0 if torch.cuda.is_available() else -1,
)
long_text = "Here is a lot of text I don't want to read. Replace me"

result = summarizer(long_text)
print(result[0]["summary_text"])
```

Pass [other parameters related to beam search text generation](https://huggingface.co/blog/how-to-generate) when calling `summarizer` to get even higher-quality results.

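For example, generation keyword arguments pass straight through the pipeline call; the values below are illustrative rather than tuned settings for this model:

```python
result = summarizer(
    long_text,
    num_beams=4,             # beam search instead of greedy decoding
    no_repeat_ngram_size=3,  # discourage verbatim repetition
    early_stopping=True,     # stop once all beams have finished
)
print(result[0]["summary_text"])
```
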
## Intended uses & limitations

- While this model seems to improve factual consistency, **do not take the summaries to be foolproof; check anything that seems odd**.
- Specifically, watch for negation statements (i.e., the model says _This thing does not have <ATTRIBUTE>_ when it should have said _This thing has a lot of <ATTRIBUTE>_).
- I'm sure someone will write a paper on this eventually (if there isn't one already), but you can usually fact-check such statements by paying attention to the sentences surrounding a claim made by the model.

## Training and evaluation data

- The `kmfoda/booksum` dataset on Hugging Face - read [the original paper here](https://arxiv.org/abs/2105.08209). A sketch for loading it with `datasets` follows this list.
- **Initial fine-tuning** only used examples with 12,288 or fewer input tokens and 1,024 or fewer output tokens, for memory reasons. Per a brief analysis, summaries in the 12,288-16,384 token range are a **small** minority of this dataset.
- In addition, the initial training combined the training and validation sets and trained on the aggregate to increase the functional dataset size. **Therefore, take the validation-set results with a grain of salt; the primary metrics should (always) be the test set.**
- The **final phases of fine-tuning** used the standard convention of 16,384 input/1,024 output tokens, keeping everything (truncating longer sequences). This did not appear to change the loss/performance much.

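A minimal sketch for loading and inspecting the dataset with the `datasets` library (column names are not listed here; check them at runtime):

```python
# Sketch only: assumes `pip install datasets`.
from datasets import load_dataset

booksum = load_dataset("kmfoda/booksum")
print(booksum)                        # DatasetDict with train / validation / test splits
print(booksum["train"].column_names)  # inspect the available fields
```
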
## Eval Results

Official results computed with the [model evaluator](https://huggingface.co/spaces/autoevaluate/model-evaluator) will be posted here.

**Please read the note above: because the training and validation sets were combined during training, these validation scores look better than the test-set results will be.** The model achieves the following results on the evaluation set (a sketch for computing ROUGE on your own outputs follows the list):

- eval_loss: 1.2756
- eval_rouge1: 41.8013
- eval_rouge2: 12.0895
- eval_rougeL: 21.6007
- eval_rougeLsum: 39.5382
- eval_gen_len: 387.2945
- eval_runtime: 13908.4995
- eval_samples_per_second: 0.107
- eval_steps_per_second: 0.027

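If you want to score generated summaries yourself, ROUGE can be computed with the Hugging Face `evaluate` library. The snippet below is a generic sketch (it assumes `evaluate` and `rouge_score` are installed) and is not the exact evaluation setup used for the numbers above.

```python
# Generic ROUGE scoring sketch -- assumes `pip install evaluate rouge_score`.
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the generated summary goes here"],
    references=["the reference (human) summary goes here"],
)
print(scores)  # rouge1 / rouge2 / rougeL / rougeLsum
```
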
---

## FAQ

### How can I run inference with this on CPU?

lol

---

## Training procedure

### Updates

Updates to this model/model card will be posted here as relevant. The model seems fairly converged, but if updates/improvements can be made using `kmfoda/booksum`, this repo will be updated.

### Training hyperparameters

The following hyperparameters were used during training (a hypothetical mapping to the HF `Seq2SeqTrainingArguments` API is sketched after the list):

- learning_rate: 0.0006
- train_batch_size: 1
- eval_batch_size: 1
- seed: 10350
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 32
- total_train_batch_size: 128
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- num_epochs: 1.0

\* _Prior training sessions used roughly similar parameters (learning rates were higher); multiple sessions were required as this takes eons to train._

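For reference, the settings above map roughly onto `Seq2SeqTrainingArguments` as shown below. This is a hypothetical reconstruction for illustration, not the author's actual training script; the `output_dir` is a placeholder.

```python
# Hypothetical reconstruction of the listed hyperparameters -- not the original
# training script. The Adam betas/epsilon above are the Trainer defaults.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="long-t5-tglobal-xl-booksum",  # placeholder
    learning_rate=6e-4,
    per_device_train_batch_size=1,  # x 4 GPUs x 32 accumulation steps = 128 effective
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=32,
    lr_scheduler_type="constant",
    num_train_epochs=1.0,
    seed=10350,
)
```
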
### Framework versions

- Transformers 4.25.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.6.1
- Tokenizers 0.13.1