pszemraj committed
Commit 47f9b27
1 Parent(s): bb45ce3

add training details

Files changed (1): README.md (+42 -6)
README.md CHANGED
@@ -128,15 +128,23 @@ result = summarizer(
 
 ```
 
+
+**Important:** To generate the best quality summaries, use the global attention mask when decoding, as demonstrated in [this community notebook](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing); see the definition of `generate_answer(batch)`.
+
 ## Training and evaluation data
 
 - the [booksum](https://arxiv.org/abs/2105.08209) dataset
-- During training, the input text was the text of the chapter, and the output was the summary text
+- During training, the input text was the text of the `chapter`, and the output was `summary_text`
 
 ## Training procedure
 
+- Training completed on the BookSum dataset for 13 total epochs
+- **The final four epochs combined the training and validation sets as 'train' in an effort to increase generalization.**
+
 ### Training hyperparameters
 
+#### Initial Three Epochs
+
 The following hyperparameters were used during training:
 - learning_rate: 5e-05
 - train_batch_size: 1
@@ -149,13 +157,41 @@ The following hyperparameters were used during training:
 - lr_scheduler_type: linear
 - num_epochs: 3
 
-### Training results
+#### In-between Epochs
+
+Unfortunately, complete records are not on hand for the middle epochs; the following should be representative:
+
+- learning_rate: 4e-05
+- train_batch_size: 2
+- eval_batch_size: 2
+- seed: 42
+- distributed_type: multi-GPU
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 32
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.05
+- num_epochs: 6 (in addition to prior model)
+
+#### Final Two Epochs
 
+The following hyperparameters were used during training:
+- learning_rate: 2e-05
+- train_batch_size: 1
+- eval_batch_size: 1
+- seed: 42
+- distributed_type: multi-GPU
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 16
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.03
+- num_epochs: 2 (in addition to prior model)
 
 
 ### Framework versions
 
-- Transformers 4.16.2
-- Pytorch 1.10.0+cu113
-- Datasets 1.18.3
-- Tokenizers 0.11.0
+- Transformers 4.19.2
+- Pytorch 1.11.0+cu113
+- Datasets 2.2.2
+- Tokenizers 0.12.1
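
The **Important** note added in the diff above recommends decoding with a global attention mask. The following is a minimal sketch of that pattern, assuming the checkpoint is an LED-style (Longformer Encoder-Decoder) model; the model id is a placeholder, not necessarily the id of this repository, and the generation settings are illustrative rather than values taken from the card.

```python
# Minimal sketch of decoding with a global attention mask (LED-style model assumed).
# The model id below is a placeholder; substitute the id of this repository.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "pszemraj/led-base-book-summary"  # placeholder / assumption
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

chapter = "..."  # long chapter text to summarize
inputs = tokenizer(chapter, return_tensors="pt", truncation=True)

# Put global attention on the first token only, mirroring the
# generate_answer(batch) function in the linked community notebook.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=512,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

The hyperparameter lists in the diff use the field names reported by the Hugging Face `Trainer`. Purely as an illustrative sketch, not the author's actual training script, the "Final Two Epochs" block would map onto `Seq2SeqTrainingArguments` roughly as follows; `output_dir` is a placeholder, and the multi-GPU `distributed_type` is handled by the launcher (for example `torchrun`), not by these arguments.

```python
# Illustrative mapping of the "Final Two Epochs" hyperparameters onto
# Seq2SeqTrainingArguments; this is not the exact script used for training.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./booksum-final-epochs",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=1,   # train_batch_size: 1
    per_device_eval_batch_size=1,    # eval_batch_size: 1
    seed=42,
    gradient_accumulation_steps=16,  # yields total_train_batch_size: 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,               # lr_scheduler_warmup_ratio
    num_train_epochs=2,              # continued from the prior checkpoint
    adam_beta1=0.9,                  # optimizer: Adam with betas=(0.9,0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # and epsilon=1e-08
)
```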