---
license: apache-2.0
datasets:
- stacked-summaries/stacked-samsum-1024
language:
- en
metrics:
- rouge
tags:
- stacked summaries
- samsum
---

# flan-t5-small-stacked-samsum-1024

This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) on the `stacked-summaries/stacked-samsum-1024` dataset.
It achieves the following results on the evaluation set:

- Loss: 1.7573
- Rouge1: 46.6072
- Rouge2: 19.9754
- Rougel: 35.2715
- Rougelsum: 43.3599
- Gen Len: 72.64

## Model description

Trained on a summarization task in which each training example _potentially_ contains several document-summary pairs stacked on top of one another.

You can separate the model's predictions into distinct topics by splitting the output on its special token `[NEXT_CONCEPT]`.

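As a minimal sketch of that post-processing step (the helper name and the sample string below are illustrative, not actual model output), the split is plain string handling:

```python
def split_stacked_summary(summary_text: str) -> list[str]:
    """Split a stacked summary into per-topic summaries.

    Assumes the model emits the literal marker [NEXT_CONCEPT]
    between summaries that cover separate topics.
    """
    parts = summary_text.split("[NEXT_CONCEPT]")
    # Drop empty fragments and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]


# Illustrative (made-up) output string:
raw = "Amanda baked cookies. [NEXT_CONCEPT] Tom will pick Jerry up at 8."
print(split_stacked_summary(raw))
# → ['Amanda baked cookies.', 'Tom will pick Jerry up at 8.']
```

The same helper works unchanged when the input contained only one document, since a summary without the marker simply comes back as a single-element list.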
## Intended uses & limitations

- This model is meant to serve as a baseline/reference point for comparison with the larger models fine-tuned on the same dataset.

## Training and evaluation data

See the `stacked-summaries/stacked-samsum-1024` dataset.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 22138
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 3.0

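For readers checking how these numbers relate: the reported `total_train_batch_size` is the per-device batch size multiplied by the gradient accumulation steps (and by the number of data-parallel processes, assumed here to be 1):

```python
# How the effective (total) train batch size is derived from the
# hyperparameters above. world_size = 1 is an assumption; the table
# itself only reports the final value of 128.
train_batch_size = 16            # per-device batch size
gradient_accumulation_steps = 8  # gradient accumulation factor
world_size = 1                   # assumed number of data-parallel processes

total_train_batch_size = train_batch_size * gradient_accumulation_steps * world_size
print(total_train_batch_size)
# → 128
```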
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 1.9011        | 1.0   | 230  | 1.7986          | 45.4597 | 19.6956 | 34.6878 | 42.3724   | 74.16   |
| 1.8297        | 2.0   | 460  | 1.7609          | 46.0427 | 20.2299 | 35.2076 | 43.0549   | 70.56   |
| 1.7637        | 3.0   | 690  | 1.7573          | 46.6072 | 19.9754 | 35.2715 | 43.3599   | 72.64   |