pszemraj committed
Commit b63d8fa
1 Parent(s): c2100d7

Update README.md

Files changed (1)
  1. README.md +18 -6
README.md CHANGED
@@ -12,9 +12,10 @@ model-index:
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- # flan-t5-large-stacked-XSUM-1024-WIP-2p8-850-stacked-xsum-1024-evaluated
+ # flan-t5-large-stacked-XSUM-1024
+
- This model is a fine-tuned version of [pszemraj/flan-t5-large-stacked-XSUM-1024-WIP-2p8-850](https://huggingface.co/pszemraj/flan-t5-large-stacked-XSUM-1024-WIP-2p8-850) on the stacked-summaries/stacked-xsum-1024 dataset.
+ This model is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) on the stacked-summaries/stacked-xsum-1024 dataset.
  It achieves the following results on the evaluation set:
  - eval_loss: 1.3314
  - eval_rouge1: 46.5061
@@ -25,17 +26,28 @@ It achieves the following results on the evaluation set:
  - eval_runtime: 9456.3608
  - eval_samples_per_second: 1.896
  - eval_steps_per_second: 0.119
- - step: 0
 
+ > Note that the evaluation set is `stacked-summaries/stacked-xsum-1024`, not `xsum` itself.
  ## Model description
 
- More information needed
+ This model was trained on a stacked dataset to test the benefits of "task-oriented pretraining" for summarization. By stacking summaries and separating them into independent concepts, the model can learn to condense and distill the essential information in a text rather than simply mimicking the style of the dataset's reference summaries.
 
  ## Intended uses & limitations
 
- More information needed
+ - max input length (in tokens): 1024
 
  ## Training and evaluation data
 
- More information needed
+ Refer to `stacked-summaries/stacked-xsum-1024`.
+
+ Trained for approximately 3 epochs before ROUGE scores stabilized on the most recent run:
+
+ ### scores
+
+ ![stable-scores](https://i.imgur.com/4tvhHVy.png)
+
+ ### gradients
+
+ ![gradients](https://i.imgur.com/V6zcmAb.png)
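
The updated card states a maximum input length of 1024 tokens. A minimal inference sketch, assuming the standard `transformers` seq2seq API; the repo id `pszemraj/flan-t5-large-stacked-XSUM-1024` is a guess from the card title and the committer's namespace, not confirmed by this commit:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MAX_INPUT_TOKENS = 1024  # stated in the card's "Intended uses & limitations"

# Repo id is an assumption based on the card title and committer namespace.
MODEL_ID = "pszemraj/flan-t5-large-stacked-XSUM-1024"


def summarize(text: str, tokenizer, model, max_new_tokens: int = 128) -> str:
    """Summarize `text`, truncating to the model's stated 1024-token input limit."""
    inputs = tokenizer(
        text, truncation=True, max_length=MAX_INPUT_TOKENS, return_tensors="pt"
    )
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=4
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained(MODEL_ID)
    mdl = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)
    print(summarize("Long article text goes here ...", tok, mdl))
```

Truncating at the tokenizer level keeps inputs within the limit the card documents; longer documents would need chunking or a different checkpoint.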