pszemraj committed · Commit 40e3877 · verified · 1 parent: f30c6d6

Update README.md

Files changed (1)
1. README.md (+21, -9)
README.md CHANGED
@@ -4,25 +4,37 @@ language:
 - en
 license: apache-2.0
 base_model: google/flan-t5-xl
-model-index:
-- name: flan-t5-xl-summary-map-reduce-1024
-  results: []
 datasets:
 - pszemraj/summary-map-reduce-v1
 pipeline_tag: text2text-generation
+tags:
+- map-reduce
+- summarization
 ---

-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-
 # flan-t5-xl-summary-map-reduce-1024

-This model is a fine-tuned version of [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl) on the pszemraj/summary-map-reduce-v1 dataset.
+A larger text2text model trained to complete the "reduce" (_consolidation_) step of map-reduce summarization.
+
+## About
+
+Refer to [this wiki page](https://github.com/pszemraj/textsum/wiki/consolidating-summaries) or the [smaller BART model card](https://hf.co/pszemraj/bart-large-summary-map-reduce) for explanations and usage examples. Compared to the BART version, this model seems to
+
+- produce more eloquent final reduced summaries
+- be more "gullible"/sensitive to noise in the input summaries
+  - i.e. a hallucinated one-off term/name/entity is likely to appear in the reduced summary
+- be agnostic to whitespace in the input (_by definition, since the t5 tokenizer normalizes whitespace_)
+
+Therefore, it's recommended to compare sample outputs of this model and [the BART version](https://hf.co/pszemraj/bart-large-summary-map-reduce) on your data to see which is better for your use case.
+
+## Details
+
+This model is a fine-tuned version of [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl) on the pszemraj/summary-map-reduce-v1 dataset at 1024 context length in/out.
+
 It achieves the following results on the evaluation set:
 - Loss: 0.6039
 - Num Input Tokens Seen: 7138765

-
 ## Training procedure

 ### Training hyperparameters
@@ -37,4 +49,4 @@ The following hyperparameters were used during training:
 - optimizer: Use OptimizerNames.PAGED_ADAMW with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.05
-- num_epochs: 2.0
+- num_epochs: 2.0
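
For orientation, here is a minimal sketch of how the scheduler and optimizer settings visible in the hunk above would typically be expressed with Hugging Face `Seq2SeqTrainingArguments`. It is reconstructed only from the lines shown in this diff, not from the actual training script; the remaining hyperparameters (learning rate, batch size, etc.) are not visible here and are therefore omitted.

```python
# Sketch only: maps the hyperparameters visible in the diff above onto
# Seq2SeqTrainingArguments; values not shown in the diff are omitted.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-xl-summary-map-reduce-1024",
    optim="paged_adamw_32bit",  # OptimizerNames.PAGED_ADAMW (paged AdamW via bitsandbytes)
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=2.0,
)
```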
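
Similarly, a minimal inference sketch for the "reduce" step described in the updated card, assuming the checkpoint is published under `pszemraj/flan-t5-xl-summary-map-reduce-1024` and that the input is a plain concatenation of chunk-level ("map" step) summaries; see the linked wiki page and BART model card for the canonical usage examples.

```python
# Inference sketch (assumed repo id and input format; see the linked wiki /
# BART model card for canonical usage).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "pszemraj/flan-t5-xl-summary-map-reduce-1024"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# "Map" step output: one summary per document chunk, produced by any summarizer.
chunk_summaries = [
    "First chunk summary ...",
    "Second chunk summary ...",
    "Third chunk summary ...",
]

# "Reduce" step: consolidate the chunk summaries into a single summary,
# staying within the 1024-token context the card mentions.
text = "\n\n".join(chunk_summaries)
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=1024)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
outputs = model.generate(
    **inputs, max_new_tokens=512, num_beams=4, no_repeat_ngram_size=3
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```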