haining committed on
Commit
81078bb
1 Parent(s): 19792a1

Update README.md

Files changed (1)
  1. README.md +24 -11
README.md CHANGED
@@ -107,22 +107,35 @@ print(tokenizer.decode(decoded_ids[0], skip_special_tokens=True))
 
 ## Data
 
- TBA.
 
- <!-- For SAS-baseline, we finetuned Flan-T5 model with the Scientific Abstract-Significance (SAS) corpus.
-
- | Scientific Abstract-Significance | # Training/Dev/Test Samples | # Training Tokens | # Validation Tokens | # Test Tokens | Automated Readability Index (std.) |
- |----------------------------------|-----------------------------|-------------------|---------------------|---------------|------------------------------------|
- | Abstract | 3030/200/200 | 707,071 | 45,697 | 46,985 | 18.68 (2.85) |
- | Significance | 3030/200/200 | 375,433 | 24,901 | 24,426 | 17.89 (3.05) |
- -->
 
 
 ## Setup
 
- TBA.
- <!-- We finetuned the base model with a standard language modeling objective: the abstracts are sources and the significance statements are targets. We inform the model with a task-spcific prefix ("summarize, simplify, and contextualize: ") during training. The training took roughly 9 hours on two NVIDIA RTX A5000 (24GB memory each) GPUs. We saved the checkpoint with the lowest validation loss for inference. We used the AdamW optimizer and a learning rate of 3e-5 with fully sharded data parallel strategy. The model (\~780M parameter) was trained on Nov. 20, 2022.
- Notice, the readability of the significance statements is generally lower than the abstracts', but not by a large margin. Our incoming SAS-full model will leverage more corpora for scientific (re)contextualization, summarization, and simplification. -->
 
 
 # Evaluation
 
 
 ## Data
 
+ | Corpus | # Training/Dev/Test Samples | # Training Tokens (source / target) | # Validation Tokens (source / target) | # Test Tokens (source / target) | Note |
+ |----------------------------------|-----------------------------|-------------------------------------|---------------------------------------|---------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|
+ | Scientific Abstract-Significance | 3030/200/200 | 707,071 / 375,433 | 45,697 / 24,901 | 46,985 / 24,426 | |
+ | Editor Abstract | 732/91/92 | 154,808 / 194,721 | 19,675 / 24,421 | 19,539 / 24,332 | |
+ | Wiki Auto | 28364/1000/1000 | 18,239,990 / 12,547,272 | 643,157 / 444,034 | 642,549 / 444,883 | We used the ACL version, as distributed via Hugging Face Datasets. The validation and test samples are split from the corpus and kept frozen. |
+ | CNN/DailyMail | 287113/13368/11490 | - | - | - | We used the 2.0 version, as distributed via Hugging Face Datasets. |
+
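For readers who want to reproduce the two public corpora in this table, the sketch below shows one way to pull them from the Hugging Face Hub with the `datasets` library. The configuration names `auto_acl` and `2.0.0` are assumptions inferred from the notes above ("ACL version", "2.0 version"), not something this commit specifies.

```python
# Sketch: loading the two public corpora referenced in the table above.
# The configuration names are assumptions inferred from the table notes.
from datasets import load_dataset

wiki_auto = load_dataset("wiki_auto", "auto_acl")  # assumed config: ACL-aligned version
cnn_dm = load_dataset("cnn_dailymail", "2.0.0")    # assumed config: version 2.0

print(wiki_auto)
print(cnn_dm)
```

The Scientific Abstract-Significance and Editor Abstract corpora are not covered by this sketch; the commit does not say where they are hosted.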
 ## Setup
 
+ We finetuned the base model (flan-t5-large) on several related tasks with a standard language modeling loss. During training, the source text of each task is prepended with a task-specific instruction and paired with its target text; for example, "simplify: " is prepended to a Wiki Auto source passage, and the model is trained to map that input to the corresponding simplified passage. The tuning process has two steps.
+
+ | Task | Corpus | Instruction | Optimal # Samples |
+ |------------------------------------|----------------------------------|--------------------------------------------|-------------------|
+ | Scientific Abstract Simplification | Scientific Abstract-Significance | "summarize, simplify, and contextualize: " | 39,200 |
+ | Recontextualization | Editor Abstract | "contextualize: " | 2,200 |
+ | Simplification | Wiki Auto | "simplify: " | 57,000 |
+ | Summarization | CNN/DailyMail | "summarize: " | 165,000 |
+ | Total | Challenge-proportional Mixture | n/a | 263,400 |
+
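As an illustration of the instruction format described above, here is a minimal preprocessing sketch using the instructions from the table. The column names (`source`, `target`), the task keys, and the truncation lengths are illustrative placeholders rather than the project's actual settings.

```python
# Sketch: building instruction-prefixed (input, label) pairs for seq2seq
# finetuning. Column names, task keys, and max lengths are placeholders.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")

INSTRUCTIONS = {
    "scientific_abstract_simplification": "summarize, simplify, and contextualize: ",
    "recontextualization": "contextualize: ",
    "simplification": "simplify: ",
    "summarization": "summarize: ",
}

def preprocess(example, task):
    """Prepend the task instruction to the source text and tokenize both sides."""
    model_inputs = tokenizer(
        INSTRUCTIONS[task] + example["source"],
        max_length=1024,  # illustrative
        truncation=True,
    )
    labels = tokenizer(example["target"], max_length=512, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```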
+ - Multi-instruction tuning: In this stage, we first created a task mixture with a "challenge-proportional mixing" method (sketched in code after this list). In a separate pilot study, we finetuned the base model on each task individually and recorded the number of samples seen when validation loss began to rise; we treat this as that task's optimal sample count. Each task then contributes samples in proportion to its optimal count, and a corpus whose total size is smaller than its optimal count is exhausted before any upsampling. We finetuned on the resulting mixture (263,400 samples) using the instruction templates above.
+
+ - Retuning: In this stage, we continued finetuning the checkpoint on the Scientific Abstract-Significance corpus alone until the optimal validation loss was observed.
+
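Below is a small sketch of the challenge-proportional mixing step, using the optimal sample counts from the table above. The handling of corpora smaller than their optimal count (exhaust the corpus, then upsample with replacement) is one reading of the description, not a detail stated in the commit.

```python
# Sketch: challenge-proportional mixing. Each task contributes its "optimal"
# number of samples from the table above. The exhaust-then-upsample branch
# for small corpora is an assumption about the intended behavior.
import random

OPTIMAL_SAMPLES = {
    "scientific_abstract_simplification": 39_200,
    "recontextualization": 2_200,
    "simplification": 57_000,
    "summarization": 165_000,
}

def mix(corpora, seed=42):
    """corpora: dict mapping task name -> list of (source, target) pairs."""
    rng = random.Random(seed)
    mixture = []
    for task, samples in corpora.items():
        target_n = OPTIMAL_SAMPLES[task]
        if len(samples) >= target_n:
            mixture.extend(rng.sample(samples, target_n))
        else:
            mixture.extend(samples)                                # exhaust the corpus first
            mixture.extend(rng.choices(samples, k=target_n - len(samples)))  # then upsample
    rng.shuffle(mixture)
    return mixture
```

With the counts in the Data table, Editor Abstract (732 training pairs) is the only corpus smaller than its 2,200-sample share, which is what the upsampling branch in the sketch is for.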
+ The multi-instruction tuning and the retuning took roughly 63 hours and 8 hours, respectively, on two NVIDIA RTX A5000 GPUs (24 GB memory each). We saved the checkpoint with the lowest validation loss for inference. Across both training stages we used the AdamW optimizer with a learning rate of 3e-5 and a fully sharded data parallel strategy.
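For orientation, a hypothetical `Seq2SeqTrainer` configuration consistent with this description is sketched below. Only the optimizer (AdamW), the learning rate (3e-5), and the FSDP strategy come from the text above; the batch size, epoch count, and evaluation cadence are placeholders.

```python
# Sketch: a Trainer setup consistent with the description above (AdamW,
# lr 3e-5, fully sharded data parallel). Batch size, epochs, and evaluation
# cadence are illustrative placeholders, not the project's settings.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")

args = Seq2SeqTrainingArguments(
    output_dir="sas-checkpoints",
    learning_rate=3e-5,              # stated above
    optim="adamw_torch",             # AdamW, stated above
    per_device_train_batch_size=2,   # illustrative
    num_train_epochs=3,              # illustrative
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,     # keep the lowest-validation-loss checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    fsdp="full_shard",               # fully sharded data parallel (launch with torchrun)
)

# trainer = Seq2SeqTrainer(
#     model=model,
#     args=args,
#     data_collator=DataCollatorForSeq2Seq(tokenizer, model),
#     train_dataset=train_ds,   # hypothetical preprocessed datasets
#     eval_dataset=dev_ds,
# )
# trainer.train()
```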
  # Evaluation