haining committed
Commit a9cdc79
1 Parent(s): 81078bb

Update README.md

Files changed (1):
  1. README.md +25 -30

README.md CHANGED
@@ -53,18 +53,16 @@ You can choose an existing example or paste in any (perhaps full-of-jargon) abst
 
 Open science has significantly lowered the barriers to scientific papers.
 However, reachable research does not mean accessible knowledge. Scientific papers are usually replete with jargon and hard to read. A lay audience would rather trust little stories on social media than read scientific papers. They are not to blame; we humans like stories.
-So why do not we "translate" arcane scientific abstracts into simpler yet relevant scientific stories?
+So why don't we "translate" arcane scientific abstracts into simpler yet relevant scientific stories🤗?
 Some renowned journals have already taken accessibility into consideration. For example, PNAS asks authors to submit Significance Statements targeting "an undergraduate-educated scientist." Science also includes an editor abstract for a quick dive.
 
-We therefore propose to *rewrite scientific abstracts into understandable scientific stories using AI*.
-To this end, we introduce a new corpus comprising PNAS abstract-significance pairs.
-We finetune an encoder-decoder Transformer model (a variant of Flan-T5) with the corpus.
-Our baseline model (SAS-baseline) shows promising capacity in simplifying and summarizing scientific abstracts.
+In this project, we propose to *rewrite scientific abstracts into understandable scientific stories using AI*.
+To this end, we introduce two new corpora: one comprises PNAS abstract-significance pairs and the other contains editor abstracts from Science.
+We finetune an encoder-decoder Transformer model (a variant of Flan-T5) on the scientific abstract simplification task.
+Our model is first tuned with multiple discrete instructions by mixing four relevant tasks in a challenge-proportional manner.
+Then we continue tuning the model solely with the abstract-significance corpus.
 We hope our work can pave the last mile of scientific understanding and let people better enjoy the fruits of open science.
 
-As an ongoing effort, we are working on re-contextualizating abstracts for better storytelling and avoiding certain jargon tokens during inference time for better readability.
-
-<!-- We hypothesize the last mile of scientific understanding is cognitive. -->
 
 - **Model type:** Language model
 - **Developed by:**
@@ -109,12 +107,10 @@ print(tokenizer.decode(decoded_ids[0], skip_special_tokens=True))
 
 | Corpus | # Training/Dev/Test Samples | # Training Tokens (source, target) | # Validation Tokens (source, target) | # Test Tokens (source, target) | Note |
 |----------------------------------|-----------------------------|------------------------------------|--------------------------------------|--------------------------------|------|
-| Scientific Abstract-Significance | 3030/200/200 | 707071, 375433 | 45697, 24901 | 46985, 24426 | |
-| Editor Abstract | 732/91/92 | 154808, 194721 | 19675, 24421 | 19539, 24332 | |
-| Wiki Auto | 28364/1000/1000 | 18239990, 12547272 | 643157, 444034 | 642549, 444883 | We used the ACL version, adopted from Huggingface datasets. The validation and test samples are split from the corpus and kept frozen. |
-| CNN/DailyMail | 287113/13368/11490 | - | - | - | We used the 2.0 version, adopted from Huggingface datasets. |
-|
-
+| Scientific Abstract-Significance | 3,030/200/200 | 707,071, 375,433 | 45,697, 24,901 | 46,985, 24,426 | - |
+| Editor Abstract | 732/91/92 | 154,808, 194,721 | 19,675, 24,421 | 19,539, 24,332 | - |
+| Wiki Auto | 28,364/1,000/1,000 | 18,239,990, 12,547,272 | 643,157, 444,034 | 642,549, 444,883 | We used the ACL version, adapted from Huggingface datasets. The validation and test samples are split from the corpus and kept frozen. |
+| CNN/DailyMail | 287,113/13,368/11,490 | - | - | - | We used the 2.0 version, adapted from Huggingface datasets. |
 
 
 ## Setup
@@ -123,12 +119,11 @@ We finetuned the base model (flan-t5-large) on multiple relevant tasks with stan
 
 | Task | Corpus | Instruction | Optimal samples |
 |------------------------------------|----------------------------------|--------------------------------------------|-----------------|
-| Scientific Abstract Simplification | Scientific Abstract-Significance | "summarize, simplify, and contextualize: " | 39200 |
-| Recontextualization | Editor Abstract | "contextualize: " | 2200 |
-| Simplification | Wiki Auto | "simplify: " | 57000 |
-| Summarization | CNN/DailyMail | "summarize: " | 165000 |
-|------------------------------------|----------------------------------|--------------------------------------------|-----------------|
-| Total | Challenge-proportional Mixture | n/a | 263400 |
+| Scientific Abstract Simplification | Scientific Abstract-Significance | "summarize, simplify, and contextualize: " | 39,200 |
+| Recontextualization | Editor Abstract | "contextualize: " | 2,200 |
+| Simplification | Wiki Auto | "simplify: " | 57,000 |
+| Summarization | CNN/DailyMail | "summarize: " | 165,000 |
+| Total | Challenge-proportional Mixture | n/a | 263,400 |
 
 
 - Multi-instruction tuning: In this stage, we first created a task mixture using the "challenge-proportional mixing" method. In a separate pilot study, we finetuned the base model on each task and observed the number of samples at which validation loss started to rise. We then mixed the samples of each task in proportion to its optimal number of samples; a corpus is exhausted before upsampling if its total size is smaller than its optimal number. We finetuned on the task mixture (263,400 samples) with the aforementioned templates.
@@ -159,16 +154,16 @@ Implementations of SacreBLEU, BERT Score, ROUGE, METEOR, and SARI are from Hugg
 We tested our model on the SAS test set (200 samples). We generated 10 lay summaries from each sample's abstract. During generation, we used top-p sampling with p=0.9. The mean performance is reported below.
 
 
-| Metrics | SAS |
-|----------------|-------------------|
-| SacreBLEU↑ | 25.60 |
-| BERT Score F1↑ | 90.14 |
-| ROUGLE-1↑ | 52.28 |
-| ROUGLE-2↑ | 29.61 |
-| ROUGLE-L↑ | 38.02 |
-| METEOR↑ | 43.75 |
-| SARI↑ | 51.96 |
-| ARI↓ | 17.04 |
+| Metrics | SAS |
+|----------------|---------|
+| SacreBLEU↑ | 25.60 |
+| BERT Score F1↑ | 90.14 |
+| ROUGE-1↑ | 52.28 |
+| ROUGE-2↑ | 29.61 |
+| ROUGE-L↑ | 38.02 |
+| METEOR↑ | 43.75 |
+| SARI↑ | 51.96 |
+| ARI↓ | 17.04 |
 Note: 1. Some generated texts are too short (fewer than 100 words) to calculate a meaningful ARI. We therefore concatenated adjacent texts in groups of five and computed ARI for the 400 longer texts (instead of the original 2,000 texts). 2. BERT Score, ROUGE, and METEOR are multiplied by 100.
 
 
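The "challenge-proportional mixing" recipe in the Setup section — each task contributes its pilot-determined optimal number of samples, with a smaller corpus exhausted in full before any upsampling — can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the sizes and optimal counts below are the ones from the README's tables.

```python
import itertools

def challenge_proportional_mixture(corpora, optimal):
    """Build a task mixture where each task contributes optimal[task] samples.

    A corpus smaller than its optimal count is exhausted in full first,
    then upsampled (cycled) until the optimal count is reached.
    """
    return {
        task: list(itertools.islice(itertools.cycle(samples), optimal[task]))
        for task, samples in corpora.items()
    }

# Toy stand-ins sized like the real training splits.
sizes = {"SAS": 3030, "Editor Abstract": 732,
         "Wiki Auto": 28364, "CNN/DailyMail": 287113}
corpora = {task: list(range(n)) for task, n in sizes.items()}
# Optimal sample counts observed in the pilot study (from the task table).
optimal = {"SAS": 39200, "Editor Abstract": 2200,
           "Wiki Auto": 57000, "CNN/DailyMail": 165000}

mixture = challenge_proportional_mixture(corpora, optimal)
total = sum(len(v) for v in mixture.values())  # 263,400 samples, as in the table
```

Tasks whose optimal count exceeds the corpus size (SAS, Editor Abstract) get upsampled; the others are truncated to their optimal count.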
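The evaluation decodes with top-p (nucleus) sampling at p=0.9: each step samples only from the smallest set of highest-probability tokens whose cumulative probability reaches p. A minimal, model-free sketch of just the filtering step (in practice this is handled by the `transformers` generation API):

```python
def top_p_filter(probs, p=0.9):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches p; zero out the rest and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = set(), 0.0
    for i in order:
        kept.add(i)
        cumulative += probs[i]
        if cumulative >= p:  # nucleus found
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

# Toy 4-token distribution: with p=0.9 the nucleus is the top three tokens.
filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], p=0.9)
```

Low p makes generations more conservative; p=0.9 keeps most of the probability mass while cutting off the unreliable tail.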
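On readability scoring, the note explains that ARI is unreliable for very short texts, so adjacent generations are concatenated in groups of five (2,000 → 400 texts) before scoring. A rough, self-contained sketch of that procedure, using the standard ARI formula 4.71·(characters/words) + 0.5·(words/sentences) − 21.43 with naive word/sentence splitting (the exact tokenization used in the evaluation is not specified):

```python
import re

def ari(text):
    """Automated Readability Index with naive word/sentence splitting."""
    words = text.split()
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    chars = sum(len(w.strip(".,;:!?\"'")) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43

def ari_grouped(texts, group=5):
    """Concatenate `group` adjacent texts, then score each concatenation."""
    merged = [" ".join(texts[i:i + group]) for i in range(0, len(texts), group)]
    return [ari(t) for t in merged]

# 2,000 short generations collapse into 400 longer texts, as in the README note.
generations = ["Open science lowers barriers. Stories help people understand."] * 2000
scores = ari_grouped(generations)
```

Lower ARI means easier text (the score approximates the US grade level needed to read it), which is why the table marks ARI with ↓.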