haining committed on
Commit fdda21e
1 Parent(s): de3dcbc

Update README.md

Files changed (1)
  1. README.md +16 -6
README.md CHANGED
@@ -101,14 +101,14 @@ For SAS-baseline, we finetuned the Flan-T5 model with the Scientific Abstract-Significance
 
  | Scientific Abstract-Significance | # Training/Dev/Test Samples | # Training Tokens | # Validation Tokens | # Test Tokens | Automated Readability Index (std.) |
  |----------------------------------|-----------------------------|-------------------|---------------------|---------------|------------------------------------|
- | Abstract | 3030/200/200 | 707071 | 45697 | 46985 | 18.68 (2.85) |
- | Significance | 3030/200/200 | 375433 | 24901 | 24426 | 17.89 (3.05) |
 
 ## Setup
 
- We finetuned the base model with a standard language modeling objective: the abstracts are sources and the significance statements are targets. We prepended a task-specific prefix ("summarize, simplify, and contextualize: ") to each source during training. Training took roughly 9 hours on two Nvidia A5000 GPUs (24 GB memory each). We saved the checkpoint with the lowest validation loss for inference. We used the AdamW optimizer and a learning rate of 3e-5 with a fully sharded data parallel strategy. The model (\~780M parameters) was trained on Nov. 20, 2022.
  Notice that the Automated Readability Index of the significance statements is generally lower than that of the abstracts (i.e., they are somewhat easier to read), but not by a large margin. Our upcoming SAS-full model will leverage more corpora for scientific (re)contextualization, summarization, and simplification.
 
@@ -130,17 +130,27 @@ Implementations of sacreBLEU, BERT Score, ROUGE, METEOR, and SARI are from Hugging Face
 
 ## Results
 
- TODO.
 
 
 # Contact
 
- The project is under active maintenance. Please [contact us](mailto:hw56@indiana.edu) for any questions or suggestions.
 
 
 # Disclaimer
 
- The model (SAS-baseline) is created for and focused on making scientific abstracts more accessible. It should not be used or trusted outside of its scope. There is **NO** guarantee that the generated text is perfectly aligned with the research. Resort to human experts or original papers when a decision is critical.
 
 
 # Acknowledgement
 
  | Scientific Abstract-Significance | # Training/Dev/Test Samples | # Training Tokens | # Validation Tokens | # Test Tokens | Automated Readability Index (std.) |
  |----------------------------------|-----------------------------|-------------------|---------------------|---------------|------------------------------------|
+ | Abstract | 3030/200/200 | 707,071 | 45,697 | 46,985 | 18.68 (2.85) |
+ | Significance | 3030/200/200 | 375,433 | 24,901 | 24,426 | 17.89 (3.05) |
 
 ## Setup
 
+ We finetuned the base model with a standard language modeling objective: the abstracts are sources and the significance statements are targets. We prepended a task-specific prefix ("summarize, simplify, and contextualize: ") to each source during training. Training took roughly 9 hours on two NVIDIA RTX A5000 GPUs (24 GB memory each). We saved the checkpoint with the lowest validation loss for inference. We used the AdamW optimizer and a learning rate of 3e-5 with a fully sharded data parallel strategy. The model (\~780M parameters) was trained on Nov. 20, 2022.
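As a concrete illustration, here is a minimal sketch of the prefix-based finetuning and inference flow using the `transformers` API. It is illustrative only: `google/flan-t5-large` (\~780M parameters) is our assumption for the unnamed base model, the texts are placeholders, and the fully sharded data parallel setup and best-checkpoint selection are omitted.

```python
# Minimal sketch, not the exact training script: "google/flan-t5-large"
# (~780M parameters) is assumed for the base model; texts are placeholders;
# FSDP and best-checkpoint selection are omitted.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

PREFIX = "summarize, simplify, and contextualize: "

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

abstract = "We study ..."             # placeholder source (abstract)
significance = "This work shows ..."  # placeholder target (significance statement)

# Standard seq2seq language-modeling objective: prefixed abstract -> significance.
inputs = tokenizer(PREFIX + abstract, return_tensors="pt", truncation=True)
labels = tokenizer(text_target=significance, return_tensors="pt",
                   truncation=True).input_ids

loss = model(**inputs, labels=labels).loss  # cross-entropy over target tokens
loss.backward()
optimizer.step()

# At inference time, the same prefix is prepended before generation.
generated = model.generate(**tokenizer(PREFIX + abstract, return_tensors="pt"),
                           max_new_tokens=256)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```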
  Notice that the Automated Readability Index of the significance statements is generally lower than that of the abstracts (i.e., they are somewhat easier to read), but not by a large margin. Our upcoming SAS-full model will leverage more corpora for scientific (re)contextualization, summarization, and simplification.
 
 
 ## Results
 
+ | Metrics | SAS-baseline |
+ |----------------|--------------|
+ | sacreBLEU↑ | 20.97 |
+ | BERT Score F1↑ | 0.89 |
+ | ROUGE-1↑ | 0.48 |
+ | ROUGE-2↑ | 0.23 |
+ | ROUGE-L↑ | 0.32 |
+ | METEOR↑ | 0.39 |
+ | SARI↑ | 46.83 |
+ | ARI↓* | 17.12 (1.97) |
 
+ *Note: Half of the generated texts are too short (fewer than 100 words) to calculate a meaningful ARI. We therefore concatenated adjacent pairs of texts and computed ARI over the resulting 100 texts (instead of the original 200).
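For reference, below is a minimal sketch of how these metrics and the pairwise-concatenation ARI can be computed with the Hugging Face `evaluate` package. The prediction/reference/source lists are placeholders, and the punctuation-based sentence splitter in `ari()` is an assumption of the sketch, not the exact implementation used.

```python
# Minimal sketch; preds/refs/srcs are placeholder lists of strings.
import evaluate

preds = ["..."]  # generated significance statements (200 texts)
refs = ["..."]   # gold significance statements, aligned with preds
srcs = ["..."]   # source abstracts (SARI compares against them)

bleu = evaluate.load("sacrebleu").compute(predictions=preds, references=[[r] for r in refs])
rouge = evaluate.load("rouge").compute(predictions=preds, references=refs)
meteor = evaluate.load("meteor").compute(predictions=preds, references=refs)
bert = evaluate.load("bertscore").compute(predictions=preds, references=refs, lang="en")
sari = evaluate.load("sari").compute(sources=srcs, predictions=preds, references=[[r] for r in refs])

# Automated Readability Index via the standard formula; the naive sentence
# count below is an assumption of this sketch.
def ari(text: str) -> float:
    words = text.split()
    chars = sum(len(w) for w in words)
    sentences = max(1, sum(text.count(p) for p in ".!?"))
    return 4.71 * chars / len(words) + 0.5 * len(words) / sentences - 21.43

# Concatenate adjacent pairs (200 -> 100 texts) before scoring, as noted above.
paired = [a + " " + b for a, b in zip(preds[::2], preds[1::2])]
ari_scores = [ari(t) for t in paired]
```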
 
 
 # Contact
 
+ Please [contact us](mailto:hw56@indiana.edu) for any questions or suggestions.
 
 
 # Disclaimer
 
+ The model (SAS-baseline) is created for making scientific abstracts more accessible. Its outputs should not be used or trusted outside of this scope. There is **NO** guarantee that the generated text is perfectly aligned with the research. Consult human experts or the original papers when a decision is critical.
 
 
 # Acknowledgement