Text2Text Generation
Transformers
PyTorch
bart
feature-extraction
Inference Endpoints

Reproduce the bart-base + test unlimiformer on the gov-report datasets

#1
by Akagi951620 - opened

Hi! I used the model in this repository, fine-tuned on the GovReport training set, to evaluate ROUGE-2 on the validation and test sets. However, I could not reach the results reported in Table 3 of the paper: 15.3% vs. 19.6%. I found a possible solution (https://github.com/abertsch72/unlimiformer/issues/17), but it still requires fine-tuning the already fine-tuned model, which confuses me.
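For context, ROUGE-2 measures bigram overlap between a candidate summary and a reference. Below is a minimal illustrative sketch of ROUGE-2 F1; it is NOT the official scorer used in the paper (which applies stemming and other normalization), just a way to see what the metric computes.

```python
# Simplified ROUGE-2 F1 (bigram overlap) -- illustrative only, not the
# official rouge_score implementation used for reported results.
from collections import Counter

def rouge2_f1(reference: str, candidate: str) -> float:
    def bigrams(text):
        toks = text.lower().split()
        return Counter(zip(toks, toks[1:]))
    ref, cand = bigrams(reference), bigrams(candidate)
    if not ref or not cand:
        return 0.0
    overlap = sum((ref & cand).values())  # clipped bigram matches
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(rouge2_f1("the report covers fiscal policy",
                      "the report covers policy"), 3))  # → 0.571
```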

Hi, and sorry for the late response-- I didn't know about the discussion feature here :)

If you use the model without cloning the Unlimiformer repo and passing the --test_unlimiformer argument, the Unlimiformer method is not applied and the results will be lower. The set of arguments Uri provides in the issue you linked does not fine-tune the model: it includes some training arguments, such as the learning rate and train batch size, but these are ignored because do_train is not set to True. If you run with that set of arguments, do you still see a difference from the ROUGE scores in the table?
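To illustrate why those arguments are harmless: the following is a hypothetical sketch (not the actual Unlimiformer run script) of how HF-style scripts parse training hyperparameters unconditionally but only use them inside the training branch, so passing a learning rate without do_train=True does not fine-tune anything.

```python
# Hypothetical sketch of the do_train / do_eval gating in HF-style scripts;
# the real unlimiformer script differs, but follows the same pattern.
def run(args: dict) -> list:
    stages = []
    if args.get("do_train"):
        # learning_rate / per_device_train_batch_size only matter here
        stages.append("train")
    if args.get("do_eval"):
        stages.append("eval")  # generation + metric computation only
    return stages

# Training hyperparameters present, but do_train is absent -> evaluation only.
print(run({"do_eval": True,
           "learning_rate": 5e-5,
           "per_device_train_batch_size": 2}))  # → ['eval']
```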

I used the public GitHub repo and reproduced the results on GovReport and SummScreen. However, I have not been able to reproduce the BookSum results. Thank you very much.

Thanks for replicating, I appreciate it! Can you share the arguments you used for BookSum, and I'll look into it further?
