Text2Text Generation
Transformers
PyTorch
bart
feature-extraction
Inference Endpoints

Reproduce the bart-base + test unlimiformer on the gov-report datasets

#1
by Akagi951620 - opened

Hi! I used the model in this repository, fine-tuned on the GovReport training set, to evaluate ROUGE-2 on the validation and test sets. However, I could not reach the results reported in Table 3 of the paper: 15.3% vs. 19.6%. I found a possible solution (https://github.com/abertsch72/unlimiformer/issues/17), but it still requires fine-tuning the already fine-tuned model, which confuses me.
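For context, ROUGE-2 measures bigram overlap between a candidate summary and a reference. Below is a minimal illustrative sketch of ROUGE-2 F1; it is NOT the official scorer used in the paper (which applies stemming and other normalization), just a way to see what the metric computes.

```python
# Simplified ROUGE-2 F1 (bigram overlap) -- illustrative only, not the
# official rouge_score implementation used for reported results.
from collections import Counter

def rouge2_f1(reference: str, candidate: str) -> float:
    def bigrams(text):
        toks = text.lower().split()
        return Counter(zip(toks, toks[1:]))
    ref, cand = bigrams(reference), bigrams(candidate)
    if not ref or not cand:
        return 0.0
    overlap = sum((ref & cand).values())  # clipped bigram matches
    recall = overlap / sum(ref.values())
    precision = overlap / sum(cand.values())
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(round(rouge2_f1("the report covers fiscal policy",
                      "the report covers policy"), 3))  # → 0.571
```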

Hi, and sorry for the late response-- I didn't know about the discussion feature here :)

If you use the model without cloning the Unlimiformer repo and passing the --test_unlimiformer argument, the Unlimiformer method is not applied and the results will be lower. The set of arguments Uri provides in the issue you linked does not fine-tune the model: it includes some training arguments, such as the learning rate and train batch size, but these are ignored because do_train is not set to True. If you run with that set of arguments, do you still see a difference from the ROUGE scores in the table?
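To illustrate why those arguments are harmless: the following is a hypothetical sketch (not the actual Unlimiformer run script) of how HF-style scripts parse training hyperparameters unconditionally but only use them inside the training branch, so passing a learning rate without do_train=True does not fine-tune anything.

```python
# Hypothetical sketch of the do_train / do_eval gating in HF-style scripts;
# the real unlimiformer script differs, but follows the same pattern.
def run(args: dict) -> list:
    stages = []
    if args.get("do_train"):
        # learning_rate / per_device_train_batch_size only matter here
        stages.append("train")
    if args.get("do_eval"):
        stages.append("eval")  # generation + metric computation only
    return stages

# Training hyperparameters present, but do_train is absent -> evaluation only.
print(run({"do_eval": True,
           "learning_rate": 5e-5,
           "per_device_train_batch_size": 2}))  # → ['eval']
```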

I used the public GitHub repo and reproduced the results on GovReport and SummScreen. However, I have not been able to reproduce the BookSum results. Thank you very much.

Thanks for replicating, I appreciate it! Can you share the arguments you used for BookSum, and I'll look into it further?
