Finetuning on custom dataset

#4
by mdk25 - opened

Hi!

I am trying to figure out how to finetune this model on custom data that includes some new relations, but it is still not clear to me how to do this. Is there any code or documentation available?

Thanks.

Babelscape org

Hi there! Thanks for your interest in training REBEL.

There are many ways to do so. The simplest is to build your own dataset in the same format as REBEL, i.e.:

input_text: "Lorem ipsum"
output_text: "<triplet> Lorem <subj> ipsum <obj> relation name"

I suggest reading the paper to understand how linearization works; it is fairly easy once understood. You can also take a look at the repo, which has examples of how to create an HF dataset for commonly used RE datasets (https://github.com/Babelscape/rebel/tree/main/datasets).
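To make the linearization concrete, here is a minimal sketch in Python. It assumes the scheme described in the paper: each head entity is introduced with `<triplet>`, and every (tail, relation) pair for that head follows as `<subj> tail <obj> relation`. The function name `linearize` and the example sentence are made up for illustration; check the dataset scripts in the repo for the exact handling of ordering and whitespace.

```python
def linearize(triplets):
    """Turn (head, tail, relation) tuples into a REBEL-style target string.

    Triplets sharing a head are grouped under a single <triplet> marker,
    keeping the order in which each head first appears (assumed scheme).
    """
    grouped = {}  # dicts preserve insertion order in Python 3.7+
    for head, tail, relation in triplets:
        grouped.setdefault(head, []).append((tail, relation))

    parts = []
    for head, pairs in grouped.items():
        parts.append(f"<triplet> {head}")
        for tail, relation in pairs:
            parts.append(f"<subj> {tail} <obj> {relation}")
    return " ".join(parts)


# Hypothetical training row in the input_text / output_text format.
row = {
    "input_text": "Punta Cana is a resort town in the Dominican Republic.",
    "output_text": linearize([("Punta Cana", "Dominican Republic", "country")]),
}
print(row["output_text"])
```

A new relation is simply a new string in the relation slot, so no change to the tokenizer is needed for it.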

Then, to train the system, you can use your preferred method. In the repo you will find a PyTorch Lightning setup, although it may be a bit outdated at this point. You can also use the HF Trainer or any other seq2seq training setup; just make sure to finetune on top of "Babelscape/rebel-large" rather than, say, "facebook/bart-large". Follow the same input/output format as REBEL and it should work. Of course, the more similar your relation types are to the ones seen at pretraining, the better it will work.
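As a rough sketch of the HF Trainer route, something like the following should be a reasonable starting point. It is not the setup from the repo: the function names `preprocess` and `finetune`, the sequence lengths, and all hyperparameters are placeholder assumptions, and `rows` is assumed to be a list of dicts with `input_text` and `output_text` keys in REBEL format.

```python
def preprocess(batch, tokenizer, max_input_len=256, max_target_len=256):
    """Tokenize a batch of input_text/output_text pairs for seq2seq training."""
    model_inputs = tokenizer(
        batch["input_text"], max_length=max_input_len, truncation=True
    )
    labels = tokenizer(
        text_target=batch["output_text"], max_length=max_target_len, truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


def finetune(rows, output_dir="rebel-finetuned"):
    """Finetune on top of Babelscape/rebel-large (placeholder hyperparameters)."""
    # Imported here so the helper above can be used without these packages.
    from datasets import Dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained("Babelscape/rebel-large")
    model = AutoModelForSeq2SeqLM.from_pretrained("Babelscape/rebel-large")

    dataset = Dataset.from_list(rows).map(
        lambda batch: preprocess(batch, tokenizer),
        batched=True,
        remove_columns=["input_text", "output_text"],
    )
    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=4,  # assumption, tune for your hardware
        learning_rate=2e-5,             # assumption, not a recommended value
        num_train_epochs=3,             # assumption
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=dataset,
        # Pads inputs and labels per batch; label padding is ignored in the loss.
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    trainer.save_model(output_dir)
```

Calling `finetune(rows)` downloads the checkpoint and starts training, so try it on a handful of rows first to confirm the pipeline runs end to end.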

Be aware of issues such as catastrophic forgetting, i.e. the model will learn to generate triplets with your new relations but will forget the ones it currently generates unless you also include examples of them at finetuning time.
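One simple way to provide the old relations at finetuning time is to mix some examples in the original REBEL format into your new data. The sketch below assumes you have such examples available as `original_rows`; the function name `mix_with_replay` and the `replay_ratio` parameter are hypothetical, and the right ratio is something to find experimentally.

```python
import random


def mix_with_replay(new_rows, original_rows, replay_ratio=1.0, seed=0):
    """Combine new-relation rows with a random sample of original-relation rows.

    replay_ratio is the number of original rows to add per new row,
    capped by how many original rows are available.
    """
    rng = random.Random(seed)
    n_replay = min(len(original_rows), int(len(new_rows) * replay_ratio))
    mixed = list(new_rows) + rng.sample(list(original_rows), n_replay)
    rng.shuffle(mixed)  # avoid long runs of only new or only old relations
    return mixed
```

With `replay_ratio=1.0` the training set contains roughly as many old-relation examples as new ones.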

Best of luck,
Pere-Lluis.

PereLluis13 changed discussion status to closed
