Can ProtGPT2 be used in "Fill-Mask" tasks?

#10
by likun - opened

Thanks for the excellent work!

I am wondering whether ProtGPT2 can be used for "Fill-Mask" tasks.

To be more specific, say I have a sequence:

"MTYKLIINGKTLKGETTTEAVDA"

Now, I'd like to mutate the T2 site, that is, fill in the blank at position 2 with ProtGPT2:

"M ? YKLIINGKTLKGETTTEAVDA"

I have tried pipeline('fill-mask', model="nferruz/ProtGPT2")

and got:

"fill-mask", self.model.base_model_prefix, "The tokenizer does not define a `mask_token`."

This is my first time using an NLP model, sorry about the naive question.

Thanks.

Hi Likun,

As it is, ProtGPT2 cannot be used for fill-mask tasks, since it was trained with an autoregressive objective (predicting the next token). It could be done with some fine-tuning, but I haven't done this yet.
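What ProtGPT2 does support out of the box is left-to-right generation. A minimal sketch (the sampling settings here are illustrative, not tuned):

```python
from transformers import pipeline

# ProtGPT2 is autoregressive, so it pairs with the text-generation pipeline.
generator = pipeline("text-generation", model="nferruz/ProtGPT2")

# Illustrative sampling settings; adjust max_length and sampling to taste.
outputs = generator("<|endoftext|>", max_length=100, do_sample=True,
                    num_return_sequences=5)
for out in outputs:
    print(out["generated_text"])
```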

For your problem, you could directly use a denoising autoencoding model, like ESM1 or ESM2, or ProtT5. There are many more, and they are all publicly available. Let me know if you have questions when you give it a try!
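For instance, with an ESM-2 checkpoint the fill-mask pipeline works directly. A minimal, untested sketch (the checkpoint name here is one of the smaller ESM-2 models on the Hub; any of them should do):

```python
from transformers import pipeline

# Any ESM-2 checkpoint from the Hub should work; this is a small one.
unmasker = pipeline("fill-mask", model="facebook/esm2_t6_8M_UR50D")

# Use the tokenizer's own mask token ("<mask>" for ESM-2) at position 2.
mask = unmasker.tokenizer.mask_token
sequence = f"M{mask}YKLIINGKTLKGETTTEAVDA"

# Each prediction holds a candidate residue and its score.
for prediction in unmasker(sequence):
    print(prediction["token_str"], prediction["score"])
```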

Noelia

Thanks! This info is really helpful!

Likun

likun changed discussion status to closed
