fine-tuning the model with orphan sequences

by rqh - opened

Dear Authors,

Thank you for the excellent work on ZymCTRL. I'm trying to fine-tune the model so that it generates sequences homologous to a target sequence. However, the target is an orphan sequence, and only a dozen high-identity sequences are available as a fine-tuning dataset. Is it possible to fine-tune the model with a dataset this small? If so, could you please guide me on how to implement this?

Thank you so much!

AI for protein design org

hi rqh,

You can fine-tune the model on your dataset by following the instructions in the documentation. There's no rule of thumb for the minimum number of sequences, although it would be good to have at least 100. I'd still suggest you give it a try even if you only have, let's say, 20. One thing you can do to stretch a small dataset is to fine-tune on each sequence and its reverse.
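
In case it helps, here is a minimal sketch of the "each sequence and its reverse" idea, assuming your homologs are plain amino-acid strings. The function name `augment_with_reverse` and the toy sequences are hypothetical, and the output would still need to be formatted as described in the documentation before fine-tuning.

```python
def augment_with_reverse(sequences):
    """Return each sequence followed by its reversed copy, skipping duplicates."""
    seen = set()
    augmented = []
    for seq in sequences:
        # Normalize whitespace and case so duplicates are detected reliably.
        seq = seq.strip().upper()
        # Keep the original first, then its reverse (skipped if already present,
        # e.g. for a palindromic sequence).
        for candidate in (seq, seq[::-1]):
            if candidate and candidate not in seen:
                seen.add(candidate)
                augmented.append(candidate)
    return augmented


# Hypothetical toy sequences standing in for the real homologs.
homologs = ["MKTAYIAK", "MKTAYLAK"]
training_set = augment_with_reverse(homologs)
print(len(training_set))
```

With a dozen distinct homologs this would give roughly two dozen training sequences, which is still small, so it may be worth holding a few out to watch for overfitting.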

Hope this helps!
