Are the datasets for foundational model pre-training publicly accessible?

#10
by JayceCeleste - opened

Hello, thanks for the great paper and for publishing the model here!

I noticed you mentioned in the paper that "To investigate the impact of species diversity on genome foundational models, we’ve compiled and made publicly available two datasets for foundational model pre-training: the human genome and the multi-species genome. "

I tried to find them but could only find the GUE dataset.

Could you please provide links to them? Thanks : )