Expanding the maximum input size (2048 tokens) of a pre-trained Geneformer?
#262
by patrick-yu · opened
Just wondering if there's any way to expand the maximum 2048-token input length for Geneformer (e.g., for larger inputs/datasets)?
Or is there an easy way to use/pretrain a different (e.g., BERT-like) model that accommodates >2048 input tokens while still reusing some of the learned weights from the pretrained (6L/12L) Geneformer?
Thanks in advance!
Thank you for your interest in Geneformer! Yes, you can use the pretraining code in the example on this repository and increase the maximum input size to pretrain a model with a larger input size on Genecorpus-30M.
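A minimal sketch of what that configuration change could look like, assuming the Hugging Face `transformers` BERT architecture that Geneformer is built on. The specific values below (vocabulary size, hidden size, layer/head counts) are illustrative assumptions, not confirmed Geneformer hyperparameters — take the actual values from the pretraining example in the repository and change only `max_position_embeddings`:

```python
from transformers import BertConfig, BertForMaskedLM

# Illustrative config sketch: the architecture hyperparameters below are
# assumptions for a small 6-layer model, NOT the official Geneformer values.
# The key change for a larger input size is max_position_embeddings.
config = BertConfig(
    vocab_size=25426,                # assumed gene-token vocabulary size
    max_position_embeddings=4096,    # increased from the default 2048
    num_hidden_layers=6,
    num_attention_heads=4,
    hidden_size=256,
    intermediate_size=512,
)

# A model built from this config accepts inputs up to 4096 tokens and
# would then be pretrained from scratch (e.g., on Genecorpus-30M).
model = BertForMaskedLM(config)
print(model.config.max_position_embeddings)
```

Note that because the learned position embeddings only cover 2048 positions in the released checkpoints, a model with a longer maximum input size generally needs to be pretrained rather than initialized directly from the existing weights.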
ctheodoris changed discussion status to closed