This is a roberta-base model trained on DNA promoter sequences of plants and fine-tuned on gene expression values (normalized to TPM, transcripts per million) in 8 tissues of maize cultivars, paired with their individual promoter sequences. The current release, FloraBERT-"small", is trained on a subset of the total data. The model has 47 million parameters.
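
For readers unfamiliar with the expression units, here is a minimal sketch of the standard TPM definition: read counts are divided by transcript length, then rescaled so the values in a sample sum to one million. The function name `tpm` and the toy counts and lengths are hypothetical and chosen only for illustration; the preprocessing actually used for this model is not described here and may differ.

```python
def tpm(read_counts, lengths_bp):
    """Convert raw read counts to TPM using the standard definition.

    read_counts: reads mapped to each transcript (one sample).
    lengths_bp:  transcript lengths in base pairs, same order.
    """
    if len(read_counts) != len(lengths_bp):
        raise ValueError("read_counts and lengths_bp must have the same length")

    # Step 1: reads per kilobase, which removes the transcript-length bias.
    rpk = [count / (length / 1000.0)
           for count, length in zip(read_counts, lengths_bp)]

    # Step 2: rescale so that the values for the sample sum to one million.
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    return [value / total * 1e6 for value in rpk]


# Hypothetical toy data: three transcripts whose counts scale with length,
# so all three end up with the same TPM value.
counts = [100, 200, 300]
lengths = [1000, 2000, 3000]
values = tpm(counts, lengths)
```

Because TPM values sum to the same total in every sample, they can be compared across tissues more directly than raw counts.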
References: