---
metrics:
- accuracy
- mse
library_name: transformers
tags:
- biology
---
This is a `roberta-base` model trained on plant DNA promoter sequences and fine-tuned to predict, from each gene's promoter sequence, its expression level (normalized to TPM) in 8 tissues of maize cultivars. Currently, the model is trained on 11.7 million plant DNA promoter sequences, and it has 47 million parameters.
References:
To get predictions for plant DNA promoter sequences directly from the console/command line, add a text file containing your sequences (one sequence per line) to the `data` folder and call the `main()` function in `prediction.py` with your file name:
- In `prediction.py`, update `main("test.txt")` with your file name.
- Run `python prediction.py`.
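The input file is plain text with one sequence per line. The sketch below shows one way to produce such a file from Python; `write_sequence_file` is a hypothetical helper, not part of `prediction.py`, and the upper-casing and `ACGTN` character check are assumptions of this sketch rather than documented requirements of the model.

```python
from pathlib import Path

# Characters accepted in a promoter sequence. ASSUMPTION of this sketch:
# prediction.py is not documented to enforce this alphabet.
VALID_BASES = set("ACGTN")


def write_sequence_file(sequences, file_name, data_dir="data"):
    """Write one promoter sequence per line to data_dir/file_name.

    Hypothetical helper; returns the Path of the written file.
    """
    cleaned = []
    for seq in sequences:
        seq = seq.strip().upper()
        if not seq:
            continue  # skip empty entries so no blank lines are written
        invalid = set(seq) - VALID_BASES
        if invalid:
            raise ValueError(f"unexpected characters {sorted(invalid)} in {seq[:20]}")
        cleaned.append(seq)

    out_dir = Path(data_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / file_name
    # Each sequence is followed by a newline: exactly one sequence per line.
    path.write_text("".join(s + "\n" for s in cleaned))
    return path
```

For example, `write_sequence_file(seqs, "my_promoters.txt")` writes `data/my_promoters.txt`, and `main("my_promoters.txt")` would then point at that file.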
The results are printed to the console as a table, for example:
tassel | base | anther | middle | ear | shoot | tip | root |
---|---|---|---|---|---|---|---|
8.65 | 7.901 | 2.004 | 8.4001 | 7.523 | 6.23 | 9.0112 | 8.221 |
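To use the predictions programmatically rather than reading them off the console, the printed table can be turned into a tissue-to-value mapping. This is a minimal sketch assuming the console output matches the pipe-delimited layout of the example above (a header row, a `---|---` separator row, then rows of values, assumed here to be one row per input sequence); `parse_prediction_table` is a hypothetical name, and the actual output of `prediction.py` may be formatted differently.

```python
def parse_prediction_table(text):
    """Return one {tissue: value} dict per data row of a pipe-delimited table.

    Hypothetical helper; assumes the layout of the example table.
    """
    rows = []
    for line in text.strip().splitlines():
        # Split on pipes and drop empty cells (e.g. after the trailing "|").
        cells = [c.strip() for c in line.split("|") if c.strip()]
        if not cells:
            continue  # ignore blank lines
        if all(set(c) <= set("-:") for c in cells):
            continue  # ignore the ---|--- separator row
        rows.append(cells)

    header, body = rows[0], rows[1:]
    return [dict(zip(header, map(float, row))) for row in body]
```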
Each value in the table is the predicted expression level for the corresponding tissue in TPM (transcripts per million), a normalized measure of gene expression.
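TPM first divides each gene's read count by its transcript length and then rescales the results so that all values in a sample sum to one million. The standard calculation is sketched below only to illustrate the unit; it is not taken from this model's training code, and the exact normalization pipeline used for the maize expression data is not described here.

```python
def tpm(counts, lengths):
    """Convert raw read counts to transcripts per million (TPM).

    counts[i] is the number of reads assigned to gene i and lengths[i] its
    transcript length (any unit; it cancels out in the normalization).
    """
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same length")
    # Step 1: reads per unit length, so longer genes are not favoured
    # simply because they collect more reads.
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    if total == 0:
        raise ValueError("at least one gene must have a non-zero count")
    # Step 2: rescale so that the values in this sample sum to one million.
    return [r / total * 1e6 for r in rates]
```

Because every sample is rescaled to the same total, TPM is commonly used when comparing expression across samples such as different tissues.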
Both the pretrained and the fine-tuned models can be used as starting points for further pretraining and fine-tuning (see References for more information).