Merge branch 'main' of https://huggingface.co/Gurveer05/FloraBERT
README.md
CHANGED
@@ -11,4 +11,20 @@ Currently, this model is trained on **7.1 million Plant DNA promoter sequences**

References:
- [GitHub Repository](https://github.com/gurveervirk/florabert/)
- [Kaggle Dataset](https://www.kaggle.com/datasets/gurveersinghvirk/florabert-base)

To get predictions for **plant DNA promoter sequences**, add a text file containing the sequences (one sequence per line) to the data folder, then call the main() function in prediction.py with your file name.
For example:

- Update the ```main("test.txt")``` call in prediction.py with your file name
- Now, run ```python prediction.py```
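
The file-preparation step above can be sketched in Python. This is a minimal sketch, assuming the data folder sits in the current working directory and that sequences contain only the characters A, C, G, T, and N; the helper name `write_sequences` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Hypothetical helper, not part of the FloraBERT repository.
# Assumption: sequences use only the characters A, C, G, T, N.
ALLOWED = set("ACGTN")


def write_sequences(sequences, file_name, data_dir="data"):
    """Write one upper-cased sequence per line to data_dir/file_name."""
    cleaned = []
    for seq in sequences:
        seq = seq.strip().upper()
        if not seq:
            continue  # skip blank entries
        invalid = set(seq) - ALLOWED
        if invalid:
            raise ValueError(f"unexpected characters {sorted(invalid)} in {seq!r}")
        cleaned.append(seq)
    out_path = Path(data_dir) / file_name
    # Create the data folder if it does not exist yet
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text("\n".join(cleaned) + "\n")
    return out_path
```

For instance, `write_sequences(["ACGTTGCA"], "test.txt")` would create `data/test.txt`, after which the two steps above apply unchanged.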

The results are printed to the console as a table.
For example:

| tassel | base | anther | middle | ear | shoot | tip | root |
|--------|--------|--------|--------|--------|--------|--------|--------|
| 0.4235 | 0.3031 | 0.3657 | 0.3663 | 0.2787 | 0.3809 | 0.4167 | 0.2861 |

The values in the table correspond to TPM (transcripts per million) values for each plant tissue. TPM is a normalized measure of gene expression.
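
If the console output needs to be consumed programmatically, a table in the format shown above can be turned into a tissue-to-value mapping. The following is a minimal sketch, assuming the output consists of a pipe-delimited header row, a separator row, and one value row per sequence, as in the example; `parse_table` is a hypothetical helper and not part of prediction.py.

```python
def parse_table(text):
    """Parse a pipe-delimited table into a list of {tissue: value} dicts."""
    rows = []
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip separator rows such as |-----|-----|
        if all(c and set(c) <= set("-:") for c in cells):
            continue
        rows.append(cells)
    header, *value_rows = rows
    return [dict(zip(header, map(float, r))) for r in value_rows]


# Example table copied from the README output above
example = """
| tassel | base | anther | middle | ear | shoot | tip | root |
|--------|--------|--------|--------|--------|--------|--------|--------|
| 0.4235 | 0.3031 | 0.3657 | 0.3663 | 0.2787 | 0.3809 | 0.4167 | 0.2861 |
"""

predictions = parse_table(example)
first = predictions[0]
# Tissue with the highest value for the first sequence
top_tissue = max(first, key=first.get)
print(top_tissue, first[top_tissue])  # prints: tassel 0.4235
```

Keeping each row as a dictionary keyed by tissue name makes it straightforward to compare tissues or load the rows into another tool.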

Both models can also be used for further pretraining and fine-tuning (see the references for details).