monsoon-nlp committed f5619fd (parent: 90d1148)
Update README.md

README.md (changed)
@@ -11,7 +11,11 @@ language:
 
 # tinyllama-proteinpretrain-quinoa
 
-
+Full model finetuning of TinyLLaMA-1.1B on the "research" split (quinoa
 protein sequences) of GreenBeing-Proteins dataset.
 
+Notes: pretraining only on sequences leads the model to only generate protein sequences, eventually repeating VVVV or KKKK.
+- This model may be replaced with mixed training (bio/chem text and protein).
+- This model might need "biotokens" to represent the amino acids instead of using the existing tokenizer.
+
 More details TBD
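The "biotokens" idea in the updated README notes can be sketched as a mapping where each of the 20 standard amino acids gets its own dedicated token. Each residue then corresponds to exactly one token, rather than to whatever subword pieces the existing tokenizer happens to produce. This is a minimal sketch under stated assumptions: the `<aa_X>` token format, the catch-all for nonstandard residues, and the helper names `to_biotokens` / `from_biotokens` are all hypothetical. They are not part of the model, its tokenizer, or the GreenBeing-Proteins dataset.

```python
# Hypothetical biotoken scheme (an assumption for illustration, not the
# model's actual tokenizer): one dedicated token string per standard
# amino acid, so every residue maps to exactly one token.
STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# "<aa_A>", "<aa_C>", ...: the naming format is hypothetical.
BIOTOKENS = {aa: f"<aa_{aa}>" for aa in STANDARD_AMINO_ACIDS}

# Hypothetical catch-all for nonstandard or ambiguous residues (e.g. B, U, Z).
UNKNOWN_TOKEN = "<aa_X>"


def to_biotokens(sequence: str) -> list[str]:
    """Map a protein sequence to one biotoken per residue (case-insensitive)."""
    return [BIOTOKENS.get(residue, UNKNOWN_TOKEN) for residue in sequence.upper()]


def from_biotokens(tokens: list[str]) -> str:
    """Invert to_biotokens; any token outside the standard set becomes 'X'."""
    reverse = {tok: aa for aa, tok in BIOTOKENS.items()}
    return "".join(reverse.get(tok, "X") for tok in tokens)


if __name__ == "__main__":
    seq = "MKVLAAGIVK"  # arbitrary example sequence, not taken from the dataset
    tokens = to_biotokens(seq)
    print(tokens[:3])  # ['<aa_M>', '<aa_K>', '<aa_V>']
    assert from_biotokens(tokens) == seq
```

To use such tokens with a Hugging Face model, one option is to register the strings with the tokenizer's `add_tokens` method and then call `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix has rows for them. The new embeddings would still need training, for example during the mixed text-and-protein training the notes mention.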