monsoon-nlp committed on
Commit
f5619fd
1 Parent(s): 90d1148

Update README.md

Files changed (1)
  1. README.md +5 -1
README.md CHANGED
@@ -11,7 +11,11 @@ language:
 
 # tinyllama-proteinpretrain-quinoa
 
-Continued pretraining of TinyLLaMA-1.1B on the "research" split (quinoa
+Full model finetuning of TinyLLaMA-1.1B on the "research" split (quinoa
 protein sequences) of GreenBeing-Proteins dataset.
 
+Notes: pretraining only on sequences leads the model to only generate protein sequences, eventually repeating VVVV or KKKK.
+- This model may be replaced with mixed training (bio/chem text and protein).
+- This model might need "biotokens" to represent the amino acids instead of using the existing tokenizer.
+
 More details TBD
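
The notes in this change mention two ideas that a small sketch may make more concrete: dedicated "biotokens" for amino acids, and the repeated-residue output (VVVV, KKKK). The Python below is only an illustration under stated assumptions, not the author's method. The `<aa_X>` token format and the names `biotoken`, `to_biotokens`, and `longest_repeat_run` are hypothetical and do not come from the README.

```python
from typing import Tuple

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def biotoken(residue: str) -> str:
    """Hypothetical token format for one residue, e.g. 'V' -> '<aa_V>'."""
    return f"<aa_{residue}>"


# Candidate new vocabulary entries, one per amino acid (assumed format).
# With Hugging Face transformers these could be registered via
# tokenizer.add_tokens(BIOTOKENS) followed by
# model.resize_token_embeddings(len(tokenizer)); that step is not run here.
BIOTOKENS = [biotoken(aa) for aa in AMINO_ACIDS]


def to_biotokens(sequence: str) -> str:
    """Rewrite a protein sequence so each known residue is its own token string.

    Characters outside the 20 standard codes are passed through unchanged.
    """
    return "".join(
        biotoken(aa) if aa in AMINO_ACIDS else aa for aa in sequence.upper()
    )


def longest_repeat_run(sequence: str) -> Tuple[str, int]:
    """Return (residue, length) for the longest run of one repeated residue."""
    best_char, best_len = "", 0
    run_char, run_len = "", 0
    for ch in sequence:
        if ch == run_char:
            run_len += 1
        else:
            run_char, run_len = ch, 1
        if run_len > best_len:
            best_char, best_len = run_char, run_len
    return best_char, best_len
```

A check such as `longest_repeat_run(generated_text)` could flag generations that have collapsed into a single repeated residue. Any threshold for what counts as degenerate would need to be chosen empirically.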