monsoon-nlp/tinyllama-proteinpretrain-quinoa


monsoon-nlp committed on Apr 4

Commit f5619fd • 1 Parent(s): 90d1148

Update README.md

Files changed (1)

README.md +5 -1

README.md CHANGED

@@ -11,7 +11,11 @@ language:
 # tinyllama-proteinpretrain-quinoa
-Continued pretraining of TinyLLaMA-1.1B on the "research" split (quinoa
 protein sequences) of GreenBeing-Proteins dataset.
 More details TBD

 # tinyllama-proteinpretrain-quinoa
+Full model finetuning of TinyLLaMA-1.1B on the "research" split (quinoa
 protein sequences) of GreenBeing-Proteins dataset.
+Notes: pretraining only on sequences leads the model to generate only protein sequences, eventually repeating VVVV or KKKK.
+- This model may be replaced with mixed training (bio/chem text and protein).
+- This model might need "biotokens" to represent the amino acids instead of using the existing tokenizer.
 More details TBD
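
The "biotokens" idea in the notes above could look roughly like the following sketch. It is an illustration only, not the author's implementation: the `<aa_X>` token format and the helper names `biotoken`, `to_biotokens`, and `longest_run` are hypothetical. It maps each residue to its own dedicated token string, and includes a small check for the degenerate repeats (VVVV, KKKK) mentioned in the notes.

```python
from typing import List

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def biotoken(residue: str) -> str:
    """Return a dedicated token string for one residue (hypothetical format)."""
    return f"<aa_{residue}>"


# One candidate "biotoken" per standard amino acid.
BIOTOKENS: List[str] = [biotoken(aa) for aa in AMINO_ACIDS]


def to_biotokens(sequence: str) -> List[str]:
    """Split a protein sequence into one biotoken per residue."""
    tokens = []
    for residue in sequence.upper():
        if residue not in AMINO_ACIDS:
            raise ValueError(f"unexpected residue: {residue!r}")
        tokens.append(biotoken(residue))
    return tokens


def longest_run(sequence: str) -> int:
    """Length of the longest run of one repeated residue, e.g. VVVV gives 4."""
    best = current = 0
    previous = None
    for residue in sequence:
        current = current + 1 if residue == previous else 1
        best = max(best, current)
        previous = residue
    return best


print(to_biotokens("MKV"))
print(longest_run("MKVVVVA"))
```

With Hugging Face `transformers`, strings like these could in principle be registered through `tokenizer.add_tokens(BIOTOKENS)` followed by `model.resize_token_embeddings(len(tokenizer))`. Whether that is the approach intended for this model is not stated in the README.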