
tinyllama-proteinpretrain-quinoa

Full model finetuning of TinyLlama-1.1B on the "research" split (quinoa protein sequences) of the GreenBeing-Proteins dataset.
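A minimal generation sketch using the Hugging Face transformers library. The seed prompt and sampling settings are assumptions, since the model was pretrained on raw amino-acid sequences:

```python
# Sketch: sampling protein-like sequences from this checkpoint.
# The prompt string and sampling parameters are assumptions, not part of the card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "monsoon-nlp/tinyllama-proteinpretrain-quinoa"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "MAKV"  # a few N-terminal residues as a seed (hypothetical)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.8,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```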

Notes: pretraining only on protein sequences leads the model to generate only protein sequences, which eventually degenerate into repeats such as VVVV or KKKK.

  • This model may be replaced with one trained on a mix of bio/chem text and protein sequences.
  • This model might need dedicated "biotokens" to represent the amino acids instead of reusing the existing tokenizer; a rough sketch of that idea follows this list.
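One way the "biotokens" idea could look: register one dedicated token per standard amino acid and resize the embedding matrix. The `<aa_X>` token strings below are hypothetical and not part of this checkpoint:

```python
# Sketch: adding one dedicated token per amino acid ("biotokens").
# Token strings like "<aa_A>" are hypothetical, not part of this checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "monsoon-nlp/tinyllama-proteinpretrain-quinoa"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

amino_acids = "ACDEFGHIKLMNPQRSTVWY"
new_tokens = [f"<aa_{aa}>" for aa in amino_acids]
num_added = tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))  # new rows need further training
print(f"added {num_added} amino-acid tokens")
```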

More details TBD
