---
license: apache-2.0
---
Fine-tuned input (`embed_tokens: Embedding`) and output (`lm_head: Linear`) embedding layers, for use with [`Birchlabs/llama-13b-stepwise-adapter`](https://huggingface.co/Birchlabs/llama-13b-stepwise-adapter).
Prior to finetuning, we grew the vocabulary of the tokenizer and of the embedding layers. The new embedding rows were average-initialized (set to the mean of the pre-existing embeddings) and needed training, so we trained them. These are the weights from that training.
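The growth-and-average-initialization step can be sketched in plain `torch` as follows (the vocabulary sizes and embedding dimension here are toy values for illustration, not the model's real ones):

```python
import torch
from torch import nn

# Hypothetical toy sizes; the real model's vocabulary and hidden dim differ.
old_vocab, new_vocab, dim = 100, 108, 16

# Stands in for model.get_input_embeddings() on the base model.
embed_tokens = nn.Embedding(old_vocab, dim)

# Grow the table; copy the old rows, and average-initialize the new rows
# so each new token starts at the mean of the pre-existing embeddings.
grown = nn.Embedding(new_vocab, dim)
with torch.no_grad():
    grown.weight[:old_vocab] = embed_tokens.weight
    grown.weight[old_vocab:] = embed_tokens.weight.mean(dim=0, keepdim=True)
```

With `transformers`, `model.resize_token_embeddings(new_vocab)` performs the resize; the averaging of the new rows is the part shown explicitly above.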
Ordinarily a QLoRA finetune of an LLM would not finetune `embed_tokens: Embedding` (you'd need to get a bit creative: not only have the dimensions changed, but I don't believe any established way exists to train _adapters_ over `Embedding`s).
Nor apparently would it finetune `lm_head: Linear`. This is harder than it sounds (i.e. you can't adapt it the same way you adapt the other `Linear` layers), because its dimensions have grown.
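One way to train these two layers in full precision while the rest of the model stays frozen (or is handled by adapters) is simply to toggle `requires_grad`. A minimal torch-only sketch, with a hypothetical toy model standing in for the real LLM:

```python
import torch
from torch import nn

# Toy stand-in for a causal LM; only the two layers named in this card matter.
class ToyLM(nn.Module):
    def __init__(self, vocab=108, dim=16):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab, dim)
        self.block = nn.Linear(dim, dim)  # stands in for the frozen transformer body
        self.lm_head = nn.Linear(dim, vocab, bias=False)

model = ToyLM()

# Freeze everything, then re-enable gradients only for the grown
# embedding layers, which are trained directly rather than via adapters.
for p in model.parameters():
    p.requires_grad = False
for module in (model.embed_tokens, model.lm_head):
    for p in module.parameters():
        p.requires_grad = True
```

With the `peft` library, a similar effect can be had by listing the layers in `modules_to_save` on the `LoraConfig`, which keeps full-precision trainable copies of them alongside the LoRA adapters.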