license: apache-2.0
Fine-tuned input (`embed_tokens: Embedding`) and output (`lm_head: Linear`) embedding layers, for use with Birchlabs/llama-13b-stepwise-adapter.
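A minimal usage sketch (the filenames and base checkpoint below are assumptions, not taken from this repository): load the base model, grow its vocabulary to match the finetuned embeddings, swap in the `embed_tokens`/`lm_head` weights, then attach the adapter.

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = "huggyllama/llama-13b"  # assumed base checkpoint
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# hypothetical filenames for the weights in this repository
embed_state = torch.load("embed_tokens.pt", map_location="cpu")
head_state = torch.load("lm_head.pt", map_location="cpu")

# grow the vocabulary to match the finetuned embeddings, then load them
model.resize_token_embeddings(embed_state["weight"].shape[0])
model.get_input_embeddings().load_state_dict(embed_state)
model.get_output_embeddings().load_state_dict(head_state)

# attach the stepwise adapter on top
model = PeftModel.from_pretrained(model, "Birchlabs/llama-13b-stepwise-adapter")
```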
Prior to finetuning, we grew the vocabulary of the tokenizer and the embedding layers. The new embedding rows were average-initialized and needed training, so we trained them. These are the weights from that training.
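The vocabulary-growing step looks roughly like the sketch below (the base checkpoint and the added token strings are placeholders, not the ones actually used): add the new tokens to the tokenizer, resize the embedding matrices, and overwrite the fresh rows with the average of the original ones.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

base = "huggyllama/llama-13b"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)

# hypothetical new special tokens; the real ones depend on the stepwise format
tokenizer.add_tokens(["<|step_start|>", "<|step_end|>"], special_tokens=True)

old_vocab = model.get_input_embeddings().weight.shape[0]
model.resize_token_embeddings(len(tokenizer))

with torch.no_grad():
    embed = model.get_input_embeddings().weight  # embed_tokens
    head = model.get_output_embeddings().weight  # lm_head
    # average-initialize the newly added rows from the original vocabulary
    embed[old_vocab:] = embed[:old_vocab].mean(dim=0, keepdim=True)
    head[old_vocab:] = head[:old_vocab].mean(dim=0, keepdim=True)
```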
Ordinarily a QLoRA finetune of an LLM would not finetune the `embed_tokens: Embedding` (you'd need to get a bit creative, because not only have the dimensions changed, but also I don't believe any way has been established to train adapters over `Embedding`s).
Nor, apparently, would it finetune `lm_head: Linear`. This is harder than it sounds (i.e. you can't handle it the same way you adapt the other `Linear` layers), because the dimensions have grown.
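One route that does exist (a sketch under assumed hyperparameters, and not necessarily how these particular weights were produced) is PEFT's `modules_to_save`, which keeps full, trainable copies of the named modules alongside the low-rank adapters:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "huggyllama/llama-13b"  # assumed base checkpoint
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16)
# (in practice the vocabulary would first be grown and resized, as in the sketch above)

lora_config = LoraConfig(
    r=16,                                                     # assumed rank
    lora_alpha=32,                                            # assumed scaling
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # typical LLaMA attention projections
    modules_to_save=["embed_tokens", "lm_head"],              # full, trainable copies of these
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Note that `modules_to_save` stores complete copies of those modules in the adapter checkpoint rather than low-rank deltas, so they add noticeably to its size.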