---
license: apache-2.0
---
Fine-tuned input (`embed_tokens: Embedding`) and output (`lm_head: Linear`) embedding layers, for use with [`Birchlabs/llama-13b-stepwise-adapter`](https://huggingface.co/Birchlabs/llama-13b-stepwise-adapter).
Prior to finetuning, we grew the vocabulary of the tokenizer and of the embedding layers. The new embedding rows were average-initialized (set to the mean of the pre-existing embeddings) and needed training, so we trained them. These are the weights from that training.
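The growth-and-average-initialization step can be sketched in plain `torch` as follows (the vocabulary sizes and embedding dimension here are toy values for illustration, not the model's real ones):

```python
import torch
from torch import nn

# Hypothetical toy sizes; the real model's vocabulary and hidden dim differ.
old_vocab, new_vocab, dim = 100, 108, 16

# Stands in for model.get_input_embeddings() on the base model.
embed_tokens = nn.Embedding(old_vocab, dim)

# Grow the table; copy the old rows, and average-initialize the new rows
# so each new token starts at the mean of the pre-existing embeddings.
grown = nn.Embedding(new_vocab, dim)
with torch.no_grad():
    grown.weight[:old_vocab] = embed_tokens.weight
    grown.weight[old_vocab:] = embed_tokens.weight.mean(dim=0, keepdim=True)
```

With `transformers`, `model.resize_token_embeddings(new_vocab)` performs the resize; the averaging of the new rows is the part shown explicitly above.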
Ordinarily a QLoRA finetune of an LLM would not finetune `embed_tokens: Embedding` (you'd need to get a bit creative: not only have the dimensions changed, but I don't believe any established way exists to train _adapters_ over `Embedding`s).
Nor apparently would it finetune `lm_head: Linear`. This is harder than it sounds (i.e. you can't adapt it the same way you adapt the other `Linear` layers), because its dimensions have grown.
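One way to train these two layers in full precision while the rest of the model stays frozen (or is handled by adapters) is simply to toggle `requires_grad`. A minimal torch-only sketch, with a hypothetical toy model standing in for the real LLM:

```python
import torch
from torch import nn

# Toy stand-in for a causal LM; only the two layers named in this card matter.
class ToyLM(nn.Module):
    def __init__(self, vocab=108, dim=16):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab, dim)
        self.block = nn.Linear(dim, dim)  # stands in for the frozen transformer body
        self.lm_head = nn.Linear(dim, vocab, bias=False)

model = ToyLM()

# Freeze everything, then re-enable gradients only for the grown
# embedding layers, which are trained directly rather than via adapters.
for p in model.parameters():
    p.requires_grad = False
for module in (model.embed_tokens, model.lm_head):
    for p in module.parameters():
        p.requires_grad = True
```

With the `peft` library, a similar effect can be had by listing the layers in `modules_to_save` on the `LoraConfig`, which keeps full-precision trainable copies of them alongside the LoRA adapters.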