Birchlabs committed
Commit e8d35e5
1 Parent(s): cb187f5

Update README.md

Files changed (1):
  1. README.md +7 -0
README.md CHANGED
@@ -1,3 +1,10 @@
 ---
 license: apache-2.0
 ---
+
+ Fine-tuned input (`embed_tokens: Embedding`) and output (`lm_head: Linear`) embedding layers, for use with [`Birchlabs/llama-13b-stepwise-adapter`](Birchlabs/llama-13b-stepwise-adapter).
+
+ Prior to finetuning, we grew the vocabulary of the tokenizer and the embedding layers. The new embeddings were average-initialized and needed training, so we trained them. These are the weights from that training.
+
+ Ordinarily, a QLoRA finetune of an LLM would not finetune `embed_tokens: Embedding` (you'd need to get a bit creative: not only have the dimensions changed, but I don't believe any established way exists to train _adapters_ over `Embedding`s).
+ Nor, apparently, would it finetune `lm_head: Linear`. This is harder than it sounds (i.e. you can't handle it the same way you adapt the other `Linear` layers), because the dimensions have grown.