Update README.md

the same) and initializing it as follows:

- every L3 token that decodes and re-encodes to multiple Qwen2 tokens is initialized with the mean of those embeddings
- there are no L3 tokens that cannot be translated to one or more Qwen2 tokens (both vocabularies are complete).

```python
for idx in range(target_vocab_size):
    # Decode each Llama-3 token to text, then re-encode that text with the Qwen2 tokenizer
    decode = tokenizer_target.decode(torch.tensor(idx, dtype = torch.long), decode_special_tokens = True)
    encode = tokenizer_source.encode(decode, add_special_tokens = False, return_tensors = "pt")
    # The new embedding/head row is the mean of the corresponding Qwen2 rows
    new_emb[idx] = old_emb[encode.flatten()].mean(dim = 0)
    new_head[idx] = old_head[encode.flatten()].mean(dim = 0)
```
Full script is [here](https://huggingface.co/turboderp/Qwama-0.5B-Instruct/blob/main/vocab_transplant.py).
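
For context, below is a minimal sketch of the setup the loop above assumes. The repo ids, variable names, and the `get_input_embeddings`/`get_output_embeddings` accessors are illustrative assumptions (the actual script may use different tokenizer and model classes); the linked vocab_transplant.py is the authoritative version.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative setup only; repo ids here are assumptions, not taken from the actual script.
tokenizer_source = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")    # Qwen2 vocabulary
tokenizer_target = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")  # Llama-3 vocabulary

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
old_emb = model.get_input_embeddings().weight.data    # Qwen2 input embedding matrix
old_head = model.get_output_embeddings().weight.data  # Qwen2 output projection

# len() includes added special tokens, which the loop also translates
target_vocab_size = len(tokenizer_target)
hidden_size = old_emb.shape[-1]

# New matrices, indexed by Llama-3 token id and filled row by row in the loop above
new_emb = torch.zeros(target_vocab_size, hidden_size, dtype = old_emb.dtype)
new_head = torch.zeros(target_vocab_size, hidden_size, dtype = old_head.dtype)
```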
Swapping the vocabulary with the above method yields a mostly coherent but still very confused model. It especially
struggles with numbers, and of course the embeddings for the Llama-3 control tokens do not have the significance they
would in an instruct-tuned model.