---
license: other
---
# Llama 2 Chronos 13b x Llama 1 Chronos 33b x Alpaca
This is a frankenllama model built with the technique described at https://huggingface.co/chargoddard/llama2-22b

I built my 22b base model using https://huggingface.co/Oniichat/llama2-base-chronos-13b-merge as the base and https://huggingface.co/elinas/chronos-33b as the donor.
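The linked llama2-22b repo documents the actual merge recipe; as a rough illustration of the layer-stacking idea behind a frankenmerge, here is a toy sketch (plain lists standing in for per-layer state dicts, layer counts and splice point are assumptions, not the real recipe):

```python
# Toy illustration of a "frankenmerge": building a taller model by
# splicing decoder layers from a donor checkpoint into a base stack.
# The real technique (layer selection, weight handling, post-merge
# healing via fine-tuning) is described in chargoddard/llama2-22b;
# everything below is a simplified stand-in.

def frankenmerge(base_layers, donor_layers, splice_at):
    """Insert donor layers into the base stack at index `splice_at`."""
    return base_layers[:splice_at] + donor_layers + base_layers[splice_at:]

base = [f"base.layer.{i}" for i in range(40)]    # e.g. a 13b model: 40 layers
donor = [f"donor.layer.{i}" for i in range(20)]  # layers lifted from the donor
merged = frankenmerge(base, donor, splice_at=20)
print(len(merged))  # 60 layers in the merged stack
```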
I then trained a QLoRA adapter on the Alpaca dataset with the default PEFT configuration from https://github.com/facebookresearch/llama-recipes/blob/main/quickstart.ipynb
This is the result of baking in that adapter.
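"Baking in" the adapter means folding the low-rank update into the frozen weights so no separate adapter is needed at inference. A minimal NumPy sketch of the merge math (toy shapes; this mirrors the standard LoRA scaling convention, which PEFT's `merge_and_unload` implements):

```python
import numpy as np

# Baking in a LoRA adapter = merging the low-rank update into the
# frozen base weight:
#     W_merged = W + (alpha / r) * B @ A
# Shapes below are toy-sized, not the model's real dimensions.

d_out, d_in, r, alpha = 64, 64, 8, 16
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # LoRA down-projection
B = np.zeros((d_out, r))                # LoRA up-projection (zero-init)

W_merged = W + (alpha / r) * B @ A      # with B still zero, a no-op
assert np.allclose(W_merged, W)
```

After training, `B` is nonzero and the merged weight carries the learned update with zero inference-time overhead.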
This configuration only targets `q_proj` and `v_proj` and uses `r=8`. I expected to need more target modules and a higher `r` to get significant improvements, but I was surprised by the quality of its context awareness, and I'm starting to think that maybe a ~32 MB LoRA is all it takes to get decent results from a 22b model.
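As a rough sanity check on why the adapter is so small, here is the parameter arithmetic for this configuration. The hidden size and layer count below are assumptions about the merged model, not values read from the checkpoint:

```python
# Back-of-envelope size of a LoRA targeting q_proj and v_proj with r=8.
# hidden and layers are ASSUMED values for the 22b frankenmerge, not
# taken from its config.

hidden = 5120   # llama-13b-family hidden size (assumed)
layers = 60     # assumed layer count for the 22b merge
r = 8
targets = 2     # q_proj and v_proj

# Each targeted linear gets two matrices: A (r x d_in) and B (d_out x r).
params_per_module = r * hidden + hidden * r
total_params = params_per_module * targets * layers
size_mb = total_params * 2 / 2**20  # fp16: 2 bytes per parameter

print(total_params)  # 9830400 trainable parameters
print(size_mb)       # ~18.75 MB in fp16 (~37.5 MB in fp32)
```

Depending on layer count and whether the adapter is stored in fp16 or fp32, this lands in the tens-of-megabytes range, consistent with the ~32 MB figure above.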
I will keep experimenting with other PEFT configurations and see where that gets me next.
If anyone wants the chronos 22b base model (it requires fine-tuning before use) or the adapter, let me know in the community discussions.