There is no official 22b model. This is just a weird experiment, and any potential benefits of doing this have not been validated.

https://huggingface.co/chargoddard/llama2-22b-blocktriangular trained for one epoch on 52k rows of Stanford Alpaca. Training took about 11 hours on a 3090.

I had trouble training with the other 22b method, which uses BLOCK_DIAGONAL=True as done in https://huggingface.co/chargoddard/llama2-22b. With this method, I was able to target all modules for the first time without breaking the output.

target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "up_proj", "gate_proj", "down_proj"]

Trained with a learning rate of 5e-5 and r=32. For more info see https://wandb.ai/nkpz/huggingface/runs/3oy5nbtv/workspace?workspace=user-nkpz
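
To give a rough sense of what targeting all seven modules at r=32 means in adapter size, here is a minimal sketch. It assumes the adapter is a standard LoRA, where each targeted linear layer gains r * (in_features + out_features) trainable weights. The layer shapes below are those of stock Llama-2-13B and are used for illustration only; the 22b model's actual shapes are not stated in this card and will differ. The dict is a plain Python structure whose `r` and `target_modules` keys mirror the argument names of `peft.LoraConfig`; it is not the training script that was used.

```python
# Settings stated in this card, collected in a plain dict.
LORA_SETTINGS = {
    "r": 32,
    "learning_rate": 5e-5,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "up_proj", "gate_proj", "down_proj"],
}

# ASSUMPTION: stock Llama-2-13B dimensions, for illustration only.
# The 22b experiment uses different (larger) shapes.
HIDDEN = 5120
INTERMEDIATE = 13824
NUM_LAYERS = 40

# (in_features, out_features) of each targeted linear layer.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params_per_layer(settings, shapes):
    """Trainable LoRA weights in one transformer block.

    A rank-r adapter on a linear layer adds two matrices,
    (in_features x r) and (r x out_features).
    """
    r = settings["r"]
    return sum(
        r * (shapes[name][0] + shapes[name][1])
        for name in settings["target_modules"]
    )


def lora_params_total(settings, shapes, num_layers):
    """Trainable LoRA weights across all transformer blocks."""
    return num_layers * lora_params_per_layer(settings, shapes)


if __name__ == "__main__":
    total = lora_params_total(LORA_SETTINGS, MODULE_SHAPES, NUM_LAYERS)
    print(f"Trainable LoRA parameters (13B shapes): {total:,}")
```

Dropping the three MLP projections from `target_modules` in this sketch shows how much of the adapter they account for, which is one way to see why targeting all modules is the heavier configuration.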

It's been responding coherently enough that I would need to run some objective benchmarks to determine whether it is better or worse than stock Llama 13b.
