Errors loading in OOba

#13
by vdruts - opened

Any help? I get the traceback below when the web UI tries to load the quantized model.

Is this a PyTorch version incompatibility?

--

Traceback (most recent call last):
  File "C:\Users\vdrut\Deep\text-diffusion-webui\text-generation-webui\server.py", line 275, in <module>
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "C:\Users\vdrut\Deep\text-diffusion-webui\text-generation-webui\modules\models.py", line 102, in load_model
    model = load_quantized(model_name)
  File "C:\Users\vdrut\Deep\text-diffusion-webui\text-generation-webui\modules\GPTQ_loader.py", line 114, in load_quantized
    model = load_quant(str(path_to_model), str(pt_path), shared.args.wbits, shared.args.groupsize, kernel_switch_threshold=threshold)
  File "C:\Users\vdrut\Deep\text-diffusion-webui\text-generation-webui\modules\GPTQ_loader.py", line 45, in _load_quant
    model.load_state_dict(torch.load(checkpoint))
  File "C:\Users\vdrut\Deep\text-diffusion-webui\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LlamaForCausalLM:
	Missing key(s) in state_dict: "model.layers.0.self_attn.k_proj.qzeros", "model.layers.0.self_attn.o_proj.qzeros", "model.layers.0.self_attn.q_proj.qzeros", "model.layers.0.self_attn.v_proj.qzeros", "model.layers.0.mlp.down_proj.qzeros", "model.layers.0.mlp.gate_proj.qzeros", "model.layers.0.mlp.up_proj.qzeros", [... the same seven ".qzeros" keys repeated for each of layers 1 through 59 ...].
	Unexpected key(s) in state_dict: "model.layers.0.self_attn.k_proj.zeros", "model.layers.0.self_attn.o_proj.zeros", "model.layers.0.self_attn.q_proj.zeros", "model.layers.0.self_attn.v_proj.zeros", "model.layers.0.mlp.down_proj.zeros", "model.layers.0.mlp.gate_proj.zeros", "model.layers.0.mlp.up_proj.zeros", [... the same seven ".zeros" keys repeated for each of layers 1 through 59 ...].
size mismatch for model.layers.0.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.0.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.0.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.0.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.0.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.0.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.0.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
[... the same seven "size mismatch" errors (k_proj, o_proj, q_proj, v_proj, down_proj, gate_proj, up_proj `.scales`, with identical checkpoint shapes `[6656, 1]` / `[17920, 1]` vs. model shapes `[52, 6656]`, `[140, 6656]`, `[52, 17920]`) repeat verbatim for model.layers.1 through model.layers.24 ...]
size mismatch for model.layers.25.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.25.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.25.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.25.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.25.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.25.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.25.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.26.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.26.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.26.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.26.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.26.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.26.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.26.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.27.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.27.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.27.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.27.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.27.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.27.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.27.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.28.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.28.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.28.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.28.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.28.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.28.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.28.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.29.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.29.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.29.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.29.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.29.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.29.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.29.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.30.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.30.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.30.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.30.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.30.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.30.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.30.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.31.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.31.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.31.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.31.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.31.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.31.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.31.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.32.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.32.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.32.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.32.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.32.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.32.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.32.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.33.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.33.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.33.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.33.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.33.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.33.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.33.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.34.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.34.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.34.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.34.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.34.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.34.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.34.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.35.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.35.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.35.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.35.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.35.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.35.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.35.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.36.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.36.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.36.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.36.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.36.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.36.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.36.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.37.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.37.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.37.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.37.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.37.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.37.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.37.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.38.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.38.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.38.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.38.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.38.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.38.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.38.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.39.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.39.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.39.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.39.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.39.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.39.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.39.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.40.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.40.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.40.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.40.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.40.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.40.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.40.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.41.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.41.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.41.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.41.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.41.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.41.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.41.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.42.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.42.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.42.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.42.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.42.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.42.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.42.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.43.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.43.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.43.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.43.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.43.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.43.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.43.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.44.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.44.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.44.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.44.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.44.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.44.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.44.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.45.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.45.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.45.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.45.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.45.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.45.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.45.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.46.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.46.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.46.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.46.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.46.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.46.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.46.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.47.self_attn.k_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.47.self_attn.o_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.47.self_attn.q_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.47.self_attn.v_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([52, 6656]).
size mismatch for model.layers.47.mlp.down_proj.scales: copying a param with shape torch.Size([6656, 1]) from checkpoint, the shape in current model is torch.Size([140, 6656]).
size mismatch for model.layers.47.mlp.gate_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
size mismatch for model.layers.47.mlp.up_proj.scales: copying a param with shape torch.Size([17920, 1]) from checkpoint, the shape in current model is torch.Size([52, 17920]).
[... the same seven size mismatches (k_proj, o_proj, q_proj, v_proj, down_proj, gate_proj, up_proj scales) repeat for every layer through model.layers.59 ...]
Press any key to continue . . .

See https://huggingface.co/elinas/alpaca-30b-lora-int4#update-2023-04-03 on breaking changes in GPTQ and how to resolve it.
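For context, the mismatched shapes in the log themselves show the format difference: the checkpoint stores one scale per output channel (`[6656, 1]`, i.e. quantized without a groupsize), while the loader was built expecting one row of scales per group of 128 input channels (6656 / 128 = 52 groups). A small sanity-check sketch, assuming those dimensions and the usual GPTQ scales layout:

```python
def expected_scales_shape(in_features, out_features, groupsize=None):
    """Shape of the GPTQ `scales` tensor for one quantized linear layer.

    Without a groupsize there is a single scale per output channel;
    with a groupsize there is one row of scales per group of input channels.
    """
    if groupsize is None:
        return (out_features, 1)
    return (in_features // groupsize, out_features)

# LLaMA-30B hidden size is 6656; groupsize 128 gives the 52 rows in the error.
print(expected_scales_shape(6656, 6656))        # no groupsize -> (6656, 1)
print(expected_scales_shape(6656, 6656, 128))   # groupsize 128 -> (52, 6656)
print(expected_scales_shape(17920, 6656, 128))  # down_proj     -> (140, 6656)
```

So a checkpoint quantized with `--groupsize 128` cannot be loaded with `wbits`/`groupsize` settings that assume per-channel scales, and vice versa.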


Thanks, but I'm on Ooba's older commit of GPTQ, and it's working with other 4-bit quantized models.

Okay, so loading the .pt throws those errors even without groupsize, but loading the .safetensors seems to work...
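One way to check which format a file was quantized in, without loading the whole model, is to read the safetensors header directly: the format begins with an 8-byte little-endian header length followed by that many bytes of JSON listing every tensor's name and shape. A minimal stdlib-only sketch (the filename below is hypothetical):

```python
import json
import struct

def safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors file header.

    The file starts with an 8-byte little-endian unsigned header length,
    followed by that many bytes of JSON metadata describing each tensor.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return {name: info["shape"]
            for name, info in header.items()
            if name != "__metadata__"}  # skip the optional metadata entry

# Usage (hypothetical filename): a grouped checkpoint shows [52, 6656] here,
# a per-channel one shows [6656, 1].
# shapes = safetensors_shapes("alpaca-30b-4bit-128g.safetensors")
# print(shapes["model.layers.0.self_attn.q_proj.scales"])
```

If the scales shapes in the file don't match what your GPTQ commit expects, no combination of loader flags will fix it; you need the matching checkpoint.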

The .pt is the old model; it's still there if anyone wants to use it for some reason.

elinas changed discussion status to closed

For future readers:
I had a similar problem when merging the fine-tuned LoRA adapter_model.bin with the base model (Video-LLaVA). It turned out that when loading the base model, I had not disabled the 'convert to LoRA' operations for some of the parameters. So it may help to comment out that conversion code when merging.
