Trying to quantize. Running into the issue below. Any suggestions?

#5 by BigDeeper - opened

Permuting layer 30
Permuting layer 31
model.embed_tokens.weight -> token_embd.weight | F16 | [32002, 4096]
model.layers.0.input_layernorm.weight -> blk.0.attn_norm.weight | F16 | [4096]
Traceback (most recent call last):
  File "/home/developer/llama.cpp/convert.py", line 1228, in <module>
    main()
  File "/home/developer/llama.cpp/convert.py", line 1215, in main
    model = convert_model_names(model, params)
  File "/home/developer/llama.cpp/convert.py", line 1004, in convert_model_names
    raise Exception(f"Unexpected tensor name: {name}")
Exception: Unexpected tensor name: model.layers.0.mlp.experts.0.w1.weight
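
For context, the failure is in the tensor-name mapping step: convert_model_names only recognizes the standard Llama tensor names, so the per-expert MoE weights don't match anything and the script raises. A rough sketch of the idea (simplified and with made-up pattern names, not the actual convert.py code):

import re

# Illustrative subset of a name-mapping table; the real table in convert.py differs.
KNOWN_NAMES = {
    r"model\.embed_tokens\.weight": "token_embd.weight",
    r"model\.layers\.(\d+)\.input_layernorm\.weight": "blk.{0}.attn_norm.weight",
    r"model\.layers\.(\d+)\.self_attn\.q_proj\.weight": "blk.{0}.attn_q.weight",
    # no pattern here for the per-expert MoE weights (mlp.experts.N.w1/w2/w3)
}

def map_name(name: str) -> str:
    # Try each known pattern; anything unmatched is treated as an error.
    for pattern, target in KNOWN_NAMES.items():
        m = re.fullmatch(pattern, name)
        if m:
            return target.format(*m.groups())
    raise Exception(f"Unexpected tensor name: {name}")

print(map_name("model.embed_tokens.weight"))  # token_embd.weight
try:
    map_name("model.layers.0.mlp.experts.0.w1.weight")
except Exception as e:
    print(e)  # Unexpected tensor name: model.layers.0.mlp.experts.0.w1.weight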

Disco Research org

Llama.cpp doesn't have support for this architecture yet. They'll probably wait until the official architecture is released before implementing support.
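
If you want to see exactly which tensors the converter has no mapping for, you can list the names in the checkpoint shards, e.g. with safetensors (the shard filename below is just a placeholder, substitute the actual files from the repo):

from safetensors import safe_open

# Placeholder shard name; use the real .safetensors files from the model repo.
SHARD = "model-00001-of-00002.safetensors"

with safe_open(SHARD, framework="pt") as f:
    for name in f.keys():
        # MoE checkpoints carry per-expert FFN weights such as
        # model.layers.N.mlp.experts.M.w1.weight, which convert.py
        # had no mapping for at the time of this thread.
        if ".mlp.experts." in name:
            print(name)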

bjoernp changed discussion status to closed
