How did you create this gguf?

#1
by Venkman42 - opened

Could you please share how you created this gguf?

I keep getting errors like:

Traceback (most recent call last):
  File "/content/llama.cpp/convert.py", line 1483, in <module>
    main()
  File "/content/llama.cpp/convert.py", line 1419, in main
    model_plus = load_some_model(args.model)
  File "/content/llama.cpp/convert.py", line 1280, in load_some_model
    model_plus = merge_multifile_models(models_plus)
  File "/content/llama.cpp/convert.py", line 730, in merge_multifile_models
    model = merge_sharded([mp.model for mp in models_plus])
  File "/content/llama.cpp/convert.py", line 709, in merge_sharded
    return {name: convert(name) for name in names}
  File "/content/llama.cpp/convert.py", line 709, in <dictcomp>
    return {name: convert(name) for name in names}
  File "/content/llama.cpp/convert.py", line 684, in convert
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
  File "/content/llama.cpp/convert.py", line 684, in <listcomp>
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
KeyError: 'transformer.embd.wte.weight'

Traceback (most recent call last):
  File "/content/./quantizeHFmodel/quantizeHFmodel.py", line 33, in <module>
    download_and_quantize_model(model_id)
  File "/content/./quantizeHFmodel/quantizeHFmodel.py", line 18, in download_and_quantize_model
    subprocess.run(["python", "llama.cpp/convert.py", local_dir, "--outtype", "f16", "--outfile", fp16_file], check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['python', 'llama.cpp/convert.py', 'dolphin-2_6-phi-2_oasst2_chatML_V2', '--outtype', 'f16', '--outfile', 'dolphin-2_6-phi-2_oasst2_chatML_V2/dolphin-2_6-phi-2_oasst2_chatml_v2.f16.gguf']' returned non-zero exit status 1.

Did you do anything special for quantizing them?
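For context, download_and_quantize_model is just attempting the standard two-step llama.cpp flow: convert.py to an f16 GGUF, then the quantize binary. A rough sketch of that flow (the download step, quantize binary path, and Q4_K_M quant type below are assumptions, not my exact script):

```python
# Rough sketch of the intended flow; details marked as assumptions.
import subprocess
from huggingface_hub import snapshot_download

def download_and_quantize_model(model_id: str) -> None:
    local_dir = model_id.split("/")[-1]
    # Download the HF checkpoint into a local folder (assumed step).
    snapshot_download(repo_id=model_id, local_dir=local_dir)

    # Step 1: convert the HF checkpoint to an f16 GGUF (the step failing above).
    fp16_file = f"{local_dir}/{local_dir.lower()}.f16.gguf"
    subprocess.run(
        ["python", "llama.cpp/convert.py", local_dir,
         "--outtype", "f16", "--outfile", fp16_file],
        check=True,
    )

    # Step 2: quantize the f16 GGUF with llama.cpp's quantize binary
    # (binary location and Q4_K_M type are placeholders).
    quant_file = f"{local_dir}/{local_dir.lower()}.Q4_K_M.gguf"
    subprocess.run(["llama.cpp/quantize", fp16_file, quant_file, "Q4_K_M"], check=True)
```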

I figured it out. I had to use an older version of llama.cpp.
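For anyone else hitting this with a phi-2 based model, the fix was checking llama.cpp out at an older revision before running convert.py. A minimal sketch (the revision below is a placeholder, not a specific known-good commit):

```python
# Minimal sketch: pin llama.cpp to an older revision, then rerun the conversion.
# OLDER_REV is a placeholder; substitute a revision that still converts this model.
import subprocess

OLDER_REV = "<older-llama.cpp-commit-or-tag>"

subprocess.run(["git", "clone", "https://github.com/ggerganov/llama.cpp"], check=True)
subprocess.run(["git", "-C", "llama.cpp", "checkout", OLDER_REV], check=True)

subprocess.run(
    ["python", "llama.cpp/convert.py", "dolphin-2_6-phi-2_oasst2_chatML_V2",
     "--outtype", "f16",
     "--outfile", "dolphin-2_6-phi-2_oasst2_chatML_V2/dolphin-2_6-phi-2_oasst2_chatml_v2.f16.gguf"],
    check=True,
)
```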

Venkman42 changed discussion status to closed
