How did you convert it?

#1
by ZeroWw - opened

I tried to convert from the original model to get the f16 model.
But I get an error:
python llama.cpp/convert-hf-to-gguf.py --outtype f16 /content/gemma-1.1-7b-it --outfile /content/gemma-1.1-7b-it.f16.gguf

INFO:hf-to-gguf:Loading model: gemma-1.1-7b-it
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
....
  File "/content/llama.cpp/gguf-py/gguf/gguf_writer.py", line 166, in add_key_value
    raise ValueError(f'Duplicated key name {key!r}')
ValueError: Duplicated key name 'tokenizer.chat_template'

I commented out the check in gguf_writer.py for now.. we'll see.
Commenting out those lines means the second duplicate simply overwrites the previous one.

def add_key_value(self, key: str, val: Any, vtype: GGUFValueType) -> None:
    # duplicate-key check disabled as a temporary workaround:
    #if key in self.kv_data:
    #    raise ValueError(f'Duplicated key name {key!r}')
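A gentler workaround is to warn and overwrite instead of deleting the check outright. Here is a minimal standalone sketch of that behaviour; the toy KVStore class just stands in for GGUFWriter, whose real internals (GGUFValue, shard handling) vary between gguf-py versions:

import logging
from typing import Any

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger(__name__)

class KVStore:
    # Toy stand-in for GGUFWriter's key/value metadata table.
    def __init__(self) -> None:
        self.kv_data: dict[str, Any] = {}

    def add_key_value(self, key: str, val: Any) -> None:
        # Warn on duplicates instead of raising; the latest value wins,
        # matching the overwrite behaviour described above.
        if key in self.kv_data:
            logger.warning('Duplicated key name %r, overwriting', key)
        self.kv_data[key] = val

store = KVStore()
store.add_key_value('tokenizer.chat_template', '<template A>')
store.add_key_value('tokenizer.chat_template', '<template B>')  # logs a warning
print(store.kv_data['tokenizer.chat_template'])  # -> <template B>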

yeah, someone broke the Gemma conversion recently and it needs to be fixed

Any idea how to run Gemma in llama.cpp?
I tried with the above models; the model answers in the llama.cpp UI (server), but after the answer it continues by itself.

Need to specify the proper stop tokens, I would guess.
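For what it's worth, the llama.cpp server's /completion endpoint accepts a stop list. A sketch assuming the server is running at http://localhost:8080 and the model follows Gemma's <start_of_turn>/<end_of_turn> chat template:

import json
import urllib.request

# Ask a running llama.cpp server (assumed at localhost:8080) to complete
# a Gemma-format prompt, stopping at the end-of-turn marker.
prompt = (
    "<start_of_turn>user\n"
    "Why is the sky blue?<end_of_turn>\n"
    "<start_of_turn>model\n"
)
payload = {
    "prompt": prompt,
    "n_predict": 256,
    "stop": ["<end_of_turn>"],  # without this, generation runs past the answer
}
req = urllib.request.Request(
    "http://localhost:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])

The same marker should also work as a reverse prompt on the CLI (-r "<end_of_turn>").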
