Weird result with gemma-4-E4B-i1-GGUF:Q4_K_M with llama.cpp

#2349
by ericxhc - opened

Not sure if I'm doing something wrong, but all the output I get is an infinite repeat of:
<unused50><unused50><unused50>...
It doesn't matter which parameters I use. I'm running on a Mac M4; maybe that's the issue? I haven't seen anything like this before.
I also saw these logs, but I'm not sure how useful they are (I didn't see any errors):

llama_prepare_model_devices: using device MTL0 (Apple M4 Pro) (unknown id) - 16383 MiB free
load: override 'tokenizer.ggml.add_bos_token' to 'true' for Gemma4
load: 0 unused tokens
load: control-looking token:     50 '<|tool_response>' was not control-type; this is probably a bug in the model. its type will be overridden
load: control-looking token:    212 '</s>' was not control-type; this is probably a bug in the model. its type will be overridden
load: printing all EOG tokens:
load:   - 1 ('<eos>')
load:   - 50 ('<|tool_response>')
load:   - 106 ('<turn|>')
load:   - 212 ('</s>')
load: special_eog_ids contains '<|tool_response>', removing '</s>' token from EOG list
load: special tokens cache size = 24
load: token to piece cache size = 1.9445 MB
print_info: arch                  = gemma4
