Error report.
As usual.
However, it may be that llama.cpp just doesn't know about this one yet, since it comes from a model lineage I haven't seen before.
llama_model_load: error loading model: check_tensor_dims: tensor 'rope_freqs.weight' has wrong shape; expected 48, got 64, 1, 1, 1
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
response = await route_utils.call_process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 285, in call_process_api
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
prediction = await utils.async_iteration(iterator)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
return await iterator.__anext__()
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
return await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
return next(iterator)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
response = next(iterator)
File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
response = next(iterator)
File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 356, in gradio_handler
raise res.value
ValueError: Failed to load model from file: llm_models/Deedlit_4B.Q5_K_M.gguf
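For reference, a minimal sketch of reproducing the failure outside Gradio/Spaces, assuming llama-cpp-python is installed and the model sits at the same local path the Space uses:

```python
# Sketch: try to load the same GGUF directly, outside the Gradio handler.
# Adjust the path to your local layout.
from llama_cpp import Llama

try:
    llm = Llama(model_path="llm_models/Deedlit_4B.Q5_K_M.gguf", n_ctx=2048)
except ValueError as err:
    # llama-cpp-python raises ValueError when llama.cpp rejects the GGUF,
    # e.g. the rope_freqs.weight shape check above.
    print(f"load failed: {err}")
```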
It's of Minitron descent, which has only been supported for a day, and maybe "supported" is a bit euphemistic. I've notified the model creator.
I think my version of llama.cpp was not yet compatible with Minitron (the original).
When I used the original model, it showed the "unsupported" error message...
Maybe it didn't recognize that this one is not yet supported because the model structure has changed from the original.
Thanks for reporting this to the author.
This happens with bleeding edge llama.cpp as well, don't worry. It currently keeps this model from getting an imatrix.
In other words, it's not a false positive.
Good then. Well, not good.
Thanks anyway.
I have tested this and it is working on ChatterUI, which pulled the change from llama.cpp. My suggestion is to pull the latest llama.cpp and try again.
Thank you for the info.
https://github.com/abetlen/llama-cpp-python/releases
I guess it will take a few days until the Python version of the library is released. I'll give it a try when it comes out.
https://github.com/abetlen/llama-cpp-python/releases/tag/v0.2.90-cu124
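Once it lands, a quick way to confirm the environment actually picked up the new release (the expected string below is just a guess based on the tag above):

```python
# Check which llama-cpp-python build is actually installed.
import llama_cpp

print(llama_cpp.__version__)  # e.g. "0.2.90" once the new wheel is in place
```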
The new version came out, I tried it, and it works fine.
The chat template works fine with LLAMA3 and ChatML, but Mistral is somewhat unresponsive.
Anyway, it worked.
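In case it helps anyone else, a sketch of forcing the chat format explicitly in llama-cpp-python instead of relying on the template embedded in the GGUF (the format names are the ones I believe llama-cpp-python registers; treat them as assumptions if your version differs):

```python
from llama_cpp import Llama

# Pin a specific chat template rather than using the GGUF's embedded one.
llm = Llama(
    model_path="llm_models/Deedlit_4B.Q5_K_M.gguf",
    chat_format="chatml",  # "llama-3" also behaved well here; "mistral-instruct" was sluggish
    n_ctx=2048,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])
```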
It's very strange indeed. llama-imatrix can't even load it here, but inferencing works.
https://huggingface.co/mradermacher/Deedlit_4B-GGUF/blob/main/Deedlit_4B.Q5_K_M.gguf
I tried again and it still works. Although the response was a little strange, it was one that could rank highly for a 4B model (I know it's crazy to ask for multilingual performance from a 4B model).
It's a curious model...