Error report.

#1
by John6666 - opened

As usual.
However, it may just be that llama.cpp doesn't know about it yet, since this model comes from a lineage of names I haven't seen before.

llama_model_load: error loading model: check_tensor_dims: tensor 'rope_freqs.weight' has wrong shape; expected    48, got    64,     1,     1,     1
llama_load_model_from_file: failed to load model
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 285, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 801, in gen_wrapper
    response = next(iterator)
  File "/usr/local/lib/python3.10/site-packages/spaces/zero/wrappers.py", line 356, in gradio_handler
    raise res.value
ValueError: Failed to load model from file: llm_models/Deedlit_4B.Q5_K_M.gguf
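
For context, the failure is in the model load itself, not the Gradio plumbing: llama-cpp-python raises a ValueError like the one above when the underlying llama_load_model_from_file call fails. Below is a minimal sketch of that load, assuming llama-cpp-python; the path is taken from the traceback and the other parameters are illustrative, not the Space's actual settings.

from llama_cpp import Llama

# The constructor wraps llama_load_model_from_file and raises
# "ValueError: Failed to load model from file: ..." when loading fails.
llm = Llama(
    model_path="llm_models/Deedlit_4B.Q5_K_M.gguf",  # path from the traceback
    n_ctx=4096,        # illustrative context length
    n_gpu_layers=-1,   # offload all layers if a GPU build is installed
)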

It's a minitron descendant, which has only been supported for a day, and maybe "supported" is a bit euphemistic. I've notified the model creator.

I think my version of llama.cpp was not yet compatible with minitron (the original).
With the original model, it would have shown the explicit "unsupported" error message...
Maybe it didn't recognize that this one is not yet supported because the model structure has changed from the original.

Thanks for reporting this to the author.

This happens with bleeding edge llama.cpp as well, don't worry. It currently keeps this model from getting an imatrix.

In other words, it's not a false positive.
Good then. Well, not good.
Thanks anyway.

I have tested this and it is working on ChatterUI, which pulled the change from llama.cpp. My suggestion is to pull the latest llama.cpp and try again.

Thank you for the info.
https://github.com/abetlen/llama-cpp-python/releases
I guess it will take a few days until the Python version of the library is released. I'll give it a try when it comes out.
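
A quick way to check whether the updated wrapper has actually landed in the environment (a sketch; the package name is the PyPI name, and the version to look for is whichever release first bundles the newer llama.cpp):

from importlib.metadata import version

# The fix only helps once the installed llama-cpp-python bundles a llama.cpp
# commit that knows this architecture.
print(version("llama-cpp-python"))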

https://github.com/abetlen/llama-cpp-python/releases/tag/v0.2.90-cu124
The new version came out, I tried it, and it works fine.
The chat template works fine with Llama 3, ChatML is fine, but Mistral is somewhat unresponsive (a sketch of switching formats is below).
Anyway, it worked.
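
For reference, a minimal sketch of forcing a specific chat template in llama-cpp-python; "llama-3", "chatml", and "mistral-instruct" are (as far as I know) its built-in format names, and the prompt and settings are only illustrative.

from llama_cpp import Llama

llm = Llama(
    model_path="llm_models/Deedlit_4B.Q5_K_M.gguf",
    chat_format="chatml",  # "llama-3" also behaved well here; "mistral-instruct" was less responsive
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(out["choices"][0]["message"]["content"])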

It's very strange indeed. llama-imatrix can't even load it here, but inferencing works.

https://huggingface.co/mradermacher/Deedlit_4B-GGUF/blob/main/Deedlit_4B.Q5_K_M.gguf
I tried again and it still works. Although the response was a little strange, it could be ranked highly for a 4B model (I know it's crazy to ask for multilingual performance from a 4B model).
It's a curious model...
