wrong number of tensors; expected 292, got 291

#1
by Psykokwak - opened

Hi,
I'm trying to use your models in ollama.
The model creation from the gguf is OK.
But I get the following error on run:
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291

Do you have any idea what might be causing this issue?

Thanks.

You'll have to wait for an update to Ollama. This model uses the RoPE fixes from the llama.cpp master branch, which break backwards compatibility.

LM Studio has released 0.2.29, which supports the new models.

Thanks for your answer.
Do you plan to release the 70B too?

I don't think the author made a 70B.

I had the same error message using Llama 3.1 from Unsloth. I was trying to implement the example from the official Unsloth repo:
https://github.com/unslothai/unsloth -> https://colab.research.google.com/drive/1Ys44kVvmeZtnICzWz0xgpRnrIOjZAuxp?usp=sharing
and the code from the YouTuber Mervin: https://www.youtube.com/@MervinPraison -> https://mer.vin/2024/07/llama-3-1-fine-tune/

Unsloth completed the conversion, and neither script produced an error while creating the GGUF file.

I tried both Mervin's code and the official code to load the Unsloth GGUF into Ollama, and both failed with the same error:
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291

Since Unsloth automates pulling in llama.cpp when you call its functions, I had no idea which version it had loaded.
So I went into the llama.cpp directory (I'm on Linux, so it was "cd llama.cpp" - look for the llama.cpp folder in your own project, of course)
and then I ran: sudo git reset --hard 46e12c4692a37bdd31a0432fc5153d7d22bc7f72
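For anyone following along, the full sequence looks roughly like this. This is a sketch: the llama.cpp path, the need for sudo, and the build step all depend on how Unsloth cloned and built the repo on your machine.

```shell
# Assumption: Unsloth cloned llama.cpp into ./llama.cpp inside your project.
cd llama.cpp

# Pin the checkout to the known-good commit (pre-dates the breaking change).
git fetch origin
git reset --hard 46e12c4692a37bdd31a0432fc5153d7d22bc7f72

# Rebuild so the conversion/quantization tools match the pinned sources.
make clean && make
```

After rebuilding, re-run the GGUF export so the file is produced by the pinned tools rather than the newer ones.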

And yes, I asked ChatGPT to help me with this problem. I am very happy that it is working now, but development in this field does not look like it will be stable for the next few years. I hope it works on your system as well!
Best greetings
Matthias

Are you saying we just have to wait?
ValueError: Error raised by inference API HTTP code: 500, {"error":"llama runner process has terminated: signal: aborted (core dumped) error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291"}

I'm asking because I'm running into the same error.

Hi, has this issue been solved? I am trying to run Llama 3.1 8b using llama_cpp. I downloaded a number of models (e.g. Meta-Llama-3.1-8B-Instruct-Q6_K.gguf), but keep getting:
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: failed to load model
And in the terminal: AttributeError: 'Llama' object has no attribute '_lora_adapter'

Thanks in advance for any help!

Never seen that Lora adapter issue...

Can you share your exact commands?

# ollama run hillct/dolphin-llama-3.1
pulling manifest 
pulling b4cc1324cbb5... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 8.5 GB                         
pulling 62fbfd9ed093... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  182 B                         
pulling 9640c2212a51... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   41 B                         
pulling 4fa551d4f938... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  12 KB                         
pulling f02dd72bb242... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   59 B                         
pulling 67c41d573b3c... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  559 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: exception loading model
# ollama --version
ollama version is 0.3.3
# ollama run CognitiveComputations/dolphin-llama3.1:8b-v2.9.4-Q3_K_L
pulling manifest 
pulling 33acc6f7959f... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 4.3 GB                         
pulling 13584952422b... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  131 B                         
pulling 7d9b917757c7... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   76 B                         
pulling 94e5d463b8ac... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  413 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: exception loading model
# ollama run CognitiveComputations/dolphin-llama3.1:8b-v2.9.4-Q8_0
pulling manifest 
pulling b4cc1324cbb5... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 8.5 GB                         
pulling 13584952422b... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  131 B                         
pulling 7d9b917757c7... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–   76 B                         
pulling 24f881ee6123... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–  411 B                         
verifying sha256 digest 
writing manifest 
removing any unused layers 
success 
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: exception loading model

This is fixed in the latest Ollama.
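For Linux users, one way to update and confirm the version (re-running Ollama's official install script upgrades an existing install in place; on macOS/Windows, use the app's updater instead):

```shell
# Re-running the official install script upgrades an existing Linux install.
curl -fsSL https://ollama.com/install.sh | sh

# Confirm the running version picked up the update.
ollama --version
```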

ollama run hillct/dolphin-llama-3.1

pulling manifest
pulling b4cc1324cbb5... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 8.5 GB
pulling 62fbfd9ed093... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 182 B
pulling 9640c2212a51... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 41 B
pulling 4fa551d4f938... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 12 KB
pulling f02dd72bb242... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 59 B
pulling 67c41d573b3c... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 559 B
verifying sha256 digest
writing manifest
removing any unused layers
success
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: exception loading model

ollama run CognitiveComputations/dolphin-llama3.1

pulling manifest
pulling c4e04968e3ca... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 4.7 GB
pulling 13584952422b... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 131 B
pulling 66112031815b... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 159 B
pulling e3bd59e71f09... 100% β–•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ– 411 B
verifying sha256 digest
writing manifest
removing any unused layers
success
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: exception loading model

ollama --version

ollama version is 0.3.6

$ wget -c https://huggingface.co/bartowski/Llama-3.1-8B-Lexi-Uncensored-GGUF/resolve/main/Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
$ bat ModelFile
───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       β”‚ File: ModelFile
───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1   β”‚ FROM ./Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
   2   β”‚
   3   β”‚ TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>
   4   β”‚
   5   β”‚ {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>
   6   β”‚
   7   β”‚ {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>
   8   β”‚
   9   β”‚ {{ .Response }}<|eot_id|>"""
  10   β”‚
  11   β”‚ PARAMETER num_ctx 16000
  12   β”‚ PARAMETER stop "<|eot_id|>"
  13   β”‚ PARAMETER stop "<|start_header_id|>"
  14   β”‚ PARAMETER stop "<|end_header_id|>"
  15   β”‚ PARAMETER top_k 1
───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
$ ollama create llama3.1-lexi -f ModelFile
transferring model data 100%
writing manifest
success
$ ollama run llama3.1-lexi
>>> Who is yavin?
Yavin is a reference to the planet Yavin 4, which is a key location in the original Star Wars film (Episode IV: A New Hope).

However, I'm assuming you might be asking about Yavin as a character. There are actually two characters named Yavin in the Star Wars universe:

1. **Yavin V**: He was a Jedi Master who lived during the time of the Old Republic. Unfortunately, I couldn't find much information on him.
2. **Yavin IV**: This is likely not what you're looking for, as it's just another name for the planet Yavin 4.

However, there is one more possibility:

**Yavin (also known as Yavin the Hutt)**: He was a Hutt crime lord who appeared in the Star Wars Legends universe.

>>> /exit
$ ollama --version
ollama version is 0.3.6

Oh, but you are talking about a different model, @Yavin5...

My answer was about the OP's issue; I realize a lot of answers in this thread are completely unrelated.

# wget -c https://huggingface.co/bartowski/Llama-3.1-8B-Lexi-Uncensored-GGUF/resolve/main/Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
Connecting to cdn-lfs-us-1.huggingface.co (cdn-lfs-us-1.huggingface.co)|2600:9000:236b:1e00:17:9a40:4dc0:93a1|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4781626464 (4.5G) [binary/octet-stream]
Saving to: β€˜Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf’

Llama-3.1-8B-Lexi-U 100%[===================>]   4.45G  33.4MB/s    in 2m 30s  

2024-08-21 14:09:26 (30.4 MB/s) - β€˜Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf’ saved [4781626464/4781626464]

# ollama create llama3.1-lexi -f modelfile-melmass 
transferring model data 100% 
using existing layer sha256:f823cc9ddc2c5c9953a7f1cd171a710128741b15d8023c5ecc2e3808859a27c5 
creating new layer sha256:8ab4849b038cf0abc5b1c9b8ee1443dca6b93a045c2272180d985126eb40bf6f 
creating new layer sha256:6774f82e80c4d5ffeab1dafd3a4dd0e843ba529edc74273811a567af32402b68 
creating new layer sha256:48714da7a6f14bfb596fea79157ed9406e1a51b49792c97bb9519bf7deaa5739 
writing manifest 
success 

# ollama run llama3.1-lexi
Error: llama runner process has terminated: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: exception loading model

# ollama --version
ollama version is 0.3.6

You're still claiming the problem is fixed in the latest version, when it apparently isn't.

However, the "regular" Ollama image named "llama3.1:latest" apparently does work on this version:

# ollama run llama3.1:latest
>>> who is melmass?
I couldn't find any notable or well-known person named Melmass. It's 
possible that you may have misspelled the name, or it could be a less 
common or private individual.

If you could provide more context or information about who Melmass is or 
what they are known for, I'd be happy to try and help you further!

This seems impossible: you're both running the same Ollama version and downloaded the same file, yet you're somehow getting different results?

I assume that creating the Modelfile in a different way isn't affecting it?

# cat modelfile-melmass 
FROM ./Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf

TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|>

{{ .Response }}<|eot_id|>"""

PARAMETER num_ctx 16000
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|start_header_id|>"
PARAMETER stop "<|end_header_id|>"
PARAMETER top_k 1

# ls -la Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf 
-rw-r--r--. 1 root root 4781626464 Jul 28 00:25 Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf

Anything else you'd like me to try / output?

I mean, just for the hell of it, a sha256sum would be nice, but I can't imagine how you're downloading it fresh and ending up with an old file...

I initially had an older version of Ollama and hit the OP's issue. While googling, I found this thread and a few similar issues on the Ollama repo itself, updated Ollama, and did what I outlined earlier, which fixed it.
I also don't get how Yavin still has the issue. The only thing I can think of is that, since the sha256 of the model itself didn't change, Ollama might be reusing an older "compiled" model (from an earlier `ollama create`).
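If the stale-layer hypothesis is right, removing the model before re-creating it should force a fresh build. A sketch, assuming the model was created as `llama3.1-lexi` from `modelfile-melmass` as shown above:

```shell
# Remove the model; Ollama then drops layers no longer referenced by any model.
ollama rm llama3.1-lexi

# Re-create from the same Modelfile so the layers are rebuilt from scratch,
# then try running it again.
ollama create llama3.1-lexi -f modelfile-melmass
ollama run llama3.1-lexi
```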

# sha256sum Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
f823cc9ddc2c5c9953a7f1cd171a710128741b15d8023c5ecc2e3808859a27c5  Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
# ls -l Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf 
-rw-r--r--. 1 root root 4781626464 Jul 28 00:25 Llama-3.1-8B-Lexi-Uncensored-Q3_K_XL.gguf
