Failed to load
llama_model_loader: - type f32: 37 tensors
llama_model_loader: - type q8_0: 127 tensors
error loading model: unknown model architecture: 'gemma'
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'gemma-2b-it-q8_0.gguf'
{"timestamp":1708650706,"level":"ERROR","function":"load_model","line":590,"message":"unable to load model","model":"gemma-2b-it-q8_0.gguf"}
Sorry, I don't think that file came from this repo, because I haven't uploaded the Q8_0 version yet! :)
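A note for anyone hitting the 'unknown model architecture: gemma' error above: it usually means the llama.cpp build predates Gemma support, so updating and rebuilding llama.cpp from a recent commit normally resolves it. A minimal sketch, assuming a local git checkout of llama.cpp (binary names and flags vary between versions):
cd llama.cpp
git pull
make clean && make
./main -m gemma-2b-it-q8_0.gguf -p "Hello"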
llm_load_tensors: using CUDA for GPU acceleration
error loading model: create_tensor: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/ai/llama/models/gemma/gemma-7b-it.Q5_K_M.gguf'
main: error: unable to load model
Just pushed a new version that solves your problem, @DKingg (and it has good overall perplexity).
Hello, can you share your conversion method here? I used llama.cpp to convert gemma-7b-it like this: python convert.py /home/jovyan/share-pvc-dutianwei/models/huggingface/google/gemma-7b-it. It generated a file named ggml-model-f16.gguf and everything looked fine, but when I load it with
- ./server
- '-m'
- >-
/home/jovyan/models/models/huggingface/google/gemma-7b-it/ggml-model-f16.gguf
- '-c'
- '4096'
- '--host'
- 0.0.0.0
- '--port'
- '8000'
- '-ngl'
- '100'
it shows an error like this:
llama_model_load: error loading model: create_tensor: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/home/jovyan/models/models/huggingface/google/gemma-7b-it/ggml-model-f16.gguf'
terminate called without an active exception
{"timestamp":1710482521,"level":"ERROR","function":"load_model","line":375,"message":"unable to load model","model":"/home/jovyan/models/models/huggingface/google/gemma-7b-it/ggml-model-f16.gguf"},how to solve it ?please give me some help, thanks~~~
@totoro191
Sure, the fix should be simple. You need to use convert-hf-to-gguf.py instead of convert.py to create the f16 model.
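For reference, a minimal sketch of that workflow, assuming a recent llama.cpp checkout (by default convert-hf-to-gguf.py writes ggml-model-f16.gguf into the model directory; flag names can differ between llama.cpp versions):
python convert-hf-to-gguf.py /home/jovyan/models/models/huggingface/google/gemma-7b-it --outtype f16
./server -m /home/jovyan/models/models/huggingface/google/gemma-7b-it/ggml-model-f16.gguf -c 4096 --host 0.0.0.0 --port 8000 -ngl 100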