Failed to load

#3
by Priderock - opened

llama_model_loader: - type f32: 37 tensors
llama_model_loader: - type q8_0: 127 tensors
error loading model: unknown model architecture: 'gemma'
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'gemma-2b-it-q8_0.gguf'
{"timestamp":1708650706,"level":"ERROR","function":"load_model","line":590,"message":"unable to load model","model":"gemma-2b-it-q8_0.gguf"}

Sorry, I don't think you're using this repo because I haven't uploaded the Q8_0 version yet! :)

llm_load_tensors: using CUDA for GPU acceleration
error loading model: create_tensor: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/ai/llama/models/gemma/gemma-7b-it.Q5_K_M.gguf'
main: error: unable to load model

Just pushed a new version that solves your problem @DKingg (and overall good perplexity).

Hello, can you share your conversion method here? I used llama.cpp to convert gemma-7b-it like this: python convert.py /home/jovyan/share-pvc-dutianwei/models/huggingface/google/gemma-7b-it. It generated a file named ggml-model-f16.gguf and everything looked good, but when I load it with:

./server -m /home/jovyan/models/models/huggingface/google/gemma-7b-it/ggml-model-f16.gguf -c 4096 --host 0.0.0.0 --port 8000 -ngl 100

it shows an error like this:
llama_model_load: error loading model: create_tensor: tensor 'output.weight' not found
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/home/jovyan/models/models/huggingface/google/gemma-7b-it/ggml-model-f16.gguf'
terminate called without an active exception
{"timestamp":1710482521,"level":"ERROR","function":"load_model","line":375,"message":"unable to load model","model":"/home/jovyan/models/models/huggingface/google/gemma-7b-it/ggml-model-f16.gguf"},how to solve it ?please give me some help, thanks~~~

@totoro191 Sure, the fix should be simple. You need to use convert-hf-to-gguf.py instead of convert.py to create the f16 model.
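For reference, a conversion along these lines should work (just a sketch; the --outtype and --outfile flags assume a reasonably recent llama.cpp checkout, and the output path is only an example):

python convert-hf-to-gguf.py /home/jovyan/share-pvc-dutianwei/models/huggingface/google/gemma-7b-it --outtype f16 --outfile /home/jovyan/models/models/huggingface/google/gemma-7b-it/ggml-model-f16.gguf

The resulting GGUF can then be loaded with the same ./server command as before.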
