Error loading the model using Colab

#2
by prakash1524 - opened

OSError: TheBloke/Llama-2-7B-GGML does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

I'm getting this error while loading the model with the suggested code:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("TheBloke/Llama-2-7B-GGML")
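The cause of the error can be sketched in plain Python (the repo file list below is representative of a GGML repo, not an exact listing): transformers only recognises the four weight filenames quoted in the error, and a GGML repo contains none of them.

```python
# Weight filenames that transformers' from_pretrained looks for (per the error above).
TRANSFORMERS_WEIGHT_FILES = {
    "pytorch_model.bin", "tf_model.h5", "model.ckpt", "flax_model.msgpack",
}

# Representative contents of a GGML repo: quantised .bin files with their own naming.
ggml_repo_files = [
    "llama-2-7b.ggmlv3.q4_0.bin",
    "llama-2-7b.ggmlv3.q5_K_M.bin",
    "config.json",
]

# No overlap, so transformers raises the OSError quoted above.
loadable = TRANSFORMERS_WEIGHT_FILES.intersection(ggml_repo_files)
print(loadable)  # -> set()
```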

This repo is not for use with transformers. The GGML files are for llama.cpp, ctransformers, llama-cpp-python, and text-generation-webui.
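For example, the same repo can be loaded with ctransformers instead (a sketch; the model_file name is an assumption, pick an actual filename from the repo's "Files" tab, and note the call downloads several GB):

```python
def load_ggml_llm(repo_id: str, model_file: str):
    """Load a GGML quantisation via ctransformers (not transformers)."""
    from ctransformers import AutoModelForCausalLM  # pip install ctransformers
    return AutoModelForCausalLM.from_pretrained(
        repo_id,
        model_file=model_file,  # which quantisation file in the repo to use
        model_type="llama",
    )

if __name__ == "__main__":
    # The q4_0 filename here is assumed; check the repo's file list.
    llm = load_ggml_llm("TheBloke/Llama-2-7B-GGML", "llama-2-7b.ggmlv3.q4_0.bin")
    print(llm("AI is going to", max_new_tokens=32))
```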


Can we fine-tune this model (the GGML model), or do we have to fine-tune the original model and then convert it to GGML format, as described in some GitHub repos? Just a beginner doubt.
I fine-tuned a sharded model, but I'm not able to fine-tune this one; I get the error: TheBloke/Llama-2-7B-GGML does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack.

You can't fine-tune a GGML model using Python/transformers code. There is some training support in llama.cpp that you might be able to use, but I don't have any experience with it.

The general procedure is:

  • Fine tune the original unquantised model. This can be done as:
    • a full training in float16 - very expensive
    • a LoRA adapter training in float16 - less expensive
    • or a LoRA training in 4-bit, known as QLoRA - much cheaper.
  • Whichever is chosen, the result will be a new unquantised float16 model.
  • That can then be quantised to GGML format, and that GGML file used for inference.
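The last step above, using llama.cpp from the GGML era, looked roughly like this (a sketch; the paths and the q4_0 quantisation type are placeholders, and current llama.cpp produces GGUF rather than GGML):

```shell
# Convert the merged float16 HF model to a GGML float16 file
python convert.py /path/to/finetuned-fp16-model --outtype f16

# Quantise the float16 GGML file down to 4-bit
./quantize /path/to/finetuned-fp16-model/ggml-model-f16.bin ggml-model-q4_0.bin q4_0
```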
