Trying to use llama-2-7b-chat.Q4_K_M.gguf with/without tensorflow weights

#33
by cgthayer - opened

n00bie question:
The libs think this has tensorflow weights, but "from_tf=True" doesn't resolve.
What am I doing wrong here?

```
from transformers import AutoModelForCausalLM

model_file = "llama-2-7b-chat.Q4_K_M.gguf"
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7b-Chat-GGUF", model_file=model_file, model_type="llama", gpu_layers=50, from_tf=True)
```

Gives me (on Google Colab):
```
OSError                                   Traceback (most recent call last)
<cell line: 3>()
      1 from transformers import LlamaForCausalLM, LlamaTokenizer, AutoModelForCausalLM
      2 model_file = "llama-2-7b-chat.Q4_K_M.gguf"
----> 3 model = AutoModelForCausalLM.from_pretrained(
      4     "TheBloke/Llama-2-7b-Chat-GGUF", model_file=model_file, model_type="llama", gpu_layers=50, from_tf=True)
      5

1 frames
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py in from_pretrained(cls, pretrained_model_name_or_path, config, cache_dir, ignore_mismatched_sizes, force_download, local_files_only, token, revision, use_safetensors, *model_args, **kwargs)
   3384             }
   3385             if has_file(pretrained_model_name_or_path, TF2_WEIGHTS_NAME, **has_file_kwargs):
-> 3386                 raise EnvironmentError(
   3387                     f"{pretrained_model_name_or_path} does not appear to have a file named"
   3388                     f" {_add_variant(WEIGHTS_NAME, variant)} but there is a file for TensorFlow weights."

OSError: TheBloke/Llama-2-7b-Chat-GGUF does not appear to have a file named pytorch_model.bin but there is a file for TensorFlow weights. Use from_tf=True to load this model from those weights.
```


I get this error with or without "from_tf=True". Did this parameter name change without an update to the EnvironmentError message?

@cgthayer Yeah, the problem is that Hugging Face transformers does not support GGUF models. Also, I would not recommend using Llama 2 7B, since a MUCH better Llama 3 8B came out. It's at least 2-3x better and not as censored. For GGUF files, just search "llama 3 8b gguf" on Hugging Face.

To use GGUF models, you can use llama.cpp or anything built on it (text-generation-webui, llama-cpp-python, LM Studio, and much more).
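For example, here is a minimal llama-cpp-python sketch (the file path, prompt, and n_gpu_layers value are just example assumptions, and GPU offloading only takes effect if llama-cpp-python was built with GPU support):

```
from llama_cpp import Llama

# Point model_path at a locally downloaded GGUF file.
# n_gpu_layers offloads that many layers to the GPU (requires a GPU-enabled build).
llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_gpu_layers=50)

# Run a simple completion and print the generated text.
output = llm("Q: What is a GGUF file? A:", max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])
```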

> I get this error with or without "from_tf=True". Did this parameter name change without an update to the EnvironmentError message?

I hope you were able to resolve this error by now, but in case someone faces the same issue in the future, here is the solution:

The model can be loaded using the ctransformers library, which is different from the standard Hugging Face transformers library.
ctransformers provides Python bindings for transformer models implemented in C/C++ and is designed to work with quantized models, including those in GGUF format.

The following code should work:

```
from ctransformers import AutoModelForCausalLM

# Load the model (gpu_layers controls how many layers are offloaded to the GPU)
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7b-Chat-GGUF",
    model_file="llama-2-7b-chat.Q4_K_M.gguf",
    model_type="llama",
    gpu_layers=50,
)
```
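Once loaded, the ctransformers model object is callable for text generation, so a quick smoke test (the prompt is just an example) could be:

```
# Generate a short continuation from a prompt string.
print(llm("AI is going to"))
```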
