Error while using CTransformers: Model file 'llama-2-7b-chat.q4_K_M.gguf' not found

#7 · opened by gaukelkar

I have copy-pasted the code below from the sample provided here:

    llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GGUF", model_file="llama-2-7b-chat.q4_K_M.gguf", model_type="llama", gpu_layers=50)

I'm getting an error saying the model file is not found. The stack trace is as follows:

Traceback (most recent call last):
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/gradio/queueing.py", line 406, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/gradio/route_utils.py", line 217, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/gradio/blocks.py", line 1553, in process_api
    result = await self.call_function(
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/gradio/blocks.py", line 1191, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/gradio/utils.py", line 659, in wrapper
    response = f(*args, **kwargs)
  File "/home/user/app/app.py", line 11, in answer
    chat_agent = Llama_chat.Llama_chat()
  File "/home/user/app/Llama_chat.py", line 49, in __init__
    self.llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7b-Chat-GGUF", model_file="llama-2-7b-chat.q4_K_M.gguf", model_type="llama", gpu_layers=50)
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/ctransformers/hub.py", line 168, in from_pretrained
    model_path = cls._find_model_path_from_repo(
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/ctransformers/hub.py", line 209, in _find_model_path_from_repo
    return cls._find_model_path_from_dir(path, filename=filename)
  File "/home/user/.pyenv/versions/3.10.13/lib/python3.10/site-packages/ctransformers/hub.py", line 242, in _find_model_path_from_dir
    raise ValueError(f"Model file '{filename}' not found in '{path}'")
ValueError: Model file 'llama-2-7b-chat.q4_K_M.gguf' not found in '/home/user/.cache/huggingface/hub/models--TheBloke--Llama-2-7b-Chat-GGUF/snapshots/9ca625120374ddaae21f067cb006517d14dc91a6'
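
Side note: a quick way to see which .gguf files the repo actually contains is huggingface_hub's list_repo_files; a minimal sketch (the helper is standard huggingface_hub, the rest is just illustration):

    from huggingface_hub import list_repo_files

    # Print the GGUF files actually present in the repo, to compare against model_file
    for name in list_repo_files("TheBloke/Llama-2-7b-Chat-GGUF"):
        if name.endswith(".gguf"):
            print(name)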

Sorry, this is a typo in the README. The filename is llama-2-7b-chat.Q4_K_M.gguf (capital Q).
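
For anyone copy-pasting from this thread, the call from the first post only needs the corrected filename, something like:

    from ctransformers import AutoModelForCausalLM

    # Capital Q in the quantisation suffix: Q4_K_M
    llm = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-7b-Chat-GGUF",
        model_file="llama-2-7b-chat.Q4_K_M.gguf",
        model_type="llama",
        gpu_layers=50,  # GPU offloading requires a CUDA-enabled ctransformers build
    )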

Thanks for the prompt response! That error is resolved now. Still getting an error in ctransformers though :)

How does one create a tokenizer for it? This:

    from ctransformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        "TheBloke/Llama-2-7b-Chat-GGUF",
        model_file="llama-2-7b-chat.Q4_K_M.gguf",
        model_type="llama",
        gpu_layers=50,
        hf=True  # expose the model through a transformers-compatible interface
    )

    # Initialize the tokenizer from the loaded model
    tokenizer = AutoTokenizer.from_pretrained(model)

    return model, tokenizer

Results in a NotImplementedError:

File ".../lib/python3.11/site-packages/ctransformers/hub.py", line 268, in from_pretrained
    return CTransformersTokenizer(model._llm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
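
One possible workaround (a sketch, not confirmed for this exact setup): since the GGUF repo does not ship the Hugging Face tokenizer files, load the tokenizer with plain transformers from an HF-format Llama 2 repo instead of from the ctransformers model object. The repo name below is an assumption; any repo carrying the Llama 2 chat tokenizer files would do:

    from transformers import AutoTokenizer

    # Assumed repo: the original HF-format chat model (gated, needs access approval).
    # The tokenizer is loaded independently of the GGUF model returned by ctransformers.
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")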
