llama model type loading in marella's ctransformers

#1
by dyoung - opened

Thought I'd give it a try. Didn't expect much, but I'm finding that ctransformers is not able to load the 8-bit GGUF of this model. I don't recall which family phi belongs to. Llama? It's been a while since I last looked at it.

While in Google Colab:

```python
import time
from ctransformers import AutoModelForCausalLM

load_tm = time.time()

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/LMCocktail-phi-2-v1-GGUF",
    model_file="lmcocktail-phi-2-v1.Q8_0.gguf",
    model_type="llama",  # ?
)

load_rt_sec = time.time() - load_tm
if load_rt_sec > 60:
    print(f"\nDownload and/or loading runtime (min): {load_rt_sec / 60}")
else:
    print(f"\nDownload and/or loading runtime (sec): {load_rt_sec}")
```

error:
```
RuntimeError                              Traceback (most recent call last)
in <cell line: 3>()
      1 load_tm = time.time()
      2
----> 3 llm = AutoModelForCausalLM.from_pretrained(
      4
      5     "TheBloke/LMCocktail-phi-2-v1-GGUF",

1 frames
/usr/local/lib/python3.10/dist-packages/ctransformers/llm.py in __init__(self, model_path, model_type, config, lib)
    251         )
    252         if self._llm is None:
--> 253             raise RuntimeError(
    254                 f"Failed to create LLM '{model_type}' from '{model_path}'."
    255             )

RuntimeError: Failed to create LLM 'llama' from '/root/.cache/huggingface/hub/models--TheBloke--LMCocktail-phi-2-v1-GGUF/blobs/05d6680da2235732940781679c7925e140fad0ff087cbf10961942a644dca7b6'.
```

Maybe this is a conversation for marella on GitHub.

@dyoung I believe ctransformers does not currently support phi. Your best bet is to use llama-cpp-python if you want a ctransformers-like experience. It should also be considerably faster than ctransformers.
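For example, something along these lines (a minimal sketch, assuming a recent llama-cpp-python and huggingface_hub; the repo and file names are the ones from your post, and the prompt is just a placeholder):

```python
# Minimal sketch: load the same GGUF with llama-cpp-python instead of ctransformers.
# Assumes: pip install llama-cpp-python huggingface_hub
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the quantized file from the Hub (cached after the first call).
model_path = hf_hub_download(
    repo_id="TheBloke/LMCocktail-phi-2-v1-GGUF",
    filename="lmcocktail-phi-2-v1.Q8_0.gguf",
)

# llama.cpp reads the architecture from the GGUF metadata,
# so there is no model_type to guess.
llm = Llama(model_path=model_path, n_ctx=2048)

out = llm("Instruct: Write one sentence about GGUF.\nOutput:", max_tokens=64)
print(out["choices"][0]["text"])
```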

@YaTharThShaRma999 I found that the phi-2 model is close to the gpt-j/neo family. Also, I've been able to get llama-cpp and llama-cpp-python working on my end. Due to my setup, I was dreading what it would take to get GPU inference with llama-cpp, but because ctransformers is running a bit behind, I bit the bullet and figured out how to get llama-cpp going with my GPU hardware. That was painful, even with experience doing setups like this. lol. I'm glad I did it, though. It's been fun playing with some of the new models, it brushed up my skills a bit, and I learned some new things.
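In case it helps anyone else going down the same road, here's roughly what the GPU side looks like once llama-cpp-python is built with CUDA (a sketch, not my exact setup: the build flag shown is the one llama-cpp-python documented around that time, newer releases use `-DGGML_CUDA=on`, and the model path is a placeholder):

```python
# Sketch: GPU offload with llama-cpp-python, assuming it was installed with
# CUDA enabled, e.g.:
#   CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
# (newer releases use -DGGML_CUDA=on instead).
from llama_cpp import Llama

llm = Llama(
    model_path="lmcocktail-phi-2-v1.Q8_0.gguf",  # local path to the GGUF file
    n_gpu_layers=-1,  # -1 offloads all layers to the GPU; lower this if VRAM is tight
    n_ctx=2048,
)

print(llm("Instruct: Say hello.\nOutput:", max_tokens=32)["choices"][0]["text"])
```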
Thanks for taking the time to reply.

dyoung changed discussion status to closed
