Can this model be used with langchain llamacpp ? If so would you be kind enough to provide code. Thanks
Yeah - install llama-cpp-python, then here's a quick example:
from llama_cpp import Llama
import random

# Load the GGML model; n_gpu_layers offloads layers to the GPU and the
# random seed makes each run sample differently.
llm = Llama(
    model_path="/path/to/stable-vicuna-13B.ggmlv3.q5_1.bin",
    n_gpu_layers=40,
    seed=random.randint(1, 2**31),
)

tokens = llm.tokenize(b"### Human: Write a story about llamas\n### Assistant:")

# Stream tokens one at a time, stopping after 500 tokens or at end-of-sequence.
output = b""
count = 0
for token in llm.generate(tokens, top_k=40, top_p=0.95, temp=0.72, repeat_penalty=1.1):
    text = llm.detokenize([token])
    print(text.decode(), end='', flush=True)
    output += text
    count += 1
    if count >= 500 or token == llm.token_eos():
        break

print("Full response:", output.decode())
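And since the question asked about LangChain specifically, here is a minimal sketch using LangChain's LlamaCpp wrapper (assuming a 2023-era langchain install; the path is the same placeholder as above, and n_ctx=2048 is my assumption):

from langchain.llms import LlamaCpp

# Sketch: LangChain's wrapper around the same GGML file.
# Parameter names mirror llama-cpp-python; adjust the path to your setup.
llm = LlamaCpp(
    model_path="/path/to/stable-vicuna-13B.ggmlv3.q5_1.bin",
    n_gpu_layers=40,
    n_ctx=2048,
    temperature=0.72,
    top_p=0.95,
    top_k=40,
    repeat_penalty=1.1,
    max_tokens=500,
)

print(llm("### Human: Write a story about llamas\n### Assistant:"))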
Thanks for the code, but I'm getting an assertion error. I'm using llama-cpp-python == 0.1.52 with the ggmlv3.q5_1 bin file:
assert self.ctx is not None
Would you know if this bin file is compatible with that package version? Thank you for your help.
I had that same issue, and had to use the ggmlv2 version. I think you have to build the newer llama.cpp for the ggmlv3, but I could be wrong.
llama-cpp-python was updated to support GGMLv3 about 10 hours ago; version 0.1.53 supports it.
You can install llama-cpp-python 0.1.53 on Windows without compiling with:
pip install https://github.com/abetlen/llama-cpp-python/releases/download/v0.1.53/llama_cpp_python-0.1.53-cp310-cp310-win_amd64.whl
Or, alternatively, use ctransformers, which can be installed with:

pip install ctransformers
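For completeness, a minimal sketch of loading the same file with ctransformers (the local path is the same placeholder; model_type="llama" because StableVicuna is a LLaMA-family model):

from ctransformers import AutoModelForCausalLM

# Sketch: point ctransformers at the local GGML file and tell it the
# architecture family; adjust the path to wherever your bin file lives.
llm = AutoModelForCausalLM.from_pretrained(
    "/path/to/stable-vicuna-13B.ggmlv3.q5_1.bin",
    model_type="llama",
)

print(llm("### Human: Write a story about llamas\n### Assistant:", max_new_tokens=500))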