Problem

#2
by TheLeCraft999 - opened

I tried to use the model, and every time I print out "result" I get something like this: <generator object Model.generate at 0x000001EF1A79CA00>
Can you tell me what the problem is, please?

@TheLeCraft999 From your question I guess you are new to Python, so I tried to keep it simple.

pyllamacpp had some changes; the result of generate() is now a Python generator, which has to be iterated over instead of printed directly. I would recommend switching to llama-cpp-python, as it is the officially supported llama.cpp binding. You should read their documentation to better understand how to use the library.
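
If you do want to stay on pyllamacpp for now, you need to consume the generator before anything useful gets printed. A minimal sketch, assuming generate() yields the generated text piece by piece (the repr in your output suggests exactly that):

# Minimal sketch, assuming pyllamacpp's generate() yields text pieces
for piece in model.generate(prompt):
    print(piece, end="", flush=True)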

Downloading the model works similarly. The following code should work:

from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download the model from the Hugging Face Hub
hf_hub_download(repo_id="LLukas22/gpt4all-lora-quantized-ggjt", filename="ggjt-model.bin", local_dir=".")

# Load the model
model = Llama(model_path="ggjt-model.bin")

# Generate: stop when the model starts a new "User:" turn
prompt = "User: How are you doing?\nBot:"

result = model(prompt, max_tokens=50, stop=["User:"])
# Print the result with additional info (token usage, finish reason, ...)
print(result)

# Get only the generated text
generated_text = result["choices"][0]["text"]
print(generated_text)
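
For reference, llama-cpp-python returns an OpenAI-style completion dict; the exact values below are illustrative, but the keys are what the code above relies on:

# Roughly what `result` contains (values are illustrative):
# {
#   "id": "cmpl-...",
#   "object": "text_completion",
#   "model": "ggjt-model.bin",
#   "choices": [{"text": " I'm doing fine, thank you!", "index": 0,
#                "logprobs": None, "finish_reason": "stop"}],
#   "usage": {"prompt_tokens": 11, "completion_tokens": 9, "total_tokens": 20}
# }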

I'm also working on some llama-rs bindings, which will hopefully simplify the process of downloading and executing models further, but they are not ready yet.

Thanks, that works, but I always get the same answer:
"Please be patient as I may take a moment to respond fully"

Any suggestions?

It gave me results (without too long of a wait) after adding some sampling parameters:

result = model(prompt, max_tokens=50, stop=["User:"],
    temperature=0.9,      # higher temperature -> more varied output
    top_p=0.95,           # nucleus sampling: keep the top 95% probability mass
    repeat_penalty=1.2,   # penalize tokens that were already generated
    top_k=50,             # sample only from the 50 most likely tokens
    echo=True)            # include the prompt in the returned text
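
(I think repeat_penalty is what stopped the repeated answer, since it discourages the model from emitting the same tokens again; temperature, top_p and top_k just make the sampling less greedy.)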
LLukas22 changed discussion status to closed
