I can't get any logical completion out of this

#1
by andreariboni - opened

I am using llama.cpp (via LangChain) and trying to prompt this LLM:

from langchain_community.llms import LlamaCpp
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
self.llm = LlamaCpp(
    model_path="./LLMs/Aether/Cerebrum-1.0-8x7b-Q6_K.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_batch=16,
    f16_kv=True,
    n_ctx=32000,
    temperature=0.0,
    callback_manager=callback_manager,
    verbose=True,
)

However, none of the generated completions is logical or meaningful at all. Most of the time, the model just repeats itself.

input: "<s> You are an ai assistant\nUser: I have 3 apple and i eat one. How many apples do i have now?\nAI: "
output: "102456789"

input: "<s> You are an ai assistant\nUser: Create a json describing a cat\nAI: "
output: "1.json\n{\n \"name\": \"John\",\n\"age\":30,\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<<<<<<<brain'LBL_\n<<<<<<<a\nLBL_<<<<<<<div>\n\n\n\n\n\n\n<<<<<<<div>"
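One thing worth double-checking alongside the file itself is the prompt template: the bare "<s> ...\nUser: ...\nAI: " format above is not the instruction template most Mixtral-derived instruct models are trained on, and a template mismatch alone can cause repetitive or off-topic output. As a hedged sketch (the [INST] format is an assumption here; the Cerebrum model card is the authority on its actual template), the prompt could be built like this:

```python
# Hedged sketch: builds a Mistral/Mixtral-style [INST] prompt.
# The template is an assumption; verify it against the model card.
def build_prompt(system: str, user: str) -> str:
    # llama.cpp prepends the <s> BOS token itself, so it is omitted here.
    return f"[INST] {system}\n\n{user} [/INST]"

prompt = build_prompt(
    "You are an AI assistant.",
    "I have 3 apples and I eat one. How many apples do I have now?",
)
print(prompt)
```

Passing a string in this shape to `self.llm.invoke(prompt)` would then exercise the template the model expects, assuming the format above is correct.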

Interesting; even the 3.0 bpw exl2 model was giving good answers. Let me try redownloading the GGUF to check.
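Before re-testing, it's worth ruling out a corrupted download by comparing the file's hash against the one shown in the repo's file listing. A minimal sketch, assuming the expected SHA-256 is copied from the model page (the commented-out path and hash below are placeholders, not real values):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks so a multi-GB GGUF fits in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Placeholders -- substitute the real file path and the hash from the repo:
# expected = "..."
# assert sha256_of("./LLMs/Aether/Cerebrum-1.0-8x7b-Q6_K.gguf") == expected
```

If the hashes match, the download is intact and the problem lies elsewhere (quantization or inference), which narrows things down considerably.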

I also checked Q5_K_M and I get nonsensical responses with it too.

Yup, I'm getting the same as you. That's very interesting; I can't imagine why the exl2 would work but the GGUF doesn't. I'll have to investigate, sorry about that!

Nice to know that it's not my setup, haha. Thanks!
