GGML models become dumb when used in Python.

#5
by supercharge19 - opened

I am struggling with models not following instructions at all when they are used from Python; however, they work much better when they are run from a shell (like cmd or PowerShell).

Python examples:
Q: llm("Can you solve math questions?")
A: '\nCan you solve these math questions?'

Q: llm("what is (4.5*2.1)^2.2?")
A: Long text output omitted; it was unrelated to the question and just asked more questions instead of answering.
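
(For reference, the llm here is a local GGML model loaded through llama-cpp-python's LangChain wrapper; the sketch below shows roughly how it is set up, with the model path and parameters as placeholders rather than my exact values.)

from langchain.llms import LlamaCpp

# Rough sketch of the setup; path and parameters are placeholders
llm = LlamaCpp(
    model_path="./manticore-13b.ggmlv3.q4_0.bin",
    n_ctx=2048,
    temperature=0.7,
)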

I am trying to use it with LangChain as the LLM for an agent; however, the models are acting too dumb. I should be able to get a correct answer to the following:

from langchain.agents import load_tools, initialize_agent

# llm is the locally loaded GGML model from the sketch above
tools = load_tools(
    ['llm-math'],
    llm=llm
)

zero_shot_agent = initialize_agent(
    agent="zero-shot-react-description",
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=3
)
zero_shot_agent("what is (4.5*2.1)^2.2?")

The response I get:

Entering new AgentExecutor chain...
Llama.generate: prefix-match hit
 let's get the calculator out!
Action: [Calculator]
Action Input: 4.5 and 2.1 as a ratio
Observation: [Calculator] is not a valid tool, try another one.
Thought:Llama.generate: prefix-match hit

omitting large output

OutputParserException: Could not parse LLM output: ` I will use the power rule for exponents to do this by hand.
Action: (4.5*2.1)^2.2 = 4.5*2.1^2.2`

Is there a way to overcome this problem? I want to use a GGML model (or any model that can be run locally on CPU). The model that gave the outputs above is Manticore 13B q4_0 (though I am sure that higher-bit quantizations, e.g. 5 or 8 bits, would not be any better). Also, this kind of error (OutputParserException) only occurs when I use a notebook (ipynb or Google Colab); I usually encounter a different problem when the code is run in a Python REPL (through cmd or PowerShell). The problem I encounter when running the code in the REPL is that LangChain just can't use my tools. For example, for my question zero_shot_agent("what is (4.5*2.1)^2.2?") I get outputs like:

 I should use the calculator for this math problem.
Action: [Calculator]
Action Input: press the equals button and type in 4.5 and 2.1, then press the square root button twice
Observation: [Calculator] is not a valid tool, try another one.

 I will use a regular calculator.
Action: [Regular Calculator]
Action Input: turn on the calculator and input the problem: (4.5*2.1)^2.2
Observation: [Regular Calculator] is not a valid tool, try another one.

 I will use my phone's calculator app.
Action: [Phone Calculator]
Action Input: open the app and input the problem: (4.5*2.1)^2.2
Observation: [Phone Calculator] is not a valid tool, try another one.
Thought:

> Finished chain.
{'input': 'what is (4.5*2.1)^2.2?', 'output': 'Agent stopped due to iteration limit or time limit.'}

It stopped at the third iteration (attempt) without solving the problem; however, I don't see any value in letting it run longer.
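
One way to narrow this down is to call the llm-math tool directly as an LLMMathChain, bypassing the agent's ReAct parsing entirely; a minimal sketch, assuming the same local llm as above:

from langchain.chains import LLMMathChain

# Run the math tool directly, without the agent's Action/Action Input loop
math_chain = LLMMathChain.from_llm(llm=llm, verbose=True)
print(math_chain.run("what is (4.5*2.1)^2.2?"))

If this chain answers correctly, the failure is in the agent step: the model writes Action: [Calculator] with brackets, while the ReAct parser expects the bare tool name Calculator, so the tool lookup never matches.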

Hey @supercharge19, I have nothing to help you with on this issue, but I am just getting started in this world and I would like to try these models locally.
I have a code snippet that I stole from another tutorial; I managed to make an agent respond using an OpenAI model and some others, but now I am trying to use Vicuna and I get an error while loading the model.

I have the following code:

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from langchain.llms import HuggingFacePipeline

checkpoint = "./wizard-vicuna-13B-uncensored/"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
base_model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, device_map='auto', torch_dtype=torch.float32)
llm = HuggingFacePipeline.from_model_id(model_id=checkpoint, task='text2text-generation', model_kwargs={"temperature": 0.60, "min_length": 30, "max_length": 600, "repetition_penalty": 5.0})

Could you point out what is wrong here? It is trying to find a "config.json" file which is not there, but I am not sure what to change to make it work.


Inside the folder "wizard-vicuna-13B-uncensored" is the model file, named "pytorch_model".

If you could share how you are using it, that would be enough as well.

Where did you get the model from? If you obtained it from ehartford (ehartford/Wizard-Vicuna-13B-Uncensored), then that repository contains the required JSON files. Also, the code you are using works only with Hugging Face format models, i.e. models uploaded to Hugging Face directly after training; it does not work with GGUF or GGML files, so first make sure you are downloading the correct format for your code. Otherwise, you can try ctransformers or llama-cpp-python (look them up on GitHub).
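
For example, with a GGML file downloaded to disk, loading it through LangChain's ctransformers wrapper could look roughly like this (the file name and settings below are placeholders, not a tested configuration):

from langchain.llms import CTransformers

# Load a local GGML file via ctransformers; file name is a placeholder
llm = CTransformers(
    model="./wizard-vicuna-13B-uncensored.ggmlv3.q4_0.bin",
    model_type="llama",
    config={"temperature": 0.6, "max_new_tokens": 600},
)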
