Model output has changed

#116
by AnzaniAI

I have been using mistralai/Mixtral-8x7B-Instruct-v0.1 for my NER use case. Ten days ago it was giving the correct output as per the prompt, but now, with the same prompt, it gives long text output. What could be the issue?

@AnzaniAI you might have different sampling parameters, like a higher temperature or top_p. Make sure those are the same as before.
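For reference, a minimal sketch of pinning the sampling behavior so runs are comparable, assuming the model is loaded directly with transformers (the prompt text and max_new_tokens value here are illustrative):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Extract the named entities: ...", return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    do_sample=False,     # greedy decoding: temperature/top_p then have no effect
    max_new_tokens=128,  # cap the response length to avoid long outputs
)
# Slice off the prompt tokens so only the generated text is printed.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))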

I would also recommend upgrading your transformers version!

@AnzaniAI why do you use mistralai/Mixtral-8x7B-Instruct-v0.1 for NER? Why not mistralai/Mixtral-8x7B-v0.1?

@gzguevara
If I use mistralai/Mixtral-8x7B-v0.1, it throws the following error:

ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom
device_map to from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.

@ArthurZ
Upgrading transformers throws the following error:

ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit
the quantized model. If you want to dispatch the model on the CPU or the disk while keeping
these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom
device_map to from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
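For anyone hitting this, a hedged sketch of the fix the error message points at, assuming an 8-bit bitsandbytes load (in recent transformers versions the flag is spelled llm_int8_enable_fp32_cpu_offload on BitsAndBytesConfig):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,  # keep CPU/disk-offloaded modules in fp32
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",
    quantization_config=quant_config,
    device_map="auto",  # or a custom device_map that places some layers on "cpu"
)

Note the underlying cause is insufficient GPU RAM for the quantized model; offloading lets it load but trades away speed.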

@YaTharThShaRma999
I am using the lowest temperature (0.01), and the issue still persists.

Hi community,
The model is actually giving the correct output, but the output also carries the question and context that I passed to llm_chain.run. The following is the code block:
from tqdm import tqdm
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

for i in tqdm(data.index):
    # Build a per-row prompt from the system prompt, context, and question.
    template = data["system_prompt"][i]
    context = data["text"][i]
    question = data["user_msg"][i]
    prompt = PromptTemplate(template=template, input_variables=["question", "context"])

    llm_chain = LLMChain(prompt=prompt, llm=llm)
    entities = {"question": question, "context": context}
    response = llm_chain.run(entities)
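If llm wraps a transformers text-generation pipeline (e.g. via LangChain's HuggingFacePipeline), a likely cause is that the pipeline returns the full text, prompt included, by default. A hedged sketch of turning that off, assuming model and tokenizer are already loaded:

from transformers import pipeline
from langchain.llms import HuggingFacePipeline

pipe = pipeline(
    "text-generation",
    model=model,             # the already-loaded Mixtral model (assumed)
    tokenizer=tokenizer,
    max_new_tokens=256,
    return_full_text=False,  # return only the newly generated text, not the prompt
)
llm = HuggingFacePipeline(pipeline=pipe)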

@AnzaniAI

Could you please tell us whether you fixed this error? I have a similar issue.

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.

