Model Output Has Changed
I have been using mistralai/Mixtral-8x7B-Instruct-v0.1 for my NER use case. Ten days ago it was giving the correct output for my prompt, but now, with the same prompt, it produces long text output. What could be the issue?
@AnzaniAI You might be using different sampling parameters, such as a higher temperature or top_p. Make sure they are the same as before.
I would also recommend upgrading your transformers version!
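To make the sampling-parameter suggestion concrete, here is a minimal sketch of pinning them explicitly at generation time, assuming the model is loaded with transformers (the prompt and values are placeholders):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer("Extract the named entities from: ...", return_tensors="pt").to(model.device)

    # Pin the sampling parameters explicitly so they cannot drift between runs.
    outputs = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.01,
        top_p=1.0,
        max_new_tokens=256,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))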
@gzguevara
If I use mistralai/Mixtral-8x7B-v0.1 instead, it throws the following error:

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
@ArthurZ
Upgrading transformers throws the following error:

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
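This error is raised when the quantized model does not fully fit in GPU RAM. A minimal sketch of the CPU-offload route the error message points to, assuming bitsandbytes 8-bit quantization; llm_int8_enable_fp32_cpu_offload here is the BitsAndBytesConfig counterpart of the load_in_8bit_fp32_cpu_offload flag named in the message:

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

    # Keep the fp32 modules that do not fit on the GPU on the CPU instead of failing.
    quantization_config = BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_enable_fp32_cpu_offload=True,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quantization_config,
        device_map="auto",  # or a hand-written device_map placing the offloaded modules on "cpu"
    )

CPU offloading will be noticeably slower; the alternatives are more GPU memory or stronger (e.g. 4-bit) quantization.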
@YaTharThShaRma999
I am using the lowest temperature (0.01), and the long output still persists.
Hi community,
The model is actually giving the correct output, but it also includes the question and context that I passed to llm_chain.run in its output.
The following is the code block:
from tqdm import tqdm
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

for i in tqdm(data.index):
    # Per-row prompt template, context and question from the dataframe.
    template = data["system_prompt"][i]
    context = data["text"][i]
    question = data["user_msg"][i]
    prompt = PromptTemplate(template=template, input_variables=["question", "context"])
    llm_chain = LLMChain(prompt=prompt, llm=llm)
    entities = {"question": question, "context": context}
    response = llm_chain.run(entities)
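One common cause of this is that the underlying text-generation pipeline returns the prompt together with the completion. A minimal sketch of building llm so that only the newly generated text comes back, assuming it wraps a transformers pipeline through LangChain's HuggingFacePipeline (the model id and generation settings are placeholders):

    from transformers import pipeline
    from langchain.llms import HuggingFacePipeline

    generate = pipeline(
        "text-generation",
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",
        max_new_tokens=256,
        return_full_text=False,  # return only the completion, not prompt + completion
    )
    llm = HuggingFacePipeline(pipeline=generate)

If llm is built differently, a fallback is to strip the rendered prompt from the start of response after llm_chain.run.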
Could you please tell us if the error is fixed? I have a similar issue.
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.