Model Output Has Changed
I have been using mistralai/Mixtral-8x7B-Instruct-v0.1 for my NER use case. Ten days ago it was giving the correct output for my prompt, but now, with the same prompt, it produces long text output. What could be the issue?
@AnzaniAI You might be using different sampling parameters, such as a higher temperature or top_p. Make sure they are the same as before.
I would also recommend upgrading your transformers version!
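To make the sampling-parameter suggestion concrete, here is a minimal sketch of pinning them explicitly at generation time, assuming the model is loaded with transformers (the prompt and values are placeholders):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer("Extract the named entities from: ...", return_tensors="pt").to(model.device)

    # Pin the sampling parameters explicitly so they cannot drift between runs.
    outputs = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.01,
        top_p=1.0,
        max_new_tokens=256,
    )
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))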
@gzguevara
If I use mistralai/Mixtral-8x7B-v0.1 instead, it throws the following error:

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
@ArthurZ
Upgrading transformers throws the following error:

ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
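This error is raised when the quantized model does not fully fit in GPU RAM. A minimal sketch of the CPU-offload route the error message points to, assuming bitsandbytes 8-bit quantization; llm_int8_enable_fp32_cpu_offload here is the BitsAndBytesConfig counterpart of the load_in_8bit_fp32_cpu_offload flag named in the message:

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

    # Keep the fp32 modules that do not fit on the GPU on the CPU instead of failing.
    quantization_config = BitsAndBytesConfig(
        load_in_8bit=True,
        llm_int8_enable_fp32_cpu_offload=True,
    )

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quantization_config,
        device_map="auto",  # or a hand-written device_map placing the offloaded modules on "cpu"
    )

CPU offloading will be noticeably slower; the alternatives are more GPU memory or stronger (e.g. 4-bit) quantization.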
@YaTharThShaRma999
I am using the lowest temperature (0.01), and the long output still persists.
Hi community,
The model is actually giving the correct output, but it also includes the question and context that I passed to llm_chain.run in its output.
The following is the code block:
from tqdm import tqdm
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

for i in tqdm(data.index):
    # Per-row prompt template, context and question from the dataframe.
    template = data["system_prompt"][i]
    context = data["text"][i]
    question = data["user_msg"][i]
    prompt = PromptTemplate(template=template, input_variables=["question", "context"])
    llm_chain = LLMChain(prompt=prompt, llm=llm)
    entities = {"question": question, "context": context}
    response = llm_chain.run(entities)
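One common cause of this is that the underlying text-generation pipeline returns the prompt together with the completion. A minimal sketch of building llm so that only the newly generated text comes back, assuming it wraps a transformers pipeline through LangChain's HuggingFacePipeline (the model id and generation settings are placeholders):

    from transformers import pipeline
    from langchain.llms import HuggingFacePipeline

    generate = pipeline(
        "text-generation",
        model="mistralai/Mixtral-8x7B-Instruct-v0.1",
        max_new_tokens=256,
        return_full_text=False,  # return only the completion, not prompt + completion
    )
    llm = HuggingFacePipeline(pipeline=generate)

If llm is built differently, a fallback is to strip the rendered prompt from the start of response after llm_chain.run.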
Could you please tell us if the error is fixed? I have a similar issue.
ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to from_pretrained. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.