WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.

#25
by kmukeshreddy - opened

I am getting "WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu." while loading Mixtral into a text-generation pipeline.

You don't have enough GPU memory. Consider renting a GPU, or loading the model in a more memory-efficient way (e.g. in 4-bit).
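For a rough sense of why this happens, here is some back-of-envelope arithmetic for the weight memory alone (a sketch; the ~46.7B parameter count for Mixtral-8x7B is approximate, and activations/KV cache add more on top):

import itertools

# Approximate weight memory for Mixtral-8x7B at different precisions.
# All experts must be resident even though only 2 are active per token.
params = 46.7e9
for name, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB of weights")

If the total exceeds your available VRAM, accelerate offloads the remainder to CPU (or disk), which is exactly what the warning is telling you.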

I second what @cekal said: you probably don't have enough GPU RAM to fit the model. Try loading it at lower precision (e.g. float16, or with load_in_4bit), or use the serialized 4-bit weights here: https://huggingface.co/ybelkada/Mixtral-8x7B-Instruct-v0.1-bnb-4bit
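For reference, a minimal sketch of the load_in_4bit route (assuming bitsandbytes and accelerate are installed; the serialized repo linked above can also be passed directly to from_pretrained without a quantization config):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# 4-bit NF4 quantization with bf16 compute is a common configuration.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place layers across available devices
)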

Hi @ybelkada

Any idea what the minimum system requirements are to run this model (e.g. GPU memory)? I am trying to run the Python code below using Streamlit and I get the above error (or warning, I should say) -

import streamlit as st
from langchain import PromptTemplate, LLMChain
from langchain import HuggingFacePipeline
from transformers import AutoTokenizer
import transformers
import torch

token = ""

model = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model)

# Build the generation pipeline; device_map="auto" lets accelerate spread
# layers across GPU/CPU, which is what triggers the offload warning when
# GPU memory runs out.
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
    max_length=1000,
    eos_token_id=tokenizer.eos_token_id,
)

llm = HuggingFacePipeline(pipeline=pipeline, model_kwargs={"temperature": 0})

template = """
You are an intelligent chatbot that gives out useful information to humans.
You return the responses in sentences with arrows at the start of each sentence
{query}
"""

prompt = PromptTemplate(template=template, input_variables=["query"])

llm_chain = LLMChain(prompt=prompt, llm=llm)

print(llm_chain.invoke('What are the 3 causes of glacier meltdowns?'))
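To confirm this is a memory issue, you can inspect where accelerate placed each module after loading (a small check, assuming the pipeline object from the code above):

# Entries mapped to "cpu" or "disk" were offloaded, which is what
# produces the warning at the top of this thread.
print(pipeline.model.hf_device_map)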
