ValueError: Trying to set a tensor of shape torch.Size([4096, 3072]) in "weight" (which has shape torch.Size([6291456, 1])), this look incorrect.
I was trying to use the HuggingFaceLLM wrapper from llama-index to experiment with the Gemma-7B model. In Colab the model works fine, but on my laptop it raises the error above, and I have not been able to debug it. My laptop has 6 GB of GPU memory and 32 GB of system RAM. The LLM was built with the following code:
import torch
from transformers import BitsAndBytesConfig
from llama_index.llms.huggingface import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=50,
    generate_kwargs={"do_sample": False},
    system_prompt=system_prompt,  # defined earlier in my script
    query_wrapper_prompt=query_wrapper_prompt,  # defined earlier in my script
    tokenizer_name="google/gemma-7b",
    model_name="google/gemma-7b",
    device_map="auto",
    model_kwargs={"torch_dtype": torch.float16, "quantization_config": quantization_config},
)
It gave a warning: "Some parameters are on the meta device device because they were offloaded to the CPU". This is expected. There was no such warning in Colab, since Colab provides more GPU memory.
I then created the index and ran a query as follows:
query_engine = index.as_query_engine()
r = query_engine.query(query)
This raised the shape-mismatch error shown above.
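One observation that may help narrow this down: the target shape in the error, [6291456, 1], is exactly half the element count of the incoming [4096, 3072] tensor, laid out as a single column. That is consistent with the packed storage bitsandbytes uses for 4-bit weights (two 4-bit values per byte), which suggests the loader tried to write an unquantized fp16 weight into a parameter that had already been set up as 4-bit, plausibly for one of the layers offloaded to the CPU. The arithmetic is a quick check; the packing rule is stated as an assumption in the comments.

```python
import math

# Shape of the dense (unquantized) weight the loader tried to write, taken from the error
dense_shape = (4096, 3072)
numel = math.prod(dense_shape)

# Assumption: 4-bit storage packs two values per uint8 byte into a
# flattened column of shape [ceil(numel / 2), 1]
packed_shape = ((numel + 1) // 2, 1)

print(numel)         # 12582912
print(packed_shape)  # (6291456, 1)
```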
Please suggest how to get past this problem.
The installed packages are:
accelerate==0.28.0
aiohttp==3.9.3
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.3.0
asttokens==2.4.1
async-timeout==4.0.3
attrs==23.2.0
backcall==0.2.0
beautifulsoup4==4.12.3
bitsandbytes==0.43.0
bs4==0.0.2
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
comm==0.2.2
dataclasses-json==0.6.4
debugpy==1.6.7
decorator==5.1.1
Deprecated==1.2.14
dirtyjson==1.0.8
distro==1.9.0
einops==0.7.0
entrypoints==0.4
exceptiongroup==1.2.0
executing==2.0.1
filelock==3.13.1
frozenlist==1.4.1
fsspec==2024.3.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.4
httpx==0.27.0
huggingface-hub==0.20.3
idna==3.6
install==1.3.5
ipykernel==6.29.3
ipython==8.12.0
jedi==0.19.1
Jinja2==3.1.3
joblib==1.3.2
jsonpatch==1.33
jsonpointer==2.4
jupyter-client==7.3.4
jupyter_core==5.7.2
langchain==0.1.13
langchain-community==0.0.29
langchain-core==0.1.33
langchain-text-splitters==0.0.1
langsmith==0.1.31
llama-index==0.10.23
llama-index-agent-openai==0.1.7
llama-index-cli==0.1.11
llama-index-core==0.10.23.post1
llama-index-embeddings-huggingface==0.1.4
llama-index-embeddings-langchain==0.1.2
llama-index-embeddings-openai==0.1.7
llama-index-indices-managed-llama-cloud==0.1.5
llama-index-legacy==0.9.48
llama-index-llms-huggingface==0.1.4
llama-index-llms-openai==0.1.12
llama-index-multi-modal-llms-openai==0.1.4
llama-index-program-openai==0.1.4
llama-index-question-gen-openai==0.1.3
llama-index-readers-file==0.1.12
llama-index-readers-llama-parse==0.1.3
llama-parse==0.3.9
llamaindex-py-client==0.1.13
MarkupSafe==2.1.5
marshmallow==3.21.1
matplotlib-inline==0.1.6
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
nest_asyncio==1.6.0
networkx==3.2.1
nltk==3.8.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.4.99
nvidia-nvtx-cu12==12.1.105
openai==1.14.2
orjson==3.9.15
packaging==23.2
pandas==2.2.1
parso==0.8.3
pexpect==4.9.0
pickleshare==0.7.5
pillow==10.2.0
pip==23.3.1
platformdirs==4.2.0
prompt-toolkit==3.0.42
psutil==5.9.0
ptyprocess==0.7.0
pure-eval==0.2.2
pydantic==2.6.4
pydantic_core==2.16.3
Pygments==2.17.2
PyMuPDF==1.24.0
PyMuPDFb==1.24.0
pypdf==4.1.0
python-dateutil==2.9.0
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
regex==2023.12.25
requests==2.31.0
safetensors==0.4.2
scikit-learn==1.4.1.post1
scipy==1.12.0
sentence-transformers==2.6.0
setuptools==68.2.2
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
SQLAlchemy==2.0.29
stack-data==0.6.2
striprtf==0.0.26
sympy==1.12
tenacity==8.2.3
threadpoolctl==3.4.0
tiktoken==0.6.0
tokenizers==0.15.2
torch==2.2.1
tornado==6.1
tqdm==4.66.2
traitlets==5.14.2
transformers==4.39.1
triton==2.2.0
typing_extensions==4.10.0
typing-inspect==0.9.0
tzdata==2024.1
urllib3==2.2.1
wcwidth==0.2.13
wheel==0.41.2
wrapt==1.16.0
yarl==1.9.4
Thanks in advance
Subhasis
Hi @Subhasisdasgupta, the device_map="auto" setting automatically splits the model across the available devices. With only 6 GB of GPU memory, some layers end up offloaded to the CPU, and that GPU/CPU split may be what produces the tensor shape mismatch. You can try setting device_map explicitly to a simpler configuration, for example
device_map={"": "cpu"}
to force the whole model onto the CPU. Please try it and let us know if the issue persists. Thank you.
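For context on what device_map={"": "cpu"} means: the keys of a device map are module names, each parameter goes to the device of the closest enclosing module listed in the map, and the empty string stands for the whole model. The sketch below mimics that lookup in plain Python; resolve_device and the example module names are hypothetical and only approximate how transformers/accelerate resolve a device map.

```python
from typing import Dict, Union

Device = Union[int, str]  # e.g. 0 for the first GPU, or "cpu"


def resolve_device(param_name: str, device_map: Dict[str, Device]) -> Device:
    """Hypothetical helper: find a parameter's device by walking up its dotted
    module path until a key of device_map matches ("" matches the root module)."""
    module_name = param_name
    while module_name and module_name not in device_map:
        # Drop the last path component: "a.b.c" -> "a.b", "a" -> ""
        module_name = ".".join(module_name.split(".")[:-1])
    if module_name not in device_map:
        raise ValueError(f"{param_name!r} has no entry in device_map")
    return device_map[module_name]


# Single-device map: every parameter resolves to the CPU
cpu_only = {"": "cpu"}

# Hypothetical split of the kind device_map="auto" can produce on a small GPU
mixed = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": "cpu",
    "lm_head": "cpu",
}

print(resolve_device("model.layers.27.mlp.up_proj.weight", cpu_only))      # cpu
print(resolve_device("model.layers.0.self_attn.q_proj.weight", mixed))     # 0
print(resolve_device("model.layers.1.mlp.down_proj.weight", mixed))        # cpu
```

With the single-entry map there is no GPU/CPU split left for the loader to get wrong, which is the idea behind the suggestion. Depending on the bitsandbytes version, 4-bit quantization may require a CUDA GPU, so a CPU-only run might also need the quantization_config removed.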