google/gemma-7b-it · ValueError: Trying to set a tensor of shape torch.Size([4096, 3072]) in "weight" (which has shape torch.Size([6291456, 1])), this look incorrect.

I was trying to use langchain with the HuggingFaceLLM wrapper to experiment with the Gemma-7B model. In Colab the model worked fine, but on my laptop it raises the error above, and I have not been able to debug it. My laptop has 6 GB of GPU memory and 32 GB of CPU RAM. The LLM was built with the following code:

import torch
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(load_in_4bit=True,
                                         llm_int8_enable_fp32_cpu_offload=True,
                                         bnb_4bit_use_double_quant=True)

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=50,
    generate_kwargs={"do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="google/gemma-7b",
    model_name="google/gemma-7b",
    device_map="auto",
    model_kwargs={"torch_dtype": torch.float16, "quantization_config": quantization_config}
)

It gave a warning: "Some parameters are on the meta device device because they were offloaded to the CPU". This is expected on a 6 GB GPU. Colab showed no such warning because it provides more GPU memory.
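
For context on why the offload happens, here is a rough back-of-the-envelope estimate of the weight memory. The architecture numbers are assumptions taken from my reading of the published Gemma-7B config (28 layers, hidden size 3072, 16 heads of dimension 256, MLP width 24576, vocabulary 256000), and I am also assuming that only the linear layers are quantized to 4 bits while the embedding stays in float16. It ignores quantization constants, activations, the KV cache and the CUDA context, so the real footprint will be higher.

```python
# Rough weight-memory estimate for a 4-bit Gemma-7B load.
# All architecture values below are assumptions (my reading of the model config).
NUM_LAYERS = 28
HIDDEN = 3072
NUM_HEADS = 16
HEAD_DIM = 256
MLP_WIDTH = 24576
VOCAB = 256000

GIB = 2 ** 30

# Attention: q, k, v and o projections, each HIDDEN x (NUM_HEADS * HEAD_DIM).
attn_params = 4 * HIDDEN * NUM_HEADS * HEAD_DIM
# MLP: gate, up and down projections, each HIDDEN x MLP_WIDTH.
mlp_params = 3 * HIDDEN * MLP_WIDTH
linear_params = NUM_LAYERS * (attn_params + mlp_params)

# Embedding table, assumed to stay in float16 (2 bytes per value).
embed_params = VOCAB * HIDDEN

linear_bytes_4bit = linear_params // 2   # two 4-bit values per byte
embed_bytes_fp16 = embed_params * 2

total_gib = (linear_bytes_4bit + embed_bytes_fp16) / GIB

print(f"linear layers, 4-bit: {linear_bytes_4bit / GIB:.2f} GiB")
print(f"embedding, float16:   {embed_bytes_fp16 / GIB:.2f} GiB")
print(f"weights total:        {total_gib:.2f} GiB")
```

By this estimate the weights alone come to roughly 5 GiB, which leaves very little headroom on a 6 GB card, so it seems plausible that device_map="auto" pushes some modules to the CPU.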

I created the index and ran a query as below:

query_engine = index.as_query_engine()
r = query_engine.query(query)

The query call then raised the shape mismatch error quoted in the title.
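
One thing I noticed about the numbers in the error: 4096 × 3072 = 12,582,912, which is exactly twice 6,291,456. That would be consistent with the existing "weight" being stored in packed 4-bit form (two values per byte, laid out as a single column), while the tensor being loaded into it is still the unquantized 2-D matrix. This is only my reading of the shapes, not a confirmed diagnosis. The arithmetic as a small Python sketch (the helper name is my own, not a library function):

```python
from math import prod


def packed_4bit_shape(shape):
    """Shape of a buffer that packs two 4-bit values into each byte.

    Assumption: the packed buffer is stored as a single column, which is
    what the (6291456, 1) shape in the error message suggests.
    """
    n = prod(shape)
    if n % 2 != 0:
        raise ValueError("element count must be even to pack two values per byte")
    return (n // 2, 1)


checkpoint_shape = (4096, 3072)  # shape of the tensor being set
target_shape = (6291456, 1)      # shape of the existing "weight"

print(prod(checkpoint_shape))                               # 12582912
print(packed_4bit_shape(checkpoint_shape))                  # (6291456, 1)
print(packed_4bit_shape(checkpoint_shape) == target_shape)  # True
```

If that reading is right, the mismatch would come from how the quantized and offloaded weights are loaded rather than from a wrong model file, but I do not know how to confirm it.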

Please suggest how to get past this problem.

The installed packages are:

accelerate==0.28.0
aiohttp==3.9.3
aiosignal==1.3.1
annotated-types==0.6.0
anyio==4.3.0
asttokens==2.4.1
async-timeout==4.0.3
attrs==23.2.0
backcall==0.2.0
beautifulsoup4==4.12.3
bitsandbytes==0.43.0
bs4==0.0.2
certifi==2024.2.2
charset-normalizer==3.3.2
click==8.1.7
comm==0.2.2
dataclasses-json==0.6.4
debugpy==1.6.7
decorator==5.1.1
Deprecated==1.2.14
dirtyjson==1.0.8
distro==1.9.0
einops==0.7.0
entrypoints==0.4
exceptiongroup==1.2.0
executing==2.0.1
filelock==3.13.1
frozenlist==1.4.1
fsspec==2024.3.1
greenlet==3.0.3
h11==0.14.0
httpcore==1.0.4
httpx==0.27.0
huggingface-hub==0.20.3
idna==3.6
install==1.3.5
ipykernel==6.29.3
ipython==8.12.0
jedi==0.19.1
Jinja2==3.1.3
joblib==1.3.2
jsonpatch==1.33
jsonpointer==2.4
jupyter-client==7.3.4
jupyter_core==5.7.2
langchain==0.1.13
langchain-community==0.0.29
langchain-core==0.1.33
langchain-text-splitters==0.0.1
langsmith==0.1.31
llama-index==0.10.23
llama-index-agent-openai==0.1.7
llama-index-cli==0.1.11
llama-index-core==0.10.23.post1
llama-index-embeddings-huggingface==0.1.4
llama-index-embeddings-langchain==0.1.2
llama-index-embeddings-openai==0.1.7
llama-index-indices-managed-llama-cloud==0.1.5
llama-index-legacy==0.9.48
llama-index-llms-huggingface==0.1.4
llama-index-llms-openai==0.1.12
llama-index-multi-modal-llms-openai==0.1.4
llama-index-program-openai==0.1.4
llama-index-question-gen-openai==0.1.3
llama-index-readers-file==0.1.12
llama-index-readers-llama-parse==0.1.3
llama-parse==0.3.9
llamaindex-py-client==0.1.13
MarkupSafe==2.1.5
marshmallow==3.21.1
matplotlib-inline==0.1.6
mpmath==1.3.0
multidict==6.0.5
mypy-extensions==1.0.0
nest_asyncio==1.6.0
networkx==3.2.1
nltk==3.8.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.4.99
nvidia-nvtx-cu12==12.1.105
openai==1.14.2
orjson==3.9.15
packaging==23.2
pandas==2.2.1
parso==0.8.3
pexpect==4.9.0
pickleshare==0.7.5
pillow==10.2.0
pip==23.3.1
platformdirs==4.2.0
prompt-toolkit==3.0.42
psutil==5.9.0
ptyprocess==0.7.0
pure-eval==0.2.2
pydantic==2.6.4
pydantic_core==2.16.3
Pygments==2.17.2
PyMuPDF==1.24.0
PyMuPDFb==1.24.0
pypdf==4.1.0
python-dateutil==2.9.0
pytz==2024.1
PyYAML==6.0.1
pyzmq==25.1.2
regex==2023.12.25
requests==2.31.0
safetensors==0.4.2
scikit-learn==1.4.1.post1
scipy==1.12.0
sentence-transformers==2.6.0
setuptools==68.2.2
six==1.16.0
sniffio==1.3.1
soupsieve==2.5
SQLAlchemy==2.0.29
stack-data==0.6.2
striprtf==0.0.26
sympy==1.12
tenacity==8.2.3
threadpoolctl==3.4.0
tiktoken==0.6.0
tokenizers==0.15.2
torch==2.2.1
tornado==6.1
tqdm==4.66.2
traitlets==5.14.2
transformers==4.39.1
triton==2.2.0
typing_extensions==4.10.0
typing-inspect==0.9.0
tzdata==2024.1
urllib3==2.2.1
wcwidth==0.2.13
wheel==0.41.2
wrapt==1.16.0
yarl==1.9.4

Thanks in advance

Subhasis