Dolly + LangChain SQL Chain - RuntimeError: The size of tensor a (2048) must match the size of tensor b (2611) at non-singleton dimension 3

#11

by kevinknights29 - opened Apr 27, 2023

Hi Team,

I've been playing with the dolly-v2-3b model and extending its functionality with LangChain, including the SQL Chain.

While doing so I've encountered the following error:
RuntimeError: The size of tensor a (2048) must match the size of tensor b (2611) at non-singleton dimension 3

Do you know what might be the cause of this error?

Support information:
Code (executed on Google Colab with a GPU):

!pip install "accelerate>=0.16.0,<1" "transformers[torch]>=4.28.1,<5" "torch>=1.13.1,<2" langchain 

import torch
from transformers import pipeline

generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16,
                         trust_remote_code=True, device_map="auto", return_full_text=True)

from langchain.llms import HuggingFacePipeline

hf_pipeline = HuggingFacePipeline(pipeline=generate_text)

import requests

dataset_url = "https://www.sqlitetutorial.net/wp-content/uploads/2018/03/chinook.zip"
dataset_zip = "chinook_db.zip"

response = requests.get(dataset_url)
with open(dataset_zip, "wb") as f:
    f.write(response.content)

import zipfile
from pathlib import Path

with zipfile.ZipFile(dataset_zip, "r") as f:
    f.extractall(".")
Path(dataset_zip).unlink()

from langchain import SQLDatabase

db = SQLDatabase.from_uri("sqlite:///./chinook.db")
db.table_info

from langchain import SQLDatabaseChain

db_chain = SQLDatabaseChain(llm=hf_pipeline, database=db, verbose=True)
db_chain.run("How many employees are there?")
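
A minimal diagnostic sketch for the error above, under two assumptions that the error message suggests but does not confirm: that 2048 is the model's maximum sequence length, and that 2611 is the token count of the prompt the chain builds (its template plus the table info). The helper `check_prompt_length` and its parameters are hypothetical names introduced here for illustration. It only counts tokens and compares them with a budget; a whitespace split stands in for the real tokenizer so the example runs on its own.

```python
from typing import Callable, Dict, List

# Assumption: 2048 is the model's maximum sequence length, taken from the
# "tensor a (2048)" part of the error message.
MAX_POSITIONS = 2048


def check_prompt_length(
    prompt: str,
    tokenize: Callable[[str], List],
    max_positions: int = MAX_POSITIONS,
    reserve_for_output: int = 0,
) -> Dict[str, object]:
    """Count the tokens in `prompt` and report whether they fit the budget.

    `tokenize` is any callable that maps a string to a list of tokens.
    `reserve_for_output` leaves room for the tokens the model will generate.
    """
    n_tokens = len(tokenize(prompt))
    budget = max_positions - reserve_for_output
    return {"n_tokens": n_tokens, "budget": budget, "fits": n_tokens <= budget}


if __name__ == "__main__":
    # Stand-in prompt of 2611 whitespace-separated "tokens", matching the
    # size reported in the error. A real check would use the model tokenizer.
    fake_prompt = "word " * 2611
    print(check_prompt_length(fake_prompt, str.split))
```

With the objects from the code above, one could pass `generate_text.tokenizer.tokenize` as `tokenize` and `db.table_info` as a rough lower bound for the prompt, since the chain adds its own template text on top of the table info.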

Complete Error Log:

> Entering new SQLDatabaseChain chain...
How many employees are there?
SQLQuery:
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <cell line: 4>:4                                                                              │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/chains/base.py:213 in run                       │
│                                                                                                  │
│   210 │   │   if args and not kwargs:                                                            │
│   211 │   │   │   if len(args) != 1:                                                             │
│   212 │   │   │   │   raise ValueError("`run` supports only one positional argument.")           │
│ ❱ 213 │   │   │   return self(args[0])[self.output_keys[0]]                                      │
│   214 │   │                                                                                      │
│   215 │   │   if kwargs and not args:                                                            │
│   216 │   │   │   return self(kwargs)[self.output_keys[0]]                                       │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/chains/base.py:116 in __call__                  │
│                                                                                                  │
│   113 │   │   │   outputs = self._call(inputs)                                                   │
│   114 │   │   except (KeyboardInterrupt, Exception) as e:                                        │
│   115 │   │   │   self.callback_manager.on_chain_error(e, verbose=self.verbose)                  │
│ ❱ 116 │   │   │   raise e                                                                        │
│   117 │   │   self.callback_manager.on_chain_end(outputs, verbose=self.verbose)                  │
│   118 │   │   return self.prep_outputs(inputs, outputs, return_only_outputs)                     │
│   119                                                                                            │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/chains/base.py:113 in __call__                  │
│                                                                                                  │
│   110 │   │   │   verbose=self.verbose,                                                          │
│   111 │   │   )                                                                                  │
│   112 │   │   try:                                                                               │
│ ❱ 113 │   │   │   outputs = self._call(inputs)                                                   │
│   114 │   │   except (KeyboardInterrupt, Exception) as e:                                        │
│   115 │   │   │   self.callback_manager.on_chain_error(e, verbose=self.verbose)                  │
│   116 │   │   │   raise e                                                                        │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/chains/sql_database/base.py:83 in _call         │
│                                                                                                  │
│    80 │   │   │   "stop": ["\nSQLResult:"],                                                      │
│    81 │   │   }                                                                                  │
│    82 │   │   intermediate_steps = []                                                            │
│ ❱  83 │   │   sql_cmd = llm_chain.predict(**llm_inputs)                                          │
│    84 │   │   intermediate_steps.append(sql_cmd)                                                 │
│    85 │   │   self.callback_manager.on_text(sql_cmd, color="green", verbose=self.verbose)        │
│    86 │   │   result = self.database.run(sql_cmd)                                                │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/chains/llm.py:151 in predict                    │
│                                                                                                  │
│   148 │   │   │   │                                                                              │
│   149 │   │   │   │   completion = llm.predict(adjective="funny")                                │
│   150 │   │   """                                                                                │
│ ❱ 151 │   │   return self(kwargs)[self.output_key]                                               │
│   152 │                                                                                          │
│   153 │   async def apredict(self, **kwargs: Any) -> str:                                        │
│   154 │   │   """Format prompt with kwargs and pass to LLM.                                      │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/chains/base.py:116 in __call__                  │
│                                                                                                  │
│   113 │   │   │   outputs = self._call(inputs)                                                   │
│   114 │   │   except (KeyboardInterrupt, Exception) as e:                                        │
│   115 │   │   │   self.callback_manager.on_chain_error(e, verbose=self.verbose)                  │
│ ❱ 116 │   │   │   raise e                                                                        │
│   117 │   │   self.callback_manager.on_chain_end(outputs, verbose=self.verbose)                  │
│   118 │   │   return self.prep_outputs(inputs, outputs, return_only_outputs)                     │
│   119                                                                                            │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/chains/base.py:113 in __call__                  │
│                                                                                                  │
│   110 │   │   │   verbose=self.verbose,                                                          │
│   111 │   │   )                                                                                  │
│   112 │   │   try:                                                                               │
│ ❱ 113 │   │   │   outputs = self._call(inputs)                                                   │
│   114 │   │   except (KeyboardInterrupt, Exception) as e:                                        │
│   115 │   │   │   self.callback_manager.on_chain_error(e, verbose=self.verbose)                  │
│   116 │   │   │   raise e                                                                        │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/chains/llm.py:57 in _call                       │
│                                                                                                  │
│    54 │   │   return [self.output_key]                                                           │
│    55 │                                                                                          │
│    56 │   def _call(self, inputs: Dict[str, Any]) -> Dict[str, str]:                             │
│ ❱  57 │   │   return self.apply([inputs])[0]                                                     │
│    58 │                                                                                          │
│    59 │   def generate(self, input_list: List[Dict[str, Any]]) -> LLMResult:                     │
│    60 │   │   """Generate LLM result from inputs."""                                             │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/chains/llm.py:118 in apply                      │
│                                                                                                  │
│   115 │                                                                                          │
│   116 │   def apply(self, input_list: List[Dict[str, Any]]) -> List[Dict[str, str]]:             │
│   117 │   │   """Utilize the LLM generate method for speed gains."""                             │
│ ❱ 118 │   │   response = self.generate(input_list)                                               │
│   119 │   │   return self.create_outputs(response)                                               │
│   120 │                                                                                          │
│   121 │   async def aapply(self, input_list: List[Dict[str, Any]]) -> List[Dict[str, str]]:      │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/chains/llm.py:62 in generate                    │
│                                                                                                  │
│    59 │   def generate(self, input_list: List[Dict[str, Any]]) -> LLMResult:                     │
│    60 │   │   """Generate LLM result from inputs."""                                             │
│    61 │   │   prompts, stop = self.prep_prompts(input_list)                                      │
│ ❱  62 │   │   return self.llm.generate_prompt(prompts, stop)                                     │
│    63 │                                                                                          │
│    64 │   async def agenerate(self, input_list: List[Dict[str, Any]]) -> LLMResult:              │
│    65 │   │   """Generate LLM result from inputs."""                                             │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/llms/base.py:107 in generate_prompt             │
│                                                                                                  │
│   104 │   │   self, prompts: List[PromptValue], stop: Optional[List[str]] = None                 │
│   105 │   ) -> LLMResult:                                                                        │
│   106 │   │   prompt_strings = [p.to_string() for p in prompts]                                  │
│ ❱ 107 │   │   return self.generate(prompt_strings, stop=stop)                                    │
│   108 │                                                                                          │
│   109 │   async def agenerate_prompt(                                                            │
│   110 │   │   self, prompts: List[PromptValue], stop: Optional[List[str]] = None                 │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/llms/base.py:140 in generate                    │
│                                                                                                  │
│   137 │   │   │   │   output = self._generate(prompts, stop=stop)                                │
│   138 │   │   │   except (KeyboardInterrupt, Exception) as e:                                    │
│   139 │   │   │   │   self.callback_manager.on_llm_error(e, verbose=self.verbose)                │
│ ❱ 140 │   │   │   │   raise e                                                                    │
│   141 │   │   │   self.callback_manager.on_llm_end(output, verbose=self.verbose)                 │
│   142 │   │   │   return output                                                                  │
│   143 │   │   params = self.dict()                                                               │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/llms/base.py:137 in generate                    │
│                                                                                                  │
│   134 │   │   │   │   {"name": self.__class__.__name__}, prompts, verbose=self.verbose           │
│   135 │   │   │   )                                                                              │
│   136 │   │   │   try:                                                                           │
│ ❱ 137 │   │   │   │   output = self._generate(prompts, stop=stop)                                │
│   138 │   │   │   except (KeyboardInterrupt, Exception) as e:                                    │
│   139 │   │   │   │   self.callback_manager.on_llm_error(e, verbose=self.verbose)                │
│   140 │   │   │   │   raise e                                                                    │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/llms/base.py:324 in _generate                   │
│                                                                                                  │
│   321 │   │   # TODO: add caching here.                                                          │
│   322 │   │   generations = []                                                                   │
│   323 │   │   for prompt in prompts:                                                             │
│ ❱ 324 │   │   │   text = self._call(prompt, stop=stop)                                           │
│   325 │   │   │   generations.append([Generation(text=text)])                                    │
│   326 │   │   return LLMResult(generations=generations)                                          │
│   327                                                                                            │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/langchain/llms/huggingface_pipeline.py:150 in _call       │
│                                                                                                  │
│   147 │   │   return "huggingface_pipeline"                                                      │
│   148 │                                                                                          │
│   149 │   def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:                 │
│ ❱ 150 │   │   response = self.pipeline(prompt)                                                   │
│   151 │   │   if self.pipeline.task == "text-generation":                                        │
│   152 │   │   │   # Text generation return includes the starter text.                            │
│   153 │   │   │   text = response[0]["generated_text"][len(prompt) :]                            │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/transformers/pipelines/base.py:1109 in __call__           │
│                                                                                                  │
│   1106 │   │   │   │   )                                                                         │
│   1107 │   │   │   )                                                                             │
│   1108 │   │   else:                                                                             │
│ ❱ 1109 │   │   │   return self.run_single(inputs, preprocess_params, forward_params, postproces  │
│   1110 │                                                                                         │
│   1111 │   def run_multi(self, inputs, preprocess_params, forward_params, postprocess_params):   │
│   1112 │   │   return [self.run_single(item, preprocess_params, forward_params, postprocess_par  │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/transformers/pipelines/base.py:1116 in run_single         │
│                                                                                                  │
│   1113 │                                                                                         │
│   1114 │   def run_single(self, inputs, preprocess_params, forward_params, postprocess_params):  │
│   1115 │   │   model_inputs = self.preprocess(inputs, **preprocess_params)                       │
│ ❱ 1116 │   │   model_outputs = self.forward(model_inputs, **forward_params)                      │
│   1117 │   │   outputs = self.postprocess(model_outputs, **postprocess_params)                   │
│   1118 │   │   return outputs                                                                    │
│   1119                                                                                           │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/transformers/pipelines/base.py:1015 in forward            │
│                                                                                                  │
│   1012 │   │   │   │   inference_context = self.get_inference_context()                          │
│   1013 │   │   │   │   with inference_context():                                                 │
│   1014 │   │   │   │   │   model_inputs = self._ensure_tensor_on_device(model_inputs, device=se  │
│ ❱ 1015 │   │   │   │   │   model_outputs = self._forward(model_inputs, **forward_params)         │
│   1016 │   │   │   │   │   model_outputs = self._ensure_tensor_on_device(model_outputs, device=  │
│   1017 │   │   │   else:                                                                         │
│   1018 │   │   │   │   raise ValueError(f"Framework {self.framework} is not supported")          │
│                                                                                                  │
│ /root/.cache/huggingface/modules/transformers_modules/databricks/dolly-v2-3b/e19a5252f69d79d94ac │
│ 95045eb9b8a158775f701/instruct_pipeline.py:132 in _forward                                       │
│                                                                                                  │
│   129 │   │   else:                                                                              │
│   130 │   │   │   in_b = input_ids.shape[0]                                                      │
│   131 │   │                                                                                      │
│ ❱ 132 │   │   generated_sequence = self.model.generate(                                          │
│   133 │   │   │   input_ids=input_ids.to(self.model.device),                                     │
│   134 │   │   │   attention_mask=attention_mask.to(self.model.device) if attention_mask is not   │
│   135 │   │   │   pad_token_id=self.tokenizer.pad_token_id,                                      │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/torch/autograd/grad_mode.py:27 in decorate_context        │
│                                                                                                  │
│    24 │   │   @functools.wraps(func)                                                             │
│    25 │   │   def decorate_context(*args, **kwargs):                                             │
│    26 │   │   │   with self.clone():                                                             │
│ ❱  27 │   │   │   │   return func(*args, **kwargs)                                               │
│    28 │   │   return cast(F, decorate_context)                                                   │
│    29 │                                                                                          │
│    30 │   def _wrap_generator(self, func):                                                       │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py:1485 in generate         │
│                                                                                                  │
│   1482 │   │   │   )                                                                             │
│   1483 │   │   │                                                                                 │
│   1484 │   │   │   # 13. run sample                                                              │
│ ❱ 1485 │   │   │   return self.sample(                                                           │
│   1486 │   │   │   │   input_ids,                                                                │
│   1487 │   │   │   │   logits_processor=logits_processor,                                        │
│   1488 │   │   │   │   logits_warper=logits_warper,                                              │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/transformers/generation/utils.py:2524 in sample           │
│                                                                                                  │
│   2521 │   │   │   model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)  │
│   2522 │   │   │                                                                                 │
│   2523 │   │   │   # forward pass to get next token                                              │
│ ❱ 2524 │   │   │   outputs = self(                                                               │
│   2525 │   │   │   │   **model_inputs,                                                           │
│   2526 │   │   │   │   return_dict=True,                                                         │
│   2527 │   │   │   │   output_attentions=output_attentions,                                      │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1194 in _call_impl             │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/accelerate/hooks.py:165 in new_forward                    │
│                                                                                                  │
│   162 │   │   │   with torch.no_grad():                                                          │
│   163 │   │   │   │   output = old_forward(*args, **kwargs)                                      │
│   164 │   │   else:                                                                              │
│ ❱ 165 │   │   │   output = old_forward(*args, **kwargs)                                          │
│   166 │   │   return module._hf_hook.post_forward(module, output)                                │
│   167 │                                                                                          │
│   168 │   module.forward = new_forward                                                           │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:662 in  │
│ forward                                                                                          │
│                                                                                                  │
│   659 │   │   ```"""                                                                             │
│   660 │   │   return_dict = return_dict if return_dict is not None else self.config.use_return   │
│   661 │   │                                                                                      │
│ ❱ 662 │   │   outputs = self.gpt_neox(                                                           │
│   663 │   │   │   input_ids,                                                                     │
│   664 │   │   │   attention_mask=attention_mask,                                                 │
│   665 │   │   │   position_ids=position_ids,                                                     │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1194 in _call_impl             │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/accelerate/hooks.py:165 in new_forward                    │
│                                                                                                  │
│   162 │   │   │   with torch.no_grad():                                                          │
│   163 │   │   │   │   output = old_forward(*args, **kwargs)                                      │
│   164 │   │   else:                                                                              │
│ ❱ 165 │   │   │   output = old_forward(*args, **kwargs)                                          │
│   166 │   │   return module._hf_hook.post_forward(module, output)                                │
│   167 │                                                                                          │
│   168 │   module.forward = new_forward                                                           │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:553 in  │
│ forward                                                                                          │
│                                                                                                  │
│   550 │   │   │   │   │   head_mask[i],                                                          │
│   551 │   │   │   │   )                                                                          │
│   552 │   │   │   else:                                                                          │
│ ❱ 553 │   │   │   │   outputs = layer(                                                           │
│   554 │   │   │   │   │   hidden_states,                                                         │
│   555 │   │   │   │   │   attention_mask=attention_mask,                                         │
│   556 │   │   │   │   │   position_ids=position_ids,                                             │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1194 in _call_impl             │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/accelerate/hooks.py:165 in new_forward                    │
│                                                                                                  │
│   162 │   │   │   with torch.no_grad():                                                          │
│   163 │   │   │   │   output = old_forward(*args, **kwargs)                                      │
│   164 │   │   else:                                                                              │
│ ❱ 165 │   │   │   output = old_forward(*args, **kwargs)                                          │
│   166 │   │   return module._hf_hook.post_forward(module, output)                                │
│   167 │                                                                                          │
│   168 │   module.forward = new_forward                                                           │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:320 in  │
│ forward                                                                                          │
│                                                                                                  │
│   317 │   │   layer_past: Optional[Tuple[torch.Tensor]] = None,                                  │
│   318 │   │   output_attentions: Optional[bool] = False,                                         │
│   319 │   ):                                                                                     │
│ ❱ 320 │   │   attention_layer_outputs = self.attention(                                          │
│   321 │   │   │   self.input_layernorm(hidden_states),                                           │
│   322 │   │   │   attention_mask=attention_mask,                                                 │
│   323 │   │   │   position_ids=position_ids,                                                     │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1194 in _call_impl             │
│                                                                                                  │
│   1191 │   │   # this function, and just call forward.                                           │
│   1192 │   │   if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o  │
│   1193 │   │   │   │   or _global_forward_hooks or _global_forward_pre_hooks):                   │
│ ❱ 1194 │   │   │   return forward_call(*input, **kwargs)                                         │
│   1195 │   │   # Do not call functions when jit is used                                          │
│   1196 │   │   full_backward_hooks, non_full_backward_hooks = [], []                             │
│   1197 │   │   if self._backward_hooks or _global_backward_hooks:                                │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/accelerate/hooks.py:165 in new_forward                    │
│                                                                                                  │
│   162 │   │   │   with torch.no_grad():                                                          │
│   163 │   │   │   │   output = old_forward(*args, **kwargs)                                      │
│   164 │   │   else:                                                                              │
│ ❱ 165 │   │   │   output = old_forward(*args, **kwargs)                                          │
│   166 │   │   return module._hf_hook.post_forward(module, output)                                │
│   167 │                                                                                          │
│   168 │   module.forward = new_forward                                                           │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:152 in  │
│ forward                                                                                          │
│                                                                                                  │
│   149 │   │   present = (key, value) if use_cache else None                                      │
│   150 │   │                                                                                      │
│   151 │   │   # Compute attention                                                                │
│ ❱ 152 │   │   attn_output, attn_weights = self._attn(query, key, value, attention_mask, head_m   │
│   153 │   │                                                                                      │
│   154 │   │   # Reshape outputs                                                                  │
│   155 │   │   attn_output = self._merge_heads(attn_output, self.num_attention_heads, self.head   │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:219 in  │
│ _attn                                                                                            │
│                                                                                                  │
│   216 │   │   # Need to be a tensor, otherwise we get error: `RuntimeError: expected scalar ty   │
│   217 │   │   # Need to be on the same device, otherwise `RuntimeError: ..., x and y to be on    │
│   218 │   │   mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.de   │
│ ❱ 219 │   │   attn_scores = torch.where(causal_mask, attn_scores, mask_value)                    │
│   220 │   │                                                                                      │
│   221 │   │   if attention_mask is not None:                                                     │
│   222 │   │   │   # Apply the attention mask                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: The size of tensor a (2048) must match the size of tensor b (2611) at non-singleton dimension 3

srowen

Databricks org Apr 27, 2023

The model has a context window limit of 2048 tokens. If you end up constructing input that's longer it won't work. The SQL chain is producing very long input. Maybe you need to limit the size of the results it parses or something (number of rows). See the SQL chain docs.
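A quick way to confirm this is to count tokens before calling the chain. The sketch below is illustrative only: `fits_context`, `whitespace_count`, and the 256-token output reserve are hypothetical names and values, not part of Dolly or LangChain. In practice you would pass a counter built on the model's own tokenizer, e.g. `lambda s: len(tokenizer(s)["input_ids"])`. The 2611 in the error above is presumably the length of the sequence the chain built.

```python
def fits_context(prompt, count_tokens, max_context=2048, reserve_for_output=256):
    """Return (ok, n_tokens): whether the prompt leaves room for generation.

    max_context=2048 is the limit stated above; reserve_for_output is an
    assumed allowance for the tokens the model still has to generate.
    """
    n_tokens = count_tokens(prompt)
    return n_tokens + reserve_for_output <= max_context, n_tokens


def whitespace_count(text):
    # Stand-in counter for illustration only; a real tokenizer usually
    # yields more tokens than a whitespace split does.
    return len(text.split())


ok, n = fits_context("How many employees are there?", whitespace_count)
print(ok, n)
```

If `ok` is false, the prompt has to shrink before it reaches the model.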

Krishnaveni

Apr 27, 2023

@srowen, would this limit also apply to the dolly-v2-7b and dolly-v2-12b versions?

srowen

Databricks org Apr 27, 2023

Yes, they're based on Pythia, and all have a 2048 token context window. In general, you want to be careful about sending so many tokens even if the model accommodates them, as it increases runtime (and cost). The prompt engineering is most of the work here!

saurabh48782

Apr 27, 2023

@srowen I'm also encountering a similar tensor size mismatch error when using a RetrievalQA chain to answer questions based on the content of my PDF document. When I pass the HuggingFacePipeline model into the RetrievalQA chain, it loads correctly. But when I call chain.run(query) as shown here https://python.langchain.com/en/latest/modules/chains/index_examples/vector_db_qa.html, it shows the following error:
RuntimeError: The size of tensor a (2048) must match the size of tensor b (2568) at non-singleton dimension 3

srowen

Databricks org Apr 27, 2023

Same answer, you're exceeding the context window size.

saurabh48782

Apr 27, 2023

I'm assuming this has something to do with how the document is tokenized when creating embeddings in the vector store, and that the model is not able to retrieve large embeddings from the vector store. Could that be the issue here?

srowen

Databricks org Apr 27, 2023

No, not related to the vector DB. This happens when you feed context to the model. You are perhaps retrieving too much to feed to the model, yes.
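One way to cap what gets fed to the model is to keep retrieved chunks only while they fit a token budget. This is a minimal sketch, not LangChain's API: `select_chunks`, `whitespace_count`, and the budget value are hypothetical, and the whitespace counter stands in for the model's real tokenizer.

```python
def whitespace_count(text):
    # Stand-in counter for illustration; swap in the model's tokenizer.
    return len(text.split())


def select_chunks(chunks, count_tokens, budget):
    """Keep chunks in retrieval order until the token budget is used up."""
    kept, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break  # the next chunk would overflow the budget
        kept.append(chunk)
        used += cost
    return kept


retrieved = ["a b c", "d e", "f g h i"]
context = select_chunks(retrieved, whitespace_count, budget=5)
print(context)
```

The budget should leave room for the question, the prompt boilerplate, and the generated answer, all of which share the same 2048-token window.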

kevinknights29

Apr 28, 2023

Hi @srowen , thanks for your response.

Can you please expand on: "The prompt engineering is most of the work here!"?

I would like to know what approaches or strategies you recommend, or have seen work, for reducing the amount of context the model needs to generate accurate predictions.

In this example, passing the schema of the database and tables structure is making us exceed the limit of tokens.

Thanks and regards,
Kevin K.

PS: my account was created yesterday, so I'm limited to one comment a day.

srowen

Databricks org Apr 28, 2023

See for example https://python.langchain.com/en/latest/modules/chains/examples/sqlite.html#choosing-how-to-limit-the-number-of-rows-returned to limit rows.
Try turning on verbose=True in the chain so you can see what is being passed. That may give you ideas about what is very long here.
You may have to write custom, shorter prompts that are more targeted at your use case. Langchain prompts tend to have a lot of boilerplate instruction.
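As a rough illustration of a shorter, more targeted prompt, the sketch below builds one by hand from only the tables a question needs. It uses plain `sqlite3` rather than LangChain; `PROMPT`, `compact_schema`, and the two toy tables are hypothetical stand-ins for the Chinook database. LangChain's `SQLDatabase` also appears to accept an `include_tables` argument for the same purpose, so check the docs for your version.

```python
import sqlite3

# A deliberately short template, assumed for illustration.
PROMPT = (
    "Given the SQLite schema below, write one SQL query that answers the question.\n"
    "Schema:\n{schema}\n"
    "Question: {question}\n"
    "SQLQuery:"
)


def compact_schema(conn, tables):
    """Return the CREATE TABLE statements for the named tables only."""
    placeholders = ",".join("?" for _ in tables)
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        f"WHERE type = 'table' AND name IN ({placeholders}) ORDER BY name",
        list(tables),
    ).fetchall()
    return "\n".join(sql for (sql,) in rows)


# Toy in-memory database standing in for chinook.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (EmployeeId INTEGER PRIMARY KEY, LastName TEXT)")
conn.execute("CREATE TABLE invoices (InvoiceId INTEGER PRIMARY KEY, Total REAL)")

prompt = PROMPT.format(
    schema=compact_schema(conn, ["employees"]),
    question="How many employees are there?",
)
print(prompt)
```

Passing only the relevant table definitions, with no sample rows, keeps the schema portion of the prompt to a few lines instead of the whole database.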

kevinknights29 changed discussion status to closed May 4, 2023
