How to load existing index?
#8, opened by imhyunalee
Hi, thanks for your research.
I'm using the Multimodal RAG cookbook (https://huggingface.co/learn/cookbook/multimodal_rag_using_document_retrieval_and_vlms#multimodal-retrieval-augmented-generation-rag-with-document-retrieval-colpali-and-vision-language-models-vlms),
and I used RAGMultiModalModel to index my PDF dataset.
After that, I loaded the existing index with the from_index() function of the RAGMultiModalModel class.
However, when I call search() on the model returned by from_index(), the following error occurs:
ValueError Traceback (most recent call last)
Cell In[12], line 1
----> 1 output_text = answer_with_multimodal_rag(
2 vl_model=vl_model,
3 docs_retrieval_model=docs_retrieval_model,
4 vl_model_processor=vl_model_processor,
5 all_images=all_images,
6 text_query="{My query~~~.}",
7 top_k=3,
8 max_new_tokens=500,
9 )
10 print(output_text[0])
Cell In[10], line 4, in answer_with_multimodal_rag(vl_model, docs_retrieval_model, vl_model_processor, all_images, text_query, top_k, max_new_tokens)
1 def answer_with_multimodal_rag(
2 vl_model, docs_retrieval_model, vl_model_processor, all_images, text_query, top_k, max_new_tokens
3 ):
----> 4 results = docs_retrieval_model.search(text_query, k=top_k)
5 grouped_images = get_grouped_images(results, all_images)
7 resized_images = []
File /usr/local/envs/tr4.45/lib/python3.10/site-packages/byaldi/RAGModel.py:174, in RAGMultiModalModel.search(self, query, k, return_base64_results)
158 def search(
159 self,
160 query: Union[str, List[str]],
161 k: int = 10,
162 return_base64_results: Optional[bool] = None,
163 ) -> Union[List[Result], List[List[Result]]]:
164 """Query an index.
165
166 Parameters:
(...)
172 Union[List[Result], List[List[Result]]]: A list of Result objects or a list of lists of Result objects.
173 """
--> 174 return self.model.search(query, k, return_base64_results)
File /usr/local/envs/tr4.45/lib/python3.10/site-packages/byaldi/colpali.py:625, in ColPaliModel.search(self, query, k, return_base64_results)
622 qs = list(torch.unbind(embeddings_query.to("cpu")))
624 # Compute scores
--> 625 scores = self.processor.score(qs, self.indexed_embeddings).cpu().numpy()
627 # Get top k relevant pages
628 top_pages = scores.argsort(axis=1)[0][-k:][::-1].tolist()
File /usr/local/envs/tr4.45/lib/python3.10/site-packages/colpali_engine/models/paligemma/colpali/processing_colpali.py:90, in ColPaliProcessor.score(self, qs, ps, device, **kwargs)
80 def score(
81 self,
82 qs: List[torch.Tensor],
(...)
85 **kwargs,
86 ) -> torch.Tensor:
87 """
88 Compute the MaxSim score (ColBERT-like) for the given multi-vector query and passage embeddings.
89 """
---> 90 return self.score_multi_vector(qs, ps, device=device, **kwargs)
File /usr/local/envs/tr4.45/lib/python3.10/site-packages/colpali_engine/utils/processing_utils.py:82, in BaseVisualRetrieverProcessor.score_multi_vector(qs, ps, batch_size, device)
80 raise ValueError("No queries provided")
81 if len(ps) == 0:
---> 82 raise ValueError("No passages provided")
84 scores_list: List[torch.Tensor] = []
86 for i in range(0, len(qs), batch_size):
ValueError: No passages provided
How do I fix this error? Did I load the index incorrectly?
Please tell me how to load an index that I have already saved.
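For reference, the failing check can be reproduced in isolation. The snippet below is only a minimal sketch in plain Python, not the library's code: `score_multi_vector_guard` mirrors the two emptiness checks visible in the traceback, and `check_index_loaded` is a hypothetical helper (not part of byaldi or colpali_engine) that could be run on the loaded embeddings before searching. Since the traceback passes `self.indexed_embeddings` as the passages, the error suggests that list was empty after loading, though that is an assumption rather than a confirmed diagnosis.

```python
from typing import Sequence


def score_multi_vector_guard(qs: Sequence, ps: Sequence) -> None:
    """Mirror of the two emptiness checks shown in the traceback."""
    if len(qs) == 0:
        raise ValueError("No queries provided")
    if len(ps) == 0:
        raise ValueError("No passages provided")


def check_index_loaded(indexed_embeddings: Sequence) -> int:
    """Hypothetical helper: fail early if the loaded index holds no embeddings.

    Returns the number of stored page embeddings when there is at least one.
    """
    count = len(indexed_embeddings)
    if count == 0:
        raise RuntimeError(
            "The loaded index contains no page embeddings; "
            "check the index name and the folder it was saved to."
        )
    return count


if __name__ == "__main__":
    # One query but an empty passage list reproduces the message from the traceback.
    try:
        score_multi_vector_guard(qs=["query embedding"], ps=[])
    except ValueError as err:
        print(err)
```

In the real setup, the list to inspect would presumably be the model's `indexed_embeddings` attribute seen in the traceback, checked right after from_index() and before search().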
imhyunalee changed discussion title from "How to load existing index using docs_retrieval_model.from_index() function?" to "How to load existing index?"
imhyunalee changed discussion status to closed