
How to load existing index?

#8
by imhyunalee - opened

Hi, thanks for your research.
I'm using the Multimodal RAG cookbook (https://huggingface.co/learn/cookbook/multimodal_rag_using_document_retrieval_and_vlms#multimodal-retrieval-augmented-generation-rag-with-document-retrieval-colpali-and-vision-language-models-vlms),
and I used RAGMultiModalModel to index my PDF dataset.
After that, I loaded the existing index with the from_index() function of the RAGMultiModalModel class.
However, when I call search() after from_index(), the following error occurs:

ValueError                                Traceback (most recent call last)
Cell In[12], line 1
----> 1 output_text = answer_with_multimodal_rag(
      2     vl_model=vl_model,
      3     docs_retrieval_model=docs_retrieval_model,
      4     vl_model_processor=vl_model_processor,
      5     all_images=all_images,
      6     text_query="{My query~~~.}",
      7     top_k=3,
      8     max_new_tokens=500,
      9 )
     10 print(output_text[0])

Cell In[10], line 4, in answer_with_multimodal_rag(vl_model, docs_retrieval_model, vl_model_processor, all_images, text_query, top_k, max_new_tokens)
      1 def answer_with_multimodal_rag(
      2     vl_model, docs_retrieval_model, vl_model_processor, all_images, text_query, top_k, max_new_tokens
      3 ):
----> 4     results = docs_retrieval_model.search(text_query, k=top_k)
      5     grouped_images = get_grouped_images(results, all_images)
      7     resized_images = []

File /usr/local/envs/tr4.45/lib/python3.10/site-packages/byaldi/RAGModel.py:174, in RAGMultiModalModel.search(self, query, k, return_base64_results)
    158 def search(
    159     self,
    160     query: Union[str, List[str]],
    161     k: int = 10,
    162     return_base64_results: Optional[bool] = None,
    163 ) -> Union[List[Result], List[List[Result]]]:
    164     """Query an index.
    165 
    166     Parameters:
   (...)
    172         Union[List[Result], List[List[Result]]]: A list of Result objects or a list of lists of Result objects.
    173     """
--> 174     return self.model.search(query, k, return_base64_results)

File /usr/local/envs/tr4.45/lib/python3.10/site-packages/byaldi/colpali.py:625, in ColPaliModel.search(self, query, k, return_base64_results)
    622 qs = list(torch.unbind(embeddings_query.to("cpu")))
    624 # Compute scores
--> 625 scores = self.processor.score(qs, self.indexed_embeddings).cpu().numpy()
    627 # Get top k relevant pages
    628 top_pages = scores.argsort(axis=1)[0][-k:][::-1].tolist()

File /usr/local/envs/tr4.45/lib/python3.10/site-packages/colpali_engine/models/paligemma/colpali/processing_colpali.py:90, in ColPaliProcessor.score(self, qs, ps, device, **kwargs)
     80 def score(
     81     self,
     82     qs: List[torch.Tensor],
   (...)
     85     **kwargs,
     86 ) -> torch.Tensor:
     87     """
     88     Compute the MaxSim score (ColBERT-like) for the given multi-vector query and passage embeddings.
     89     """
---> 90     return self.score_multi_vector(qs, ps, device=device, **kwargs)

File /usr/local/envs/tr4.45/lib/python3.10/site-packages/colpali_engine/utils/processing_utils.py:82, in BaseVisualRetrieverProcessor.score_multi_vector(qs, ps, batch_size, device)
     80     raise ValueError("No queries provided")
     81 if len(ps) == 0:
---> 82     raise ValueError("No passages provided")
     84 scores_list: List[torch.Tensor] = []
     86 for i in range(0, len(qs), batch_size):

ValueError: No passages provided
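The last frames show where this comes from: ColPaliModel.search passes self.indexed_embeddings to ColPaliProcessor.score, which delegates to score_multi_vector, and that function raises "No passages provided" when the passage list is empty. In other words, the reloaded model had zero page embeddings at search time. The sketch below is a minimal pure-Python illustration of ColBERT-style MaxSim scoring with the same two guards, plus the top-k selection done at colpali.py line 628. It is not the colpali_engine implementation (which works on batched torch tensors), and the names maxsim_scores and top_k_pages are hypothetical.

```python
from typing import List, Sequence

# One embedding vector per token; a query or a page is a list of such vectors.
Vector = Sequence[float]
MultiVector = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_scores(qs: List[MultiVector], ps: List[MultiVector]) -> List[List[float]]:
    """Hypothetical MaxSim score matrix of shape [len(qs)][len(ps)].

    Mirrors the guards seen in the traceback: an empty query list or an
    empty passage (page) list is rejected before any scoring happens.
    """
    if len(qs) == 0:
        raise ValueError("No queries provided")
    if len(ps) == 0:
        raise ValueError("No passages provided")
    scores: List[List[float]] = []
    for q in qs:
        row = []
        for p in ps:
            # For each query token keep its best match among the page tokens,
            # then sum those maxima over all query tokens.
            row.append(sum(max(dot(qt, pt) for pt in p) for qt in q))
        scores.append(row)
    return scores


def top_k_pages(scores: List[List[float]], k: int) -> List[int]:
    """Indices of the k best pages for the first query, highest score first."""
    first = scores[0]
    return sorted(range(len(first)), key=lambda i: first[i], reverse=True)[:k]
```

With a non-empty page list the scoring works normally; calling maxsim_scores with ps=[] reproduces the same ValueError as in the traceback, which is why the fix has to happen at index-loading time rather than in search().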

How do I fix this error? Did I load the index wrong?
Please tell me how to load the index that I have already saved.
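One way to narrow this down before searching is to check whether the model returned by from_index() actually holds any page embeddings. The helper below is a hypothetical sketch, not part of byaldi; it assumes, based only on the traceback, that RAGMultiModalModel keeps the ColPali model in .model and that the latter stores its page embeddings in a list called indexed_embeddings.

```python
from types import SimpleNamespace
from typing import Any


def search_with_index_check(rag_model: Any, query: str, k: int = 3):
    """Hypothetical helper: fail early with a clearer message when the loaded
    index holds no page embeddings, instead of the generic
    'No passages provided' raised deep inside the scorer.

    Assumption (from the traceback): rag_model.model.indexed_embeddings is a
    list that ColPaliModel.search passes straight to processor.score.
    """
    inner = getattr(rag_model, "model", None)
    embeddings = getattr(inner, "indexed_embeddings", None)
    if embeddings is None or len(embeddings) == 0:
        raise RuntimeError(
            "The loaded index contains 0 page embeddings; "
            "check the index passed to from_index() and that indexing completed."
        )
    return rag_model.search(query, k=k)


if __name__ == "__main__":
    # Stand-in object (not a real byaldi class) that mimics an empty index.
    fake = SimpleNamespace(
        model=SimpleNamespace(indexed_embeddings=[]),
        search=lambda q, k: [],
    )
    try:
        search_with_index_check(fake, "my query")
    except RuntimeError as err:
        print(err)
```

In the notebook, docs_retrieval_model would be passed as rag_model; if the check fires, the problem lies in how the index was saved or reloaded, not in the query.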

imhyunalee changed discussion title from How to load existing index using docs_retrieval_model.from_index() function? to How to load existing index?
imhyunalee changed discussion status to closed
