# ColPali: Efficient Document Retrieval with Vision Language Models

[[Blog]](https://huggingface.co/blog/manu/colpali)
[[Paper]](https://arxiv.org/abs/2407.01449)
[[ColPali Model card]](https://huggingface.co/vidore/colpali)
[[ViDoRe Benchmark]](https://huggingface.co/vidore)
<!---[[Colab example]]()-->
[[HuggingFace Demo]](https://huggingface.co/spaces/manu/ColPali-demo)
## Associated Paper

**ColPali: Efficient Document Retrieval with Vision Language Models**

Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo

This repository contains the code for training custom ColBERT-style retriever models. Notably, we train ColBERT retrievers with LLMs (decoders) as well as vision language models!
## Installation

### From git

```bash
pip install git+https://github.com/illuin-tech/colpali
```

### From source

```bash
git clone https://github.com/illuin-tech/colpali
cd colpali
pip install -r requirements.txt
```
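As an optional sanity check after either install path, you can confirm that the `colpali_engine` package (the name imported by the scripts below) resolves in your environment:

```bash
python -c "import colpali_engine; print('colpali_engine imported successfully')"
```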
## Usage

Example usage of the model is shown in the `scripts` directory.

```bash
# hackable example script to adapt
python scripts/infer/run_inference_with_python.py
```
```python
import torch
import typer
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoProcessor
from PIL import Image

from colpali_engine.models.paligemma_colbert_architecture import ColPali
from colpali_engine.trainer.retrieval_evaluator import CustomEvaluator
from colpali_engine.utils.colpali_processing_utils import process_images, process_queries
from colpali_engine.utils.image_from_page_utils import load_from_dataset


def main() -> None:
    """Example script to run inference with ColPali"""

    # Load model
    model_name = "vidore/colpali"
    model = ColPali.from_pretrained("google/paligemma-3b-mix-448", torch_dtype=torch.bfloat16, device_map="cuda").eval()
    model.load_adapter(model_name)
    processor = AutoProcessor.from_pretrained(model_name)

    # select images -> load_from_pdf(<pdf_path>), load_from_image_urls(["<url_1>"]), load_from_dataset(<path>)
    images = load_from_dataset("vidore/docvqa_test_subsampled")
    queries = ["From which university does James V. Fiorca come ?", "Who is the japanese prime minister?"]

    # run inference - docs
    dataloader = DataLoader(
        images,
        batch_size=4,
        shuffle=False,
        collate_fn=lambda x: process_images(processor, x),
    )
    ds = []
    for batch_doc in tqdm(dataloader):
        with torch.no_grad():
            batch_doc = {k: v.to(model.device) for k, v in batch_doc.items()}
            embeddings_doc = model(**batch_doc)
        ds.extend(list(torch.unbind(embeddings_doc.to("cpu"))))

    # run inference - queries
    dataloader = DataLoader(
        queries,
        batch_size=4,
        shuffle=False,
        collate_fn=lambda x: process_queries(processor, x, Image.new("RGB", (448, 448), (255, 255, 255))),
    )
    qs = []
    for batch_query in dataloader:
        with torch.no_grad():
            batch_query = {k: v.to(model.device) for k, v in batch_query.items()}
            embeddings_query = model(**batch_query)
        qs.extend(list(torch.unbind(embeddings_query.to("cpu"))))

    # run evaluation
    retriever_evaluator = CustomEvaluator(is_multi_vector=True)
    scores = retriever_evaluator.evaluate(qs, ds)
    print(scores.argmax(axis=1))


if __name__ == "__main__":
    typer.run(main)
```
Details are also given in the model card of the base ColPali model on HuggingFace: [ColPali Model card](https://huggingface.co/vidore/colpali).
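The `scores` object returned by `CustomEvaluator.evaluate` is a query-by-document score matrix, so ranking documents for each query is a top-k over each row. Below is a minimal sketch of that step, assuming `scores` is a 2D array-like accepted by `torch.as_tensor` and reusing `queries` from the example above:

```python
import torch

# scores: (num_queries, num_docs) late-interaction similarity matrix from the example above
scores_t = torch.as_tensor(scores, dtype=torch.float32)

top_k = min(3, scores_t.shape[1])  # number of candidate pages to keep per query
values, indices = torch.topk(scores_t, k=top_k, dim=1)

for query, doc_ids, doc_scores in zip(queries, indices.tolist(), values.tolist()):
    print(f"Query: {query}")
    for rank, (doc_id, score) in enumerate(zip(doc_ids, doc_scores), start=1):
        print(f"  {rank}. page index {doc_id} (score={score:.2f})")
```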
## Training

```bash
USE_LOCAL_DATASET=0 python scripts/train/train_colbert.py scripts/configs/siglip/train_siglip_model_debug.yaml
```

or

```bash
accelerate launch scripts/train/train_colbert.py scripts/configs/train_colidefics_model.yaml
```
### Configurations

All training arguments are set through a YAML configuration file, whose structure maps onto the following dataclass:
```python
@dataclass
class ColModelTrainingConfig:
    model: PreTrainedModel
    tr_args: TrainingArguments = None
    output_dir: str = None
    max_length: int = 256
    run_eval: bool = True
    run_train: bool = True
    peft_config: Optional[LoraConfig] = None
    add_suffix: bool = False
    processor: Idefics2Processor = None
    tokenizer: PreTrainedTokenizer = None
    loss_func: Optional[Callable] = ColbertLoss()
    dataset_loading_func: Optional[Callable] = None
    eval_dataset_loader: Optional[Dict[str, Callable]] = None
    pretrained_peft_model_name_or_path: Optional[str] = None
```
### Example

An example configuration file is:

```yaml
config:
  (): colpali_engine.utils.train_colpali_engine_models.ColModelTrainingConfig
  output_dir: !path ../../../models/without_tabfquad/train_colpali-3b-mix-448
  processor:
    (): colpali_engine.utils.wrapper.AutoProcessorWrapper
    pretrained_model_name_or_path: "./models/paligemma-3b-mix-448"
    max_length: 50
  model:
    (): colpali_engine.utils.wrapper.AutoColModelWrapper
    pretrained_model_name_or_path: "./models/paligemma-3b-mix-448"
    training_objective: "colbertv1"
    # attn_implementation: "eager"
    torch_dtype: !ext torch.bfloat16
    # device_map: "auto"
    # quantization_config:
    #   (): transformers.BitsAndBytesConfig
    #   load_in_4bit: true
    #   bnb_4bit_quant_type: "nf4"
    #   bnb_4bit_compute_dtype: "bfloat16"
    #   bnb_4bit_use_double_quant: true
  dataset_loading_func: !ext colpali_engine.utils.dataset_transformation.load_train_set
  eval_dataset_loader: !import ../data/test_data.yaml
  max_length: 50
  run_eval: true
  add_suffix: true
  loss_func:
    (): colpali_engine.loss.colbert_loss.ColbertPairwiseCELoss
  tr_args: !import ../tr_args/default_tr_args.yaml
  peft_config:
    (): peft.LoraConfig
    r: 32
    lora_alpha: 32
    lora_dropout: 0.1
    init_lora_weights: "gaussian"
    bias: "none"
    task_type: "FEATURE_EXTRACTION"
    target_modules: '(.*(language_model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)'
    # target_modules: '(.*(language_model).*(down_proj|gate_proj|up_proj|k_proj|q_proj|v_proj|o_proj).*$|.*(custom_text_proj).*$)'
```
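The `tr_args` and `eval_dataset_loader` entries are pulled in from separate YAML files via `!import`. As a purely illustrative sketch of what an imported training-arguments file might contain, assuming it follows the same `():` instantiation convention and standard `transformers.TrainingArguments` fields (the actual `default_tr_args.yaml` in the repo may differ):

```yaml
# Hypothetical sketch only; the file shipped in the repo may use different values and fields.
(): transformers.TrainingArguments
output_dir: null
per_device_train_batch_size: 4
gradient_accumulation_steps: 4
num_train_epochs: 1
learning_rate: 5e-5
warmup_steps: 100
logging_steps: 10
save_steps: 500
bf16: true
```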
#### Local training

```bash
USE_LOCAL_DATASET=0 python scripts/train/train_colbert.py scripts/configs/siglip/train_siglip_model_debug.yaml
```

#### SLURM

```bash
sbatch --nodes=1 --cpus-per-task=16 --mem-per-cpu=32GB --time=20:00:00 --gres=gpu:1 -p gpua100 --job-name=colidefics --output=colidefics.out --error=colidefics.err --wrap="accelerate launch scripts/train/train_colbert.py scripts/configs/train_colidefics_model.yaml"

sbatch --nodes=1 --time=5:00:00 -A cad15443 --gres=gpu:8 --constraint=MI250 --job-name=colpali --wrap="python scripts/train/train_colbert.py scripts/configs/train_colpali_model.yaml"
```
## Citation

```bibtex
@misc{faysse2024colpaliefficientdocumentretrieval,
  title={ColPali: Efficient Document Retrieval with Vision Language Models},
  author={Manuel Faysse and Hugues Sibille and Tony Wu and Bilel Omrani and Gautier Viaud and Céline Hudelot and Pierre Colombo},
  year={2024},
  eprint={2407.01449},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2407.01449},
}
```