---
license: apache-2.0
language:
- en
library_name: transformers
tags:
- information retrieval
- llama2
- document expansion
- LoRA
---

This repository contains the LoRA weights from fine-tuning the pre-trained Llama 2 7B model for document expansion, for use with [DeeperImpact](https://arxiv.org/abs/2405.17093).

We fine-tune the pre-trained Llama 2 model on the same dataset as DocT5Query, i.e., 532k document-query pairs from the MS MARCO Passage Qrels train dataset.

Please refer to the following notebook to learn how to use it for document expansion: [inference_deeper_impact.ipynb](https://github.com/basnetsoyuj/improving-learned-index/blob/master/inference_deeper_impact.ipynb)

You can also clone the [DeeperImpact repo](https://github.com/basnetsoyuj/improving-learned-index) and run expansions on a collection of documents using the following command:

```
python -m src.llama2.generate \
    --llama_path \
    --collection_path \
    --collection_type [msmarco | beir] \
    --output_path \
    --batch_size \
    --max_tokens 512 \
    --num_return_sequences 80 \
    --max_new_tokens 50 \
    --top_k 50 \
    --top_p 0.95 \
    --peft_path soyuj/llama2-doc2query
```

This will generate a JSONL file with expansions for each document in the collection. To append the unique expansion terms to the original collection, use the following command:

```
python -m src.llama2.merge \
    --collection_path \
    --collection_type [msmarco | beir] \
    --queries_path \
    --output_path
```
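To make the merge step concrete, here is a minimal Python sketch of the idea behind appending unique expansion terms to a document. This is an illustration only, not the repository's actual implementation: the function name `merge_expansions` and the whitespace tokenization are assumptions.

```python
def merge_expansions(document: str, queries: list[str]) -> str:
    """Append expansion terms from generated queries that do not
    already appear (case-insensitively) in the document text."""
    seen = set(document.lower().split())
    new_terms = []
    for query in queries:
        for term in query.lower().split():
            if term not in seen:
                seen.add(term)
                new_terms.append(term)
    # Append the novel terms after the original text, preserving order.
    return document + " " + " ".join(new_terms) if new_terms else document

doc = "Llama 2 is a large language model released by Meta."
queries = ["what is llama 2", "who released llama 2 model"]
print(merge_expansions(doc, queries))
# Only "what" and "who" are new; all other query terms already occur in doc.
```

In the real pipeline, the expansion terms come from the JSONL file produced by `src.llama2.generate` (up to `num_return_sequences` queries per document), and the merged collection is written back in MS MARCO or BEIR format.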