jiviteshjain committed on
Commit 5fbb9be · 1 Parent(s): 1536dad

improvements
README.md CHANGED
@@ -8,51 +8,128 @@ app_port: 8501
  pinned: false
  ---
 
- # Answering Questions about Pittsburgh using Retrieval Augmented Generation
- 
- This is a retrieval-augmented generation (RAG) system for answering questions about the city of Pittsburgh, PA, and
- Carnegie Mellon University. Please refer to the technical report (`technical-report.pdf`) for details.
- 
- ## System Components
- 
- 1. Data Collection Pipeline, including web scraping using Scrapy and BeautifulSoup, and HTML, PDF, and structured data (calendar events, tables) parsing (see `src/data_collection/using_scrapy`).
- 2. Synthetic QA pair generation using a large language model (see `src/data_annotation/qa-generation.ipynb`).
- 3. Manual annotation of the test set (see `src/data_annotation/annotate.py` for a simple annotation UI in Marimo).
- 4. Parameter-efficient reader model fine-tuning in 4-bit using Quantized LoRA (see `src/finetuning`).
- 5. Embedding and dense FAISS index creation (see `src/rag_pipeline/embed_and_build_dense_index.py` and `src/rag_pipeline/load_embeddings_and_build_dense_index.py`).
- 6. Sparse entity and date index for augmentation (see `src/rag_pipeline/build_sparse_index.py`).
- 7. Query rewording using Hypothetical Document Embeddings (see `src/rag_pipeline/rag_inference.ipynb`).
- 8. Retrieval using sparse and dense indices (see `src/rag_pipeline/rag_inference.ipynb`).
- 9. Retrieved document reranking using a cross-encoder model (see `src/rag_pipeline/rag_inference.ipynb`).
- 10. Retrieved document summarization over all documents (see `src/rag_pipeline/rag_inference.ipynb`).
- 11. Answer generation using our quantized, fine-tuned models (see `src/rag_pipeline/rag_inference.ipynb`).
- 
- ## Example Outputs
- 
- | *Mistral - No Finetune* | *Mistral - Finetuned* | *Mistral - RAG* |
- |---|---|---|
- | *Question:* Where is the NNOC House located? <br> *Answer:* The NNOC House is located in the heart of the city, on a busy street. It is a three-story building with a large front door and a small side entrance. The building is made of brick and has a red roof. The windows are small and the walls are white. The building is surrounded by a small garden. | *Question:* Where is the NNOC House located? <br> *Answer:* The NNOC House is located in Pittsburgh, Pennsylvania. | *Question:* Where is the NNOC House located? <br> *Answer:* 7101 Apple Street at Pittsburgh, PA. |
- | *Question:* What is the theme of Tim Pearce's jokes? <br> *Answer:* The theme of Tim Pearce's jokes is that he is a very funny person. | *Question:* What is the theme of Tim Pearce's jokes? <br> *Answer:* His jokes are about his life and experiences. | *Question:* What is the theme of Tim Pearce's jokes? <br> *Answer:* They are primarily snail-related jokes. |
- 
- As can be seen above, the RAG model provides the most accurate, concise, and factual answers, followed by the
- fine-tuned Mistral, and then by Mistral with in-context learning, which does not control its output despite strong prompts.
- 
- ## Ablations and Comparisons
- 
- Please see the report (`technical-report.pdf`).
 
  ## Usage
 
- To run any component of this RAG system, please first install the required dependencies listed in
- `src/environment.yaml`. All code has only been tested on Linux. Dependency support for quantization and GPU acceleration
- libraries can vary from system to system.
 
  ```shell
- $ conda env create -n rag_env -f src/environment.yaml
- $ conda activate rag_env
  ```
 
- ### Data Collection
 
  All data collection scripts, which include crawlers and parsers for various websites, are located in the
  `src/data_collection` directory.
@@ -64,7 +141,7 @@ $ cd src/data_collection/using_scrapy
  $ scrapy crawl visit_pittsburgh -O path/to/output.jsonl # or pittsburgh_pa, steelers, pirates, penquins
  ```
 
- ### Data Annotation
 
  `src/data_annotation` includes a QA generation notebook (`qa-generation.ipynb`) for automated data processing and question-answer generation.
@@ -72,46 +149,97 @@ $ scrapy crawl visit_pittsburgh -O path/to/output.jsonl # or pittsburgh_pa, ste
  To execute the notebook, open it in Jupyter Notebook or a compatible IDE and run the cells in
  order.
 
- ### Finetuning
 
- `src/finetuning_scripts` includes training notebooks for three 4-bit models, `gemma-2b`, `llama3.2-3b`, and
- `mistral-7b`, designed for fine-tuning on the question-answer pairs and model optimization.
 
  To execute the notebook, open it in Jupyter Notebook or a compatible IDE and run the cells in order.
 
- ### RAG Pipeline
 
  Components of the RAG pipeline, such as embedding documents and building the dense index, loading existing embeddings
  and building the dense index, building the sparse index, and inferring using the pipeline, can be run as Python scripts
- from the `src/rag_pipeline` directory. The appropriate configuration needs to be set in a config file present in the
- `src/rag_pipeline/conf` directory and then specified in the command. Config files are managed using Hydra/OmegaConf
  and are in the Hydra format. Please look at existing files for an example. To run the pipeline with a specific
  configuration, run:
 
  ```shell
- $ python src/rag_pipeline/embed_and_build_index.py --config-name=config
  ```
 
- Further, the complete inference pipeline can be run as an IPython notebook -- `src/rag_pipeline/rag_inference.ipynb`. Open the notebook in a Jupyter IDE, set the configuration in OmegaConf format in the configuration cell located at the bottom of the notebook, and execute all cells in order.
 
  ### Data
 
- Our processed dataset, including the complete knowledge corpus, generated and manually annotated QA pairs, embeddings, and dense and sparse indices, is available on Kaggle at https://www.kaggle.com/datasets/jiviteshjain/final-rag-data/.
 
  ### Fine-tuned Model Weights
 
- Our reader models, fine-tuned using Q-LoRA on ~40,000 QA pairs, are available on the Huggingface Hub at the following
- links:
 
- - [Fine-tuned Gemma-2b 4-bit](https://huggingface.co/SupritiVijay/RAG-finetuned-gemma-2b-bnb-4bit)
- - [Fine-tuned Mistral-7b 4-bit](https://huggingface.co/neelbhandari/lora_mistral_model)
- - [Fine-tuned Llama-3.2-3b 4-bit](https://huggingface.co/neelbhandari/lora_llama3b_model)
 
  They can be loaded via Unsloth AI's `FastLanguageModel` or Huggingface's `AutoPeftModel` classes. Please see
- `src/rag_pipeline/rag_inference.ipynb` for an example.
 
  ### Weights and Biases dashboard with visualizations of our experiments
 
- [Link to dashboard (please request an invitation)](https://wandb.ai/aanlp/anlp-ass-2)
 
- ![wandb](wandb.png)
 
  pinned: false
  ---
 
+ # Answering Questions using Retrieval-Augmented Generation
+ _Implementation and Analysis of State-of-the-Art Methods_
+ 
+ <div align="center">
+ <img src="assets/pittsburgh.webp" width=200>
+ </div>
+ 
+ This project implements and evaluates an end-to-end retrieval-augmented generation (RAG) pipeline for question answering, built with state-of-the-art techniques. The system specializes in answering questions about Pittsburgh and Carnegie Mellon University.
+ 
+ We use a fine-tuned, quantized version of __Mistral-7B__ as our answering/reader model.
 
+ ## Try it out
+ 
+ You can try out the final best-performing system on Huggingface Spaces here: [![Hugging Face Spaces](https://img.shields.io/badge/🤗%20Spaces-Try%20it%20out-blue)](https://huggingface.co/spaces/jiviteshjn/mistral-rag-qa)
+ To understand what's happening behind the scenes, please read the rest of this README before trying it out.
+ 
+ You can ask questions like:
+ 
+ - _Who is the largest employer in Pittsburgh?_
+ - _Where is the Smithsonian-affiliated regional history museum in Pittsburgh?_
+ - _Who is the president of CMU?_
+ - _Where was the polio vaccine developed?_
+ - and anything else about Pittsburgh and CMU!
+ 
+ A sample list of questions and two sets of outputs generated by the system are provided in `outputs/`.
+ 
+ > [!NOTE]
+ > Huggingface puts inactive Spaces to sleep, and they can take a while to cold start. If you find the Space sleeping, please press restart and wait for a few minutes.
+ >
+ > Further, this is a complex pipeline, consisting of several models, running on a low-tier GPU Space. It may take a few minutes for models to load and caches to warm up, especially after a cold start. Please be patient; subsequent queries will be faster.
+ 
+ ## System description
+ 
+ This project implements and analyzes RAG end-to-end, from knowledge corpus collection and model fine-tuning, through index construction, to inference, ablations, and comparisons.
+ The following is a brief description of each component:
+ 
+ ### Data collection
+ 
+ This project builds its knowledge corpus by scraping websites related to Pittsburgh and Carnegie Mellon University. These include the official city website, [pittsburghpa.gov](https://www.pittsburghpa.gov); websites about the city's sports teams, such as [steelers.com](https://www.steelers.com); websites about the city's events, music, and lifestyle, such as [visitpittsburgh.com](https://www.visitpittsburgh.com); websites belonging to Carnegie Mellon University; as well as hundreds of relevant Wikipedia and Encyclopaedia Britannica pages (obtained by searching for keywords related to Pittsburgh according to BERT embeddings).
+ 
+ Scrapy is used as the primary web crawler, owing to its flexibility and controls for not overloading web servers. BeautifulSoup and PDF parsers are also used where necessary, and manual tuning is performed to extract structured data such as calendar events and news items.
+ 
+ See `src/data_collection`.
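+ 
+ As an illustration, a minimal spider in the style of those in `src/data_collection/using_scrapy` might look like the following sketch (the class name, selectors, and settings here are illustrative, not the project's actual code):
+ 
+ ```python
+ import scrapy
+ 
+ 
+ class VisitPittsburghSpider(scrapy.Spider):
+     name = "visit_pittsburgh"
+     start_urls = ["https://www.visitpittsburgh.com"]
+     custom_settings = {
+         # Be polite to web servers: throttle requests and obey robots.txt.
+         "DOWNLOAD_DELAY": 1.0,
+         "AUTOTHROTTLE_ENABLED": True,
+         "ROBOTSTXT_OBEY": True,
+     }
+ 
+     def parse(self, response):
+         # Keep the visible paragraph text of each page as one document.
+         yield {
+             "url": response.url,
+             "text": " ".join(response.css("p::text").getall()),
+         }
+         # Follow links and parse them with this same method.
+         for href in response.css("a::attr(href)").getall():
+             yield response.follow(href, callback=self.parse)
+ ```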
+ 
+ ### Synthetic QA pair generation
+ 
+ To fine-tune the answering/reader model as well as to evaluate our system, we generate synthetic questions and answers from the knowledge corpus using a large language model. We use quantized models for efficiency. A total of __~38,000__ QA pairs are generated.
+ 
+ See `src/data_annotation/qa-generation.ipynb`.
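+ 
+ The following is a minimal sketch of the generation loop; the prompt and model choice here are illustrative assumptions, while the actual prompting lives in the notebook:
+ 
+ ```python
+ # Sketch of synthetic QA-pair generation from corpus passages.
+ from transformers import pipeline
+ 
+ # Any instruction-tuned, optionally quantized chat model works here.
+ generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")
+ 
+ PROMPT = (
+     "Read the passage below and write one factual question about it, "
+     "followed by a short answer.\n\nPassage: {passage}\n\nQuestion:"
+ )
+ 
+ def generate_qa(passage: str) -> str:
+     out = generator(PROMPT.format(passage=passage), max_new_tokens=96,
+                     do_sample=False, return_full_text=False)
+     return out[0]["generated_text"].strip()
+ 
+ print(generate_qa("The polio vaccine was developed at the University of "
+                   "Pittsburgh by Jonas Salk's team."))
+ ```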
+ 
+ ### Manual annotation of the test set
+ 
+ To evaluate our system on gold-standard examples, ~100 question-answer pairs are manually annotated.
+ 
+ See `src/data_annotation/annotate.py` for a simple annotation UI in Marimo.
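+ 
+ A minimal sketch of what one Marimo annotation cell could look like (the widget layout is illustrative; the real UI is in `annotate.py`):
+ 
+ ```python
+ import marimo as mo
+ 
+ # Show a passage and collect a question-answer pair for it.
+ passage = mo.md("**Passage:** The polio vaccine was developed at the University of Pittsburgh.")
+ question = mo.ui.text(label="Question")
+ answer = mo.ui.text(label="Answer")
+ mo.vstack([passage, question, answer])
+ ```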
+ 
+ ### Reader model fine-tuning
+ 
+ We fine-tune the reader model on the generated QA pairs, using parameter-efficient fine-tuning with 4-bit quantization (Quantized LoRA) for efficiency. We compare Mistral 7B, Llama 3.2 3B, and Gemma 2B, and find Mistral to be the best-performing model.
+ 
+ See `src/finetuning`.
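+ 
+ A minimal QLoRA sketch using the PEFT and BitsAndBytes stack from `src/requirements.txt` (hyperparameters are illustrative; the actual notebook is in `src/finetuning`):
+ 
+ ```python
+ import torch
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+ 
+ base = "mistralai/Mistral-7B-v0.1"
+ bnb = BitsAndBytesConfig(
+     load_in_4bit=True,  # keep the frozen base weights in 4-bit NF4
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+ model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained(base)
+ 
+ model = prepare_model_for_kbit_training(model)
+ lora = LoraConfig(  # train only small low-rank adapters on the attention projections
+     r=16, lora_alpha=32, lora_dropout=0.05,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+     task_type="CAUSAL_LM",
+ )
+ model = get_peft_model(model, lora)
+ model.print_trainable_parameters()  # a small fraction of the 7B parameters
+ ```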
+ 
+ ### Embedding and dense FAISS index creation
+ 
+ We chunk our documents to a length of 512 tokens and use FAISS as our index store, with a quantized HNSW index for its good performance at reduced memory. We use Snowflake's Arctic Embed Medium Long for embedding textual documents, owing to its small size, large context length, and near-SOTA performance on the MTEB leaderboard. In total, we embed around 20,000 documents from 14,000 URLs.
+ 
+ See `src/rag_pipeline/embed_and_build_dense_index.py` and `src/rag_pipeline/load_embeddings_and_build_dense_index.py`.
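+ 
+ A sketch of the embed-and-index step, assuming the `Snowflake/snowflake-arctic-embed-m-long` checkpoint and FAISS's scalar-quantized HNSW index (the scripts above hold the real configuration):
+ 
+ ```python
+ import faiss
+ from sentence_transformers import SentenceTransformer
+ 
+ docs = [
+     "The Andy Warhol Museum is located on Pittsburgh's North Shore.",
+     "Carnegie Mellon University was founded by Andrew Carnegie in 1900.",
+ ]
+ 
+ model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-long", trust_remote_code=True)
+ embeddings = model.encode(docs, normalize_embeddings=True)
+ 
+ # HNSW graph over 8-bit scalar-quantized vectors: fast search, smaller memory footprint.
+ index = faiss.IndexHNSWSQ(embeddings.shape[1], faiss.ScalarQuantizer.QT_8bit, 32)
+ index.train(embeddings)  # the scalar quantizer needs a training pass
+ index.add(embeddings)
+ 
+ query = model.encode(["Where is the Warhol museum?"], normalize_embeddings=True)
+ scores, ids = index.search(query, 2)
+ ```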
+ 
+ ### Sparse entity and date index for augmentation
+ 
+ Our experiments reveal that our retrieval system struggles with entities such as event names and dates: documents corresponding to two different events tend to be similar as a whole, differing only in small specifics, which translates to embeddings that are similar.
+ 
+ To mitigate this, we experiment with a sparse TF-IDF index built only over extracted entities and dates. We extract dates at index-building and inference time using spaCy, and entities using an off-the-shelf fine-tuned RoBERTa model. In practice, however, we find the sparse index to be noisy (as is to be expected), and its benefits are not enough to offset the noise and latency it adds to the retrieval system. We hypothesize that fine-tuning the embedding model contrastively would be a better solution to this problem.
+ 
+ See `src/rag_pipeline/build_sparse_index.py`.
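+ 
+ A minimal sketch of the idea using scikit-learn's TF-IDF, with spaCy's small English NER standing in for both the date extractor and the fine-tuned RoBERTa entity tagger:
+ 
+ ```python
+ import spacy
+ from sklearn.feature_extraction.text import TfidfVectorizer
+ from sklearn.metrics.pairwise import cosine_similarity
+ 
+ nlp = spacy.load("en_core_web_sm")
+ 
+ def entities_and_dates(text: str) -> str:
+     # Keep only entity and date spans; all other tokens are dropped.
+     keep = {"EVENT", "DATE", "ORG", "PERSON", "GPE", "FAC"}
+     return " ".join(ent.text for ent in nlp(text).ents if ent.label_ in keep)
+ 
+ docs = [
+     "The Picklesburgh festival returns to downtown Pittsburgh on July 18, 2025.",
+     "Light Up Night kicks off the holiday season downtown on November 22, 2025.",
+ ]
+ vectorizer = TfidfVectorizer()
+ doc_matrix = vectorizer.fit_transform([entities_and_dates(d) for d in docs])
+ 
+ query = vectorizer.transform([entities_and_dates("When is Picklesburgh happening?")])
+ print(cosine_similarity(query, doc_matrix))  # one sparse score per document
+ ```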
+ 
+ ### Query rewording using Hypothetical Document Embeddings
+ 
+ Rewriting the query to make it more similar to the documents that would potentially contain the answer has emerged as a popular technique. We implement this using an off-the-shelf LLM as the rewriting model, and see significant gains as a result.
+ 
+ See `src/rag_pipeline/rag_validation.py`.
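+ 
+ The core of HyDE is to embed a hypothetical answer passage rather than the raw question. A minimal sketch, with the prompt and model choices as illustrative assumptions:
+ 
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from transformers import pipeline
+ 
+ rewriter = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")
+ embedder = SentenceTransformer("Snowflake/snowflake-arctic-embed-m-long", trust_remote_code=True)
+ 
+ def hyde_embedding(question: str):
+     prompt = f"Write a short encyclopedia-style passage that answers: {question}\nPassage:"
+     passage = rewriter(prompt, max_new_tokens=128, return_full_text=False)[0]["generated_text"]
+     # The hypothetical passage reads like a corpus document, so its embedding
+     # lands closer to real answer-bearing documents than the bare question does.
+     return embedder.encode([passage], normalize_embeddings=True)
+ 
+ query_vector = hyde_embedding("Who is the largest employer in Pittsburgh?")
+ # Search the dense FAISS index with query_vector instead of the raw question.
+ ```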
+ 
+ ### Retrieval and reranking
+ 
+ We retrieve documents from the dense and sparse indices separately, and then rerank them using a cross-encoder model (BAAI's BGE-reranker-v2-m3), keeping only the top-scoring third of the documents. This approach works remarkably well at maintaining high recall, while also making sure the context is not too large for the reader model to handle (high precision).
+ 
+ See `src/rag_pipeline/rag_validation.py`.
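+ 
+ A sketch of the reranking step with the cross-encoder named above (the candidate documents are illustrative):
+ 
+ ```python
+ from sentence_transformers import CrossEncoder
+ 
+ reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")
+ 
+ query = "Where was the polio vaccine developed?"
+ candidates = [
+     "The polio vaccine was developed by Jonas Salk at the University of Pittsburgh.",
+     "Pittsburgh's Strip District is known for its food markets.",
+     "Salk's team announced the vaccine's success in 1955.",
+ ]
+ 
+ # A cross-encoder scores each (query, document) pair jointly,
+ # which is slower but more accurate than bi-encoder retrieval.
+ scores = reranker.predict([(query, doc) for doc in candidates])
+ 
+ # Keep only the top-scoring third, as described above.
+ top_k = max(1, len(candidates) // 3)
+ reranked = sorted(zip(scores, candidates), reverse=True)[:top_k]
+ ```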
+ 
+ ### Retrieved document summarization
+ 
+ Our documents are at most 512 tokens, which makes context lengths long enough to degrade performance, even at small k's (k = 3, 4, or 5). To mitigate this, we summarize the retrieved documents using an LLM. The summarization LLM is query-aware.
+ 
+ See `src/rag_pipeline/rag_validation.py`.
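+ 
+ A sketch of the query-aware summarization step (prompt and model are illustrative):
+ 
+ ```python
+ from transformers import pipeline
+ 
+ summarizer = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")
+ 
+ def summarize(question: str, documents: list[str]) -> str:
+     context = "\n\n".join(documents)
+     # Because the summarizer sees the question, it keeps only answer-relevant facts.
+     prompt = (
+         f"Question: {question}\n\nDocuments:\n{context}\n\n"
+         "Summarize only the information needed to answer the question:"
+     )
+     return summarizer(prompt, max_new_tokens=128,
+                       return_full_text=False)[0]["generated_text"]
+ ```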
+ 
+ ### Answer generation using our quantized, fine-tuned models
+ 
+ Finally, we get to generating answers! The pipeline implemented in `src/rag_pipeline/rag_validation.py` is batched and meant for running evaluations on a test set and computing metrics. `src/rag_qa.py` implements a simple question-answering class that uses the pipeline to answer queries one at a time. `src/app.py` uses this class to create the demo app hosted on Huggingface.
 
  ## Usage
 
+ To run any component of this RAG system, please first install the required dependencies listed in `src/requirements.txt` using pip.
 
  ```shell
+ $ # preferably inside a virtual environment
+ $ pip install -r src/requirements.txt
  ```
 
+ > [!WARNING]
+ > Many quantization frameworks are under active development, and support varies across systems and hardware. This project uses BitsAndBytes, which is not compatible with Apple Silicon at this time. This project has only been tested on Linux servers. The exact requirements may need some tweaking to ensure compatibility with your system (hardware, OS, CUDA versions, etc.).
+ >
+ > The Huggingface Space is the more convenient way to try this project out.
+ 
+ ### Data collection
 
  All data collection scripts, which include crawlers and parsers for various websites, are located in the
  `src/data_collection` directory.
 
  ```shell
  $ cd src/data_collection/using_scrapy
  $ scrapy crawl visit_pittsburgh -O path/to/output.jsonl # or pittsburgh_pa, steelers, pirates, penquins
  ```
 
+ ### Data annotation
 
  `src/data_annotation` includes a QA generation notebook (`qa-generation.ipynb`) for automated data processing and question-answer generation.
 
  To execute the notebook, open it in Jupyter Notebook or a compatible IDE and run the cells in
  order.
 
+ ### Fine-tuning
 
+ `src/finetuning_scripts` includes the notebook used to fine-tune the Mistral 7B model on the generated QA pairs using Q-LoRA with 4-bit quantization.
 
  To execute the notebook, open it in Jupyter Notebook or a compatible IDE and run the cells in order.
 
+ ### RAG pipeline
 
  Components of the RAG pipeline, such as embedding documents and building the dense index, loading existing embeddings
  and building the dense index, building the sparse index, and inferring using the pipeline, can be run as Python scripts
+ from the `src/rag_pipeline` directory. The appropriate configuration needs to be set in a config file present in the `src/rag_pipeline/conf` directory and then specified in the command. Config files are managed using Hydra/OmegaConf
  and are in the Hydra format. Please look at existing files for an example. To run the pipeline with a specific
  configuration, run:
 
  ```shell
+ $ python src/rag_pipeline/embed_and_build_index.py --config-name=validation
  ```
 
+ The complete validation pipeline can be run as:
+ 
+ ```shell
+ $ python src/rag_pipeline/rag_validation.py --config-name=validation
+ ```
+ 
+ ### Demo app
+ 
+ The demo app can be run as:
+ 
+ ```shell
+ $ streamlit run src/app.py
+ ```
 
  ### Data
 
+ The processed dataset, including the complete knowledge corpus, generated and manually annotated QA pairs, embeddings, and dense and sparse indices, is available on [Kaggle](https://www.kaggle.com/datasets/jiviteshjain/final-rag-data/).
 
  ### Fine-tuned Model Weights
 
+ Adapters for the best-performing reader model, Mistral 7B fine-tuned using Q-LoRA on ~38,000 QA pairs, are available on the [Huggingface Hub](https://huggingface.co/jiviteshjn/pittsburgh-rag-qa-mistral-finetuned).
 
191
 
 
 
 
192
 
193
  They can be loaded via Unsloth AI's `FastLanguageModel` or Huggingface's `AutoPeftModel` classes. Please see `
194
+ src/rag_pipeline/rag_validation.py` for an example.
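+ 
+ For instance, loading through PEFT might look like this sketch (generation settings are illustrative):
+ 
+ ```python
+ from peft import AutoPeftModelForCausalLM
+ from transformers import AutoTokenizer
+ 
+ repo = "jiviteshjn/pittsburgh-rag-qa-mistral-finetuned"
+ # Loads the base model with the LoRA adapters applied, quantized to 4-bit.
+ model = AutoPeftModelForCausalLM.from_pretrained(repo, load_in_4bit=True, device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained(repo)
+ 
+ inputs = tokenizer("Who is the president of CMU?", return_tensors="pt").to(model.device)
+ output = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(output[0], skip_special_tokens=True))
+ ```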
 
  ### Weights and Biases dashboard with visualizations of our experiments
 
+ Along with LangChain for implementation, this project uses Hydra for configuration management and Weights and Biases for experiment tracking.
+ 
+ ![wandb](assets/wandb.png)
+ 
+ ## Experiments and ablations
+ 
+ ### RAG and fine-tuning help
+ 
+ While the numbers tell the same story, the difference is best seen through examples. Note the difference in quality and factuality across the off-the-shelf Mistral 7B model, the fine-tuned model, and the fine-tuned model with retrieved context.
+ 
+ | *Mistral - No Finetune* | *Mistral - Finetuned* | *Mistral - RAG* |
+ |---|---|---|
+ | *Question:* Where is the NNOC House located? <br> *Answer:* The NNOC House is located in the heart of the city, on a busy street. It is a three-story building with a large front door and a small side entrance. The building is made of brick and has a red roof. The windows are small and the walls are white. The building is surrounded by a small garden. | *Question:* Where is the NNOC House located? <br> *Answer:* The NNOC House is located in Pittsburgh, Pennsylvania. | *Question:* Where is the NNOC House located? <br> *Answer:* 7101 Apple Street at Pittsburgh, PA. |
+ | *Question:* What is the theme of Tim Pearce's jokes? <br> *Answer:* The theme of Tim Pearce's jokes is that he is a very funny person. | *Question:* What is the theme of Tim Pearce's jokes? <br> *Answer:* His jokes are about his life and experiences. | *Question:* What is the theme of Tim Pearce's jokes? <br> *Answer:* They are primarily snail-related jokes. |
+ 
+ ### Sophisticated retrieval techniques help
+ 
+ <div align="left">
+ <img src="assets/fig-1.jpg" width=400>
+ </div>
+ 
+ Reranking retrieved documents using a cross-encoder model, as well as query rewording using HyDE, leads to significant performance gains. All results are reported with k = 5.
+ 
+ ### Higher k's lead to better recall; sparse retrieval is noisy
+ 
+ <div align="left">
+ <img src="assets/fig-2.jpg" width=400>
+ </div>
+ 
+ The x-axis labels correspond to dense k - sparse k - reranking k. The first two sets of bars show that dense retrieval significantly beats sparse entity-based retrieval. Sets 2 and 3 show the benefit of using a larger k for dense retrieval: even if the mean reciprocal rank (MRR) goes down, the overall recall rate improves. Finally, sets 3, 4, and 5 show that the recall rate improves with larger k's for reranking, without hurting the MRR.
+ 
+ ### Mistral 7B is the best-performing model*
+ 
+ <div align="left">
+ <img src="assets/fig-3.jpg" width=500>
+ </div>
+ 
+ Mistral 7B performs best on our test set according to the SQuAD exact match metric, while Gemma 2B performs better according to the SQuAD F1 metric, hence the star. The fine-tuned models perform significantly better than the off-the-shelf models (which manage to score 0 on the exact match metric).
+ 
+ ## Demo app screenshot
+ 
+ ![demo screenshot](assets/app.jpg)
assets/app.jpg ADDED
assets/fig-1.jpg ADDED
assets/fig-2.jpg ADDED
assets/fig-3.jpg ADDED
pittsburgh.webp → assets/pittsburgh.webp RENAMED
File without changes
wandb.png → assets/wandb.png RENAMED
File without changes
src/requirements.txt ADDED
@@ -0,0 +1,168 @@
+ accelerate==1.1.1
+ aiohappyeyeballs==2.4.3
+ aiohttp==3.11.2
+ aiosignal==1.3.1
+ altair==5.4.1
+ annotated-types==0.7.0
+ antlr4-python3-runtime==4.9.3
+ anyio==4.6.2.post1
+ attrs==24.2.0
+ bitsandbytes==0.44.1
+ blinker==1.9.0
+ blis==0.7.11
+ cachetools==5.5.0
+ catalogue==2.0.10
+ certifi==2024.8.30
+ charset-normalizer==3.4.0
+ click==8.1.7
+ cloudpathlib==0.20.0
+ confection==0.1.5
+ cymem==2.0.8
+ dataclasses-json==0.6.7
+ datasets==3.1.0
+ date-spacy==0.0.1
+ dateparser==1.2.0
+ dill==0.3.8
+ docker-pycreds==0.4.0
+ docstring_parser==0.16
+ einops==0.8.0
+ evaluate==0.4.3
+ faiss-cpu==1.9.0
+ fastapi==0.115.5
+ filelock==3.16.1
+ frozenlist==1.5.0
+ fsspec==2024.9.0
+ gitdb==4.0.11
+ GitPython==3.1.43
+ greenlet==3.1.1
+ h11==0.14.0
+ hf_transfer==0.1.8
+ httpcore==1.0.7
+ httpx==0.27.2
+ httpx-sse==0.4.0
+ huggingface-hub==0.23.5
+ hydra-core==1.3.2
+ idna==3.10
+ Jinja2==3.1.4
+ joblib==1.4.2
+ jsonpatch==1.33
+ jsonpointer==3.0.0
+ jsonschema==4.23.0
+ jsonschema-specifications==2024.10.1
+ langchain==0.3.7
+ langchain-community==0.3.7
+ langchain-core==0.3.19
+ langchain-huggingface==0.1.2
+ langchain-text-splitters==0.3.2
+ langcodes==3.4.1
+ langsmith==0.1.143
+ language_data==1.2.0
+ marisa-trie==1.2.1
+ markdown-it-py==3.0.0
+ MarkupSafe==3.0.2
+ marshmallow==3.23.1
+ mdurl==0.1.2
+ mpmath==1.3.0
+ multidict==6.1.0
+ multiprocess==0.70.16
+ murmurhash==1.0.10
+ mypy-extensions==1.0.0
+ narwhals==1.13.5
+ networkx==3.4.2
+ numpy==1.26.4
+ nvidia-cublas-cu12==12.1.3.1
+ nvidia-cuda-cupti-cu12==12.1.105
+ nvidia-cuda-nvrtc-cu12==12.1.105
+ nvidia-cuda-runtime-cu12==12.1.105
+ nvidia-cudnn-cu12==9.1.0.70
+ nvidia-cufft-cu12==11.0.2.54
+ nvidia-curand-cu12==10.3.2.106
+ nvidia-cusolver-cu12==11.4.5.107
+ nvidia-cusparse-cu12==12.1.0.106
+ nvidia-nccl-cu12==2.20.5
+ nvidia-nvjitlink-cu12==12.6.77
+ nvidia-nvtx-cu12==12.1.105
+ omegaconf==2.3.0
+ orjson==3.10.11
+ packaging==24.2
+ pandas==2.2.2
+ peft==0.13.2
+ pillow==11.0.0
+ pip==24.0
+ pip3-autoremove==1.2.2
+ platformdirs==4.3.6
+ preshed==3.0.9
+ propcache==0.2.0
+ protobuf==3.20.3
+ psutil==6.1.0
+ pyarrow==18.0.0
+ pydantic==2.9.2
+ pydantic_core==2.23.4
+ pydantic-settings==2.6.1
+ pydeck==0.9.1
+ Pygments==2.18.0
+ python-dateutil==2.9.0.post0
+ python-dotenv==1.0.1
+ pytz==2024.2
+ PyYAML==6.0.2
+ referencing==0.35.1
+ regex==2024.11.6
+ requests==2.32.3
+ requests-toolbelt==1.0.0
+ rich==13.9.4
+ rpds-py==0.21.0
+ safetensors==0.4.5
+ scikit-learn==1.5.2
+ scipy==1.14.1
+ sentence-transformers==3.3.0
+ sentencepiece==0.2.0
+ sentry-sdk==2.18.0
+ seqeval==1.2.2
+ setproctitle==1.3.3
+ setuptools==65.5.1
+ shellingham==1.5.4
+ shtab==1.7.1
+ six==1.16.0
+ smart-open==7.0.5
+ smmap==5.0.1
+ sniffio==1.3.1
+ spacy==3.7.5
+ spacy-legacy==3.0.12
+ spacy-loggers==1.0.5
+ span-marker==1.5.0
+ SQLAlchemy==2.0.35
+ srsly==2.4.8
+ starlette==0.41.2
+ streamlit==1.40.1
+ sympy==1.13.3
+ tenacity==9.0.0
+ thinc==8.2.5
+ threadpoolctl==3.5.0
+ tokenizers==0.20.3
+ toml==0.10.2
+ torch==2.4.0
+ torchaudio==2.4.0
+ torchvision==0.19.0
+ tornado==6.4.1
+ tqdm==4.67.0
+ transformers==4.46.2
+ triton==3.0.0
+ trl==0.12.1
+ typer==0.13.0
+ typing_extensions==4.12.2
+ typing-inspect==0.9.0
+ tyro==0.8.14
+ tzdata==2024.2
+ tzlocal==5.2
+ unsloth==2024.11.7
+ unsloth_zoo==2024.11.5
+ urllib3==2.2.3
+ wandb==0.18.7
+ wasabi==1.1.3
+ watchdog==6.0.0
+ weasel==0.4.1
+ wheel==0.45.0
+ wrapt==1.16.0
+ xformers==0.0.27.post2
+ xxhash==3.5.0
+ yarl==1.17.1