|
# RAG: Retrieval-Augmented Generation |
|
Paper: [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/pdf/2005.11401v4.pdf) |
|
Code: [https://python.langchain.com/docs/use_cases/question_answering/quickstart](https://python.langchain.com/docs/use_cases/question_answering/quickstart) |
|
Similarity Search: [https://arxiv.org/pdf/2403.05440.pdf](https://arxiv.org/pdf/2403.05440.pdf) |
|
Prompt Hub: [https://smith.langchain.com/hub](https://smith.langchain.com/hub) |
|
|
|
## Premise |
|
LLMs store factual knowledge in their parameters, but accessing and manipulating this knowledge in a precise way is not easy. Instead, the relevant knowledge can be retrieved from an external store via an embedding model and similarity search, and then added to the prompt of another model for answer generation:
|
|
|
![RAG Architecture](readme_data/rag1.png) |
|
The original paper considers training the retriever and generator models end-to-end in one pipeline.
|
Models like GPT-3.5 and GPT-4 don't need that training step if they are used with the OpenAI embedding models.
|
|
|
**Where Is RAG Used?**
|
* Mitigate hallucinations generated by LLMs
|
* Hallucinations are factually incorrect information generated by an LLM in response to an instruction or question from a user. |
|
* Hallucinations are very hard to detect and require separate methodologies to catch them.
|
* Even with RAG the probability of hallucination is not zero. |
|
* Allow LLMs to consume, at inference time, data that is not part of their training.
|
* LLMs are pre-trained on huge amounts of data from public sources.
|
* Proprietary data is not available to any general pre-trained LLM. |
|
|
|
**How RAG Works** |
|
1. One can vectorize the semantics of a piece of text using specialized LLMs, called embedding models. |
|
2. A collection of texts can be vectorized ahead of time to be used for answering incoming questions.
|
3. A question is embedded using the same embedding model, and similar documents are retrieved from a vector database using a similarity metric, like cosine similarity.
|
4. The retrieved documents, together with the question, are passed to a generator LLM to produce an answer, as sketched below.
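
A minimal sketch of these four steps, assuming an OpenAI API key is configured; `text-embedding-3-small` and `gpt-4o-mini` are stand-ins for the embedding and generator models, and a plain in-memory list stands in for a real vector database:

```python
import numpy as np
from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from the environment

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Step 1: vectorize text with an embedding model."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# Step 2: vectorize the collection once; a list stands in for a vector DB.
corpus = [
    "RAG passes retrieved documents to a generator LLM.",
    "Cosine similarity measures the angle between two embeddings.",
]
corpus_emb = embed(corpus)

# Step 3: embed the question and retrieve the most similar document
# by cosine similarity.
question = "How does RAG produce an answer?"
q = embed([question])[0]
scores = corpus_emb @ q / (np.linalg.norm(corpus_emb, axis=1) * np.linalg.norm(q))
context = corpus[int(np.argmax(scores))]

# Step 4: pass the retrieved context plus the question to the generator.
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": f"Context: {context}\n\nQuestion: {question}"}],
)
print(completion.choices[0].message.content)
```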
|
|
|
|
|
## RAG Components |
|
Here are different components present in a RAG pipeline: |
|
1. **Embedding Model:** A vectorization model that outputs a fixed-length vector for each input string.
|
* The length is the dimension of the model's latent space.
|
2. **Vector DB:** A specialized database for storing (text, embedding) pairs; each pair is called a document. Related documents are usually kept in one collection, which makes similarity search easier.
|
3. **Similarity Metric:** Given two documents $(t_1,e_1)$ and $(t_2,e_2)$, the metric calculates the similarity of $t_1$ and $t_2$ by performing a geometric calculation on their respective embeddings (see the sketch after this list).
|
* **Cosine Similarity:** Calculates the cosine of the angle between $e_1$ and $e_2$:
|
$$\cos(\theta)=\frac{e_1 \cdot e_2}{||e_1||\;||e_2||}.$$ |
|
* **Inner Product:** Calculates the inner product of $e_1$ and $e_2$: |
|
$$e_1\cdot e_2.$$
|
* **Distance:** Calculates the distance of $e_1$ from $e_2$ using $L_p$ norms: |
|
$$||e_1-e_2||_p.$$ |
|
4. **Generator Model:** Generates the final answer based on the question and the retrieved documents that may contain the answer.
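
A quick NumPy sketch of the three metrics above; the two toy vectors are made up for illustration (real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

e1 = np.array([0.3, 0.8, 0.5])  # toy embedding for t1
e2 = np.array([0.2, 0.9, 0.4])  # toy embedding for t2

# Cosine similarity: (e1 . e2) / (||e1|| ||e2||); invariant to vector norms
cosine = e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2))

# Inner product: e1 . e2; sensitive to the vectors' norms
inner = e1 @ e2

# L_p distance, here p = 2 (Euclidean); smaller means more similar
distance = np.linalg.norm(e1 - e2, ord=2)

print(cosine, inner, distance)
```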
|
|
|
## Cosine Similarity Problems |
|
* The motivation for using cosine similarity is that the norm of the learned embedding vectors is less important than their directional alignment.
|
* But cosine similarity can "work better but sometimes also worse than the unnormalized dot-product between embedded vectors in practice."
|
* The paper derives "analytically how cosine-similarity can yield arbitrary and therefore meaningless ‘similarities.’" |
|
* To do this, they "study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights." |
|
* "The underlying reason is not cosine similarity itself, but the fact that the learned embeddings have a degree of freedom that can render arbitrary cosine-similarities even though their (unnormalized) dot-products are well-defined and unique." |