joelg committed
Commit 8a18ce0 · 1 Parent(s): 5b22af7

initial attempt

Files changed (9)
  1. .gitignore +50 -0
  2. README.md +187 -6
  3. SPACE_README.md +60 -0
  4. app.py +257 -0
  5. default_corpus.pdf +3 -0
  6. default_corpus.txt +102 -0
  7. i18n.py +92 -0
  8. rag_system.py +205 -0
  9. requirements.txt +9 -0
.gitignore ADDED
@@ -0,0 +1,50 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+ .Python
7
+ env/
8
+ venv/
9
+ ENV/
10
+ build/
11
+ develop-eggs/
12
+ dist/
13
+ downloads/
14
+ eggs/
15
+ .eggs/
16
+ lib/
17
+ lib64/
18
+ parts/
19
+ sdist/
20
+ var/
21
+ wheels/
22
+ *.egg-info/
23
+ .installed.cfg
24
+ *.egg
25
+
26
+ # Virtual environments
27
+ .venv/
28
+ venv/
29
+ ENV/
30
+
31
+ # IDE
32
+ .vscode/
33
+ .idea/
34
+ *.swp
35
+ *.swo
36
+ *~
37
+
38
+ # OS
39
+ .DS_Store
40
+ Thumbs.db
41
+
42
+ # Gradio
43
+ flagged/
44
+
45
+ # Model cache
46
+ models/
47
+ .cache/
48
+
49
+ # Logs
50
+ *.log
README.md CHANGED
@@ -1,12 +1,193 @@
1
  ---
2
- title: Discover Rag
3
- emoji: 🚀
4
- colorFrom: indigo
5
- colorTo: gray
6
  sdk: gradio
7
- sdk_version: 5.49.0
8
  app_file: app.py
9
  pinned: false
 
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
+ # 🎓 RAG Pedagogical Demo
2
+
3
+ A pedagogical web application that demonstrates Retrieval Augmented Generation (RAG) for students and learners.
4
+
5
+ ## 🌟 Features
6
+
7
+ - **Bilingual Interface** (English/French)
8
+ - **Document Processing**: Upload PDF documents or use default corpus
9
+ - **Configurable Retrieval**:
10
+ - Choose embedding models
11
+ - Adjust chunk size and overlap
12
+ - Set top-k and similarity thresholds
13
+ - **Configurable Generation**:
14
+ - Select different LLMs
15
+ - Adjust temperature and max tokens
16
+ - **Educational Visualization**:
17
+ - View retrieved chunks with similarity scores
18
+ - See the exact prompt sent to the LLM
19
+ - Understand each step of the RAG pipeline
20
+
21
+ ## 🚀 Quick Start
22
+
23
+ ### Local Installation
24
+
25
+ ```bash
26
+ # Clone the repository
27
+ git clone <your-repo-url>
28
+ cd RAG_pedago
29
+
30
+ # Install dependencies
31
+ pip install -r requirements.txt
32
+
33
+ # Run the application
34
+ python app.py
35
+ ```
36
+
37
+ ### HuggingFace Spaces
38
+
39
+ This application is designed to run on HuggingFace Spaces with ZeroGPU support.
40
+
41
+ 1. Create a new Space on HuggingFace
42
+ 2. Select "Gradio" as the SDK
43
+ 3. Enable ZeroGPU in Space settings
44
+ 4. Upload all files from this repository
45
+ 5. The app will automatically deploy
46
+
47
+ ## 📚 Usage
48
+
49
+ ### 1. Corpus Management
50
+ - Upload your own PDF document or use the included default corpus about RAG
51
+ - Configure chunk size (100-1000 characters) and overlap (0-200 characters)
52
+ - Process the corpus to create embeddings
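
The chunking step can be illustrated with a minimal sketch (simplified from `rag_system.py`, which additionally tries to break chunks at sentence boundaries):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (simplified sketch)."""
    chunks = []
    start = 0
    while start < len(text):
        chunk = text[start:start + chunk_size].strip()
        if len(chunk) > 50:                      # drop very small fragments
            chunks.append(chunk)
        start += max(chunk_size - overlap, 1)    # always advance to avoid looping forever
    return chunks

sample = "Retrieval Augmented Generation grounds answers in retrieved passages. " * 20
print(len(chunk_text(sample)), "chunks")
```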
53
+
54
+ ### 2. Retrieval Configuration
55
+ - Choose an embedding model:
56
+ - `all-MiniLM-L6-v2`: Fast, lightweight
57
+ - `all-mpnet-base-v2`: Better quality, slower
58
+ - `paraphrase-multilingual-MiniLM-L12-v2`: Multilingual support
59
+ - Set top-k (1-10): Number of chunks to retrieve
60
+ - Set similarity threshold (0.0-1.0): Minimum similarity score
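
To illustrate what top-k and the threshold control, here is a small, self-contained sketch using cosine similarity over normalized embeddings (the model name and sample texts are placeholders):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
chunks = [
    "RAG retrieves relevant chunks before generating an answer.",
    "Bananas are a good source of potassium.",
    "Embeddings map text to dense vectors that capture meaning.",
]

# Normalized embeddings: dot product == cosine similarity
chunk_emb = model.encode(chunks, convert_to_numpy=True, normalize_embeddings=True)
query_emb = model.encode(["How does RAG find relevant text?"],
                         convert_to_numpy=True, normalize_embeddings=True)[0]

scores = chunk_emb @ query_emb
top_k, threshold = 2, 0.2
best = np.argsort(-scores)[:top_k]                       # highest-scoring chunk indices
results = [(chunks[i], float(scores[i])) for i in best if scores[i] >= threshold]
print(results)
```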
61
+
62
+ ### 3. Generation Configuration
63
+ - Select a language model:
64
+ - `zephyr-7b-beta`: Fast, good quality
65
+ - `Mistral-7B-Instruct-v0.2`: High quality
66
+ - `Llama-2-7b-chat-hf`: Alternative option
67
+ - Adjust temperature (0.0-2.0): Controls creativity
68
+ - Set max tokens (50-1000): Response length
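
These two settings are passed straight to the text-generation call; a minimal sketch with `huggingface_hub` (the same client `rag_system.py` uses; depending on the model, an HF token may be required):

```python
from huggingface_hub import InferenceClient

client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")

answer = client.text_generation(
    "Question: What is Retrieval Augmented Generation?\nAnswer:",
    max_new_tokens=300,      # bounds the response length
    temperature=0.7,         # higher = more creative, lower = more deterministic
    return_full_text=False,  # return only the generated continuation
)
print(answer)
```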
69
+
70
+ ### 4. Query & Results
71
+ - Enter your question
72
+ - Use example questions to get started
73
+ - View the generated answer
74
+ - Examine retrieved chunks with similarity scores
75
+ - Inspect the prompt sent to the LLM
76
+
77
+ ## 🏗️ Architecture
78
+
79
+ ```
80
+ ┌─────────────────┐
81
+ │ PDF Document │
82
+ └────────┬────────┘
83
+
84
+
85
+ ┌─────────────────┐
86
+ │ Text Chunking │
87
+ └────────┬────────┘
88
+
89
+
90
+ ┌─────────────────┐
91
+ │ Embeddings │◄──── Embedding Model
92
+ └────────┬────────┘
93
+
94
+
95
+ ┌─────────────────┐
96
+ │ FAISS Index │
97
+ └────────┬────────┘
98
+
99
+
100
+ ┌─────────────────┐
101
+ │ User Query │
102
+ └────────┬────────┘
103
+
104
+
105
+ ┌─────────────────┐
106
+ │ Retrieval │──► Top-K Chunks
107
+ └────────┬────────┘
108
+
109
+
110
+ ┌─────────────────┐
111
+ │ Generation │◄──── Language Model
112
+ └────────┬────────┘
113
+
114
+
115
+ ┌─────────────────┐
116
+ │ Answer │
117
+ └─────────────────┘
118
+ ```
119
+
120
+ ## 🛠️ Technical Stack
121
+
122
+ - **Framework**: Gradio 4.44.0
123
+ - **Embeddings**: Sentence Transformers
124
+ - **Vector Store**: FAISS
125
+ - **LLMs**: HuggingFace Inference API
126
+ - **GPU**: HuggingFace ZeroGPU
127
+ - **PDF Processing**: PyPDF2
128
+
129
+ ## 📝 Files Structure
130
+
131
+ ```
132
+ RAG_pedago/
133
+ ├── app.py # Main Gradio interface
134
+ ├── rag_system.py # Core RAG logic
135
+ ├── i18n.py # Internationalization
136
+ ├── requirements.txt # Python dependencies
137
+ ├── default_corpus.pdf # Default corpus about RAG
138
+ ├── default_corpus.txt # Source text for default corpus
139
+ └── README.md # This file
140
+ ```
141
+
142
+ ## 🎯 Educational Goals
143
+
144
+ This application helps students understand:
145
+
146
+ 1. **Document Processing**: How text is split into chunks
147
+ 2. **Embeddings**: How text is converted to vectors
148
+ 3. **Similarity Search**: How relevant information is retrieved
149
+ 4. **Prompt Engineering**: How context is provided to LLMs
150
+ 5. **Generation**: How LLMs produce answers based on retrieved context
151
+ 6. **Parameter Impact**: How different settings affect results
152
+
153
+ ## 🔧 Configuration for HuggingFace Spaces
154
+
155
+ Create a `README.md` in your Space with this header:
156
+
157
+ ```yaml
158
  ---
159
+ title: RAG Pedagogical Demo
160
+ emoji: 🎓
161
+ colorFrom: blue
162
+ colorTo: purple
163
  sdk: gradio
164
+ sdk_version: 4.44.0
165
  app_file: app.py
166
  pinned: false
167
+ license: mit
168
  ---
169
+ ```
170
+
171
+ ## 🤝 Contributing
172
+
173
+ Contributions are welcome! Feel free to:
174
+ - Add more embedding models
175
+ - Include additional LLMs
176
+ - Improve the interface
177
+ - Add more visualizations
178
+ - Enhance documentation
179
+
180
+ ## 📄 License
181
+
182
+ MIT License. Feel free to use this project for educational purposes.
183
+
184
+ ## 🙏 Acknowledgments
185
+
186
+ - HuggingFace for the Spaces platform and ZeroGPU
187
+ - Sentence Transformers for embeddings
188
+ - FAISS for efficient similarity search
189
+ - Gradio for the interface framework
190
+
191
+ ## 📧 Contact
192
 
193
+ For questions or feedback, please open an issue on GitHub.
SPACE_README.md ADDED
@@ -0,0 +1,60 @@
1
+ ---
2
+ title: RAG Pedagogical Demo
3
+ emoji: 🎓
4
+ colorFrom: blue
5
+ colorTo: purple
6
+ sdk: gradio
7
+ sdk_version: 4.44.0
8
+ app_file: app.py
9
+ pinned: false
10
+ license: mit
11
+ ---
12
+
13
+ # 🎓 RAG Pedagogical Demo
14
+
15
+ An interactive educational application to learn about Retrieval Augmented Generation (RAG) systems.
16
+
17
+ ## What is RAG?
18
+
19
+ Retrieval Augmented Generation (RAG) combines information retrieval with language generation to create more accurate and grounded AI responses. Instead of relying solely on a language model's training data, RAG systems:
20
+
21
+ 1. **Retrieve** relevant information from a document corpus
22
+ 2. **Augment** the query with this retrieved context
23
+ 3. **Generate** an answer based on both the query and the retrieved information
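
In code, the three steps reduce to a short function like the following sketch (the `index` and `llm` objects and their methods are illustrative, not part of this Space's API):

```python
def rag_answer(query: str, index, llm) -> str:
    # 1. Retrieve relevant chunks from the corpus
    context_chunks = index.retrieve(query, top_k=3)
    # 2. Augment the query with the retrieved context
    context = "\n\n".join(context_chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    # 3. Generate an answer grounded in that context
    return llm.generate(prompt)
```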
24
+
25
+ ## Features
26
+
27
+ - 📚 **Upload your own PDFs** or use the default corpus
28
+ - 🔧 **Configure retrieval parameters**: embedding models, chunk size, top-k, similarity threshold
29
+ - 🤖 **Configure generation parameters**: LLM selection, temperature, max tokens
30
+ - 📊 **Visualize the process**: see retrieved chunks, similarity scores, and prompts
31
+ - 🌍 **Bilingual interface**: English and French
32
+
33
+ ## How to Use
34
+
35
+ 1. **Corpus Tab**: Upload a PDF or use the default corpus about RAG
36
+ 2. **Retrieval Tab**: Choose embedding model and retrieval parameters
37
+ 3. **Generation Tab**: Select language model and generation settings
38
+ 4. **Query Tab**: Ask questions and see how RAG works!
39
+
40
+ ## Educational Value
41
+
42
+ This demo helps you understand:
43
+ - How documents are processed and chunked
44
+ - How semantic search retrieves relevant information
45
+ - How context is provided to language models
46
+ - How different parameters affect the results
47
+
48
+ Perfect for students, educators, and anyone curious about modern AI systems!
49
+
50
+ ## Technology
51
+
52
+ - **Framework**: Gradio
53
+ - **Embeddings**: Sentence Transformers
54
+ - **Vector Store**: FAISS
55
+ - **LLMs**: HuggingFace Inference API
56
+ - **Infrastructure**: HuggingFace ZeroGPU
57
+
58
+ ---
59
+
60
+ *Note: This application runs on ZeroGPU. Initial requests may take longer as models are loaded.*
app.py ADDED
@@ -0,0 +1,257 @@
1
+ import gradio as gr
2
+ import spaces
3
+ from rag_system import RAGSystem
4
+ from i18n import get_text
5
+
6
+ # Initialize RAG system
7
+ rag = RAGSystem()
8
+
9
+ # Language state
10
+ language = "en"
11
+
12
+ def switch_language(lang):
13
+ global language
14
+ language = lang
15
+ return update_interface()
16
+
17
+ def update_interface():
18
+ t = lambda key: get_text(key, language)
19
+ return {
20
+ # Update all interface elements with new language
21
+ }
22
+
23
+ @spaces.GPU
24
+ def process_pdf(pdf_file, chunk_size, chunk_overlap):
25
+ """Process uploaded PDF and create embeddings"""
26
+ t = lambda key: get_text(key, language)
27
+ try:
28
+ if pdf_file is None:
29
+ # Load default corpus
30
+ status = rag.load_default_corpus(chunk_size, chunk_overlap)
31
+ else:
32
+ status = rag.process_document(pdf_file.name, chunk_size, chunk_overlap)
33
+ return status
34
+ except Exception as e:
35
+ return f"{t('error')}: {str(e)}"
36
+
37
+ @spaces.GPU
38
+ def perform_query(
39
+ query,
40
+ embedding_model,
41
+ top_k,
42
+ similarity_threshold,
43
+ llm_model,
44
+ temperature,
45
+ max_tokens
46
+ ):
47
+ """Perform RAG query and return results"""
48
+ t = lambda key: get_text(key, language)
49
+
50
+ if not rag.is_ready():
51
+ return t("no_corpus"), "", "", ""
52
+
53
+ try:
54
+ # Set models and parameters
55
+ rag.set_embedding_model(embedding_model)
56
+ rag.set_llm_model(llm_model)
57
+
58
+ # Retrieve relevant chunks
59
+ results = rag.retrieve(query, top_k, similarity_threshold)
60
+
61
+ # Format retrieved chunks display
62
+ chunks_display = format_chunks(results, t)
63
+
64
+ # Generate answer
65
+ answer, prompt = rag.generate(
66
+ query,
67
+ results,
68
+ temperature,
69
+ max_tokens
70
+ )
71
+
72
+ return answer, chunks_display, prompt, ""
73
+
74
+ except Exception as e:
75
+ return "", "", "", f"{t('error')}: {str(e)}"
76
+
77
+ def format_chunks(results, t):
78
+ """Format retrieved chunks with scores for display"""
79
+ output = f"### {t('retrieved_chunks')}\n\n"
80
+ for i, (chunk, score) in enumerate(results, 1):
81
+ output += f"**Chunk {i}** - {t('similarity_score')}: {score:.4f}\n"
82
+ output += f"```\n{chunk}\n```\n\n"
83
+ return output
84
+
85
+ def create_interface():
86
+ t = lambda key: get_text(key, language)
87
+
88
+ with gr.Blocks(title="RAG Pedagogical Demo", theme=gr.themes.Soft()) as demo:
89
+
90
+ # Header with language selector
91
+ with gr.Row():
92
+ gr.Markdown("# 🎓 RAG Pedagogical Demo / Démo Pédagogique RAG")
93
+ lang_radio = gr.Radio(
94
+ choices=["en", "fr"],
95
+ value="en",
96
+ label="Language / Langue"
97
+ )
98
+
99
+ with gr.Tabs() as tabs:
100
+
101
+ # Tab 1: Corpus Management
102
+ with gr.Tab(label="📚 Corpus"):
103
+ gr.Markdown(f"## {t('corpus_management')}")
104
+ gr.Markdown(t('corpus_description'))
105
+
106
+ pdf_upload = gr.File(
107
+ label=t('upload_pdf'),
108
+ file_types=[".pdf"]
109
+ )
110
+
111
+ with gr.Row():
112
+ chunk_size = gr.Slider(
113
+ minimum=100,
114
+ maximum=1000,
115
+ value=500,
116
+ step=50,
117
+ label=t('chunk_size')
118
+ )
119
+ chunk_overlap = gr.Slider(
120
+ minimum=0,
121
+ maximum=200,
122
+ value=50,
123
+ step=10,
124
+ label=t('chunk_overlap')
125
+ )
126
+
127
+ process_btn = gr.Button(t('process_corpus'), variant="primary")
128
+ corpus_status = gr.Textbox(label=t('status'), interactive=False)
129
+
130
+ process_btn.click(
131
+ fn=process_pdf,
132
+ inputs=[pdf_upload, chunk_size, chunk_overlap],
133
+ outputs=corpus_status
134
+ )
135
+
136
+ # Tab 2: Retrieval Configuration
137
+ with gr.Tab(label="🔍 Retrieval"):
138
+ gr.Markdown(f"## {t('retrieval_config')}")
139
+
140
+ embedding_model = gr.Dropdown(
141
+ choices=[
142
+ "sentence-transformers/all-MiniLM-L6-v2",
143
+ "sentence-transformers/all-mpnet-base-v2",
144
+ "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
145
+ ],
146
+ value="sentence-transformers/all-MiniLM-L6-v2",
147
+ label=t('embedding_model')
148
+ )
149
+
150
+ with gr.Row():
151
+ top_k = gr.Slider(
152
+ minimum=1,
153
+ maximum=10,
154
+ value=3,
155
+ step=1,
156
+ label=t('top_k')
157
+ )
158
+ similarity_threshold = gr.Slider(
159
+ minimum=0.0,
160
+ maximum=1.0,
161
+ value=0.0,
162
+ step=0.05,
163
+ label=t('similarity_threshold')
164
+ )
165
+
166
+ # Tab 3: Generation Configuration
167
+ with gr.Tab(label="🤖 Generation"):
168
+ gr.Markdown(f"## {t('generation_config')}")
169
+
170
+ llm_model = gr.Dropdown(
171
+ choices=[
172
+ "HuggingFaceH4/zephyr-7b-beta",
173
+ "mistralai/Mistral-7B-Instruct-v0.2",
174
+ "meta-llama/Llama-2-7b-chat-hf",
175
+ ],
176
+ value="HuggingFaceH4/zephyr-7b-beta",
177
+ label=t('llm_model')
178
+ )
179
+
180
+ with gr.Row():
181
+ temperature = gr.Slider(
182
+ minimum=0.0,
183
+ maximum=2.0,
184
+ value=0.7,
185
+ step=0.1,
186
+ label=t('temperature')
187
+ )
188
+ max_tokens = gr.Slider(
189
+ minimum=50,
190
+ maximum=1000,
191
+ value=300,
192
+ step=50,
193
+ label=t('max_tokens')
194
+ )
195
+
196
+ # Tab 4: Query & Results
197
+ with gr.Tab(label="💬 Query"):
198
+ gr.Markdown(f"## {t('ask_question')}")
199
+
200
+ query_input = gr.Textbox(
201
+ label=t('your_question'),
202
+ placeholder=t('question_placeholder'),
203
+ lines=3
204
+ )
205
+
206
+ examples = gr.Examples(
207
+ examples=[
208
+ ["What is Retrieval Augmented Generation?"],
209
+ ["How does RAG improve language models?"],
210
+ ["What are the main components of a RAG system?"],
211
+ ],
212
+ inputs=query_input,
213
+ label=t('example_questions')
214
+ )
215
+
216
+ query_btn = gr.Button(t('submit_query'), variant="primary")
217
+
218
+ gr.Markdown(f"### {t('answer')}")
219
+ answer_output = gr.Markdown()
220
+
221
+ with gr.Accordion(t('retrieved_chunks'), open=True):
222
+ chunks_output = gr.Markdown()
223
+
224
+ with gr.Accordion(t('prompt_sent'), open=False):
225
+ prompt_output = gr.Code(language="text")
226
+
227
+ error_output = gr.Textbox(label=t('errors'), visible=False)
228
+
229
+ query_btn.click(
230
+ fn=perform_query,
231
+ inputs=[
232
+ query_input,
233
+ embedding_model,
234
+ top_k,
235
+ similarity_threshold,
236
+ llm_model,
237
+ temperature,
238
+ max_tokens
239
+ ],
240
+ outputs=[answer_output, chunks_output, prompt_output, error_output]
241
+ )
242
+
243
+ # Footer
244
+ gr.Markdown("""
245
+ ---
246
+ **Note**: This is a pedagogical demonstration of RAG systems.
247
+ Models run on HuggingFace ZeroGPU infrastructure.
248
+
249
+ **Note** : Ceci est une démonstration pédagogique des systèmes RAG.
250
+ Les modèles tournent sur l'infrastructure HuggingFace ZeroGPU.
251
+ """)
252
+
253
+ return demo
254
+
255
+ if __name__ == "__main__":
256
+ demo = create_interface()
257
+ demo.launch()
default_corpus.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d7bab80f2f89a13137478e388431523e1e1efb2e85905151c9d88b9c4171a8c9
3
+ size 8352
default_corpus.txt ADDED
@@ -0,0 +1,102 @@
1
+ Retrieval Augmented Generation (RAG): A Comprehensive Guide
2
+
3
+ Introduction to RAG
4
+
5
+ Retrieval Augmented Generation (RAG) is an advanced natural language processing technique that combines the strengths of retrieval-based and generation-based approaches. RAG systems enhance the capabilities of large language models by providing them with relevant external knowledge retrieved from a document corpus.
6
+
7
+ The fundamental principle behind RAG is straightforward: instead of relying solely on the knowledge encoded in a language model's parameters during training, RAG systems dynamically retrieve relevant information from an external knowledge base to inform their responses. This approach offers several advantages including more accurate and up-to-date information, reduced hallucinations, and the ability to cite sources.
8
+
9
+ Architecture of RAG Systems
10
+
11
+ A typical RAG system consists of three main components:
12
+
13
+ 1. Document Processing and Indexing
14
+ The first step involves processing a corpus of documents. Documents are split into smaller chunks or passages, typically ranging from 100 to 1000 tokens. These chunks are then converted into dense vector representations (embeddings) using neural embedding models such as BERT, Sentence-BERT, or other transformer-based encoders.
15
+
16
+ These embeddings capture the semantic meaning of the text and are stored in a vector database or index structure like FAISS, Pinecone, or Weaviate. This indexing allows for efficient similarity search during the retrieval phase.
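
A minimal sketch of this indexing step with Sentence Transformers and FAISS (the stack used by this demo; the model name and sample chunks are placeholders):

```python
import faiss
from sentence_transformers import SentenceTransformer

chunks = [
    "RAG retrieves relevant passages before generating an answer.",
    "Embeddings are dense vectors that capture semantic meaning.",
    "FAISS provides fast similarity search over those vectors.",
]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode(chunks, convert_to_numpy=True)

faiss.normalize_L2(embeddings)                  # normalize so inner product == cosine similarity
index = faiss.IndexFlatIP(embeddings.shape[1])  # exact inner-product index
index.add(embeddings)
print(index.ntotal, "chunks indexed")
```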
17
+
18
+ 2. Retrieval Component
19
+ When a user submits a query, the retrieval component performs the following operations:
20
+ - The query is encoded into a vector representation using the same embedding model used for documents
21
+ - A similarity search is performed against the indexed document embeddings
22
+ - The top-k most similar document chunks are retrieved based on cosine similarity or other distance metrics
23
+ - These retrieved chunks serve as context for the generation phase
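
Continuing the indexing sketch above, the query side might look like this (reusing `model`, `index`, and `chunks` from that example):

```python
query = "How does RAG find relevant information?"
query_emb = model.encode([query], convert_to_numpy=True)
faiss.normalize_L2(query_emb)                 # same normalization as the indexed chunks

scores, indices = index.search(query_emb, 2)  # top-k = 2, inner product == cosine similarity
for score, idx in zip(scores[0], indices[0]):
    if idx != -1 and score >= 0.0:            # optional similarity threshold
        print(f"{score:.3f}  {chunks[idx]}")
```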
24
+
25
+ The retrieval component can use various strategies including dense retrieval (vector similarity), sparse retrieval (keyword-based like BM25), or hybrid approaches combining both methods.
26
+
27
+ 3. Generation Component
28
+ The generation component takes the retrieved documents along with the original query and generates a response. Modern RAG systems typically use large language models (LLMs) such as GPT-4, Claude, Llama, or other generative models.
29
+
30
+ The retrieved context is incorporated into the prompt sent to the LLM, typically following a template like:
31
+ "Given the following context: [retrieved documents], please answer this question: [user query]"
32
+
33
+ The LLM then generates a response grounded in the provided context, reducing the likelihood of hallucinations and improving factual accuracy.
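
As an illustration, assembling such a prompt from retrieved chunks might look like the following sketch (the demo's `rag_system.py` uses a very similar template; the sample chunks and scores are made up for the example):

```python
retrieved_chunks = [
    ("RAG combines retrieval with generation.", 0.82),
    ("Retrieved passages ground the model's answer.", 0.74),
]
query = "What is Retrieval Augmented Generation?"

context = "\n\n".join(chunk for chunk, _score in retrieved_chunks)
prompt = (
    "You are a helpful assistant. Use the following context to answer the question.\n"
    "If you cannot answer based on the context, say so.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}\n\n"
    "Answer:"
)
print(prompt)
```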
34
+
35
+ Key Parameters in RAG Systems
36
+
37
+ Several parameters significantly impact the performance of RAG systems:
38
+
39
+ Chunk Size and Overlap
40
+ The size of document chunks affects both retrieval accuracy and context quality. Smaller chunks provide more precise retrieval but may lack sufficient context. Larger chunks provide more context but may dilute relevance. Typical chunk sizes range from 200 to 1000 characters. Overlap between chunks (e.g., 10-20%) helps ensure important information isn't split across chunk boundaries.
41
+
42
+ Number of Retrieved Documents (top-k)
43
+ This parameter determines how many relevant chunks are retrieved for each query. More documents provide richer context but may introduce noise and increase computational costs. Common values range from 3 to 10 documents.
44
+
45
+ Similarity Threshold
46
+ Setting a minimum similarity score filters out irrelevant chunks. This helps maintain response quality but may result in insufficient context if set too high.
47
+
48
+ Temperature and Generation Parameters
49
+ These control the creativity and randomness of the generated response. Lower temperatures (0.1-0.3) produce more deterministic outputs suitable for factual queries, while higher temperatures (0.7-1.0) allow for more creative responses.
50
+
51
+ Advantages of RAG
52
+
53
+ RAG systems offer several compelling benefits:
54
+
55
+ Up-to-date Information: By retrieving from external documents, RAG systems can access current information beyond the training data cutoff date of the language model.
56
+
57
+ Domain Specialization: RAG enables language models to be specialized for specific domains by using relevant document collections without requiring expensive fine-tuning.
58
+
59
+ Reduced Hallucinations: Grounding responses in retrieved documents significantly reduces the tendency of language models to generate false or invented information.
60
+
61
+ Source Attribution: RAG systems can cite specific documents or passages, improving transparency and trustworthiness.
62
+
63
+ Cost Efficiency: RAG provides a more economical alternative to fine-tuning large models for specific knowledge domains.
64
+
65
+ Challenges and Limitations
66
+
67
+ Despite its advantages, RAG faces several challenges:
68
+
69
+ Retrieval Quality: The entire system's performance depends heavily on retrieving relevant documents. Poor retrieval leads to poor generation.
70
+
71
+ Context Window Limitations: Language models have finite context windows, limiting how much retrieved information can be included.
72
+
73
+ Latency: The retrieval step adds latency compared to pure generation approaches.
74
+
75
+ Embedding Quality: The quality of document embeddings directly impacts retrieval accuracy, and creating good embeddings requires careful model selection.
76
+
77
+ Applications of RAG
78
+
79
+ RAG technology has found applications across numerous domains:
80
+
81
+ Question Answering Systems: RAG excels at building systems that answer questions based on large document collections, technical documentation, or knowledge bases.
82
+
83
+ Customer Support: Companies deploy RAG-based chatbots that retrieve information from product manuals, FAQs, and support tickets to provide accurate assistance.
84
+
85
+ Research Assistance: RAG helps researchers quickly find and synthesize information from vast academic literature.
86
+
87
+ Legal and Compliance: Law firms use RAG to search case law and regulations to support legal research and compliance checking.
88
+
89
+ Healthcare: Medical professionals leverage RAG to access the latest research papers and clinical guidelines.
90
+
91
+ Future Directions
92
+
93
+ The field of RAG continues to evolve rapidly. Current research focuses on:
94
+ - Hybrid retrieval methods combining dense and sparse retrieval
95
+ - Multi-modal RAG incorporating images, tables, and structured data
96
+ - Iterative retrieval strategies that refine searches based on intermediate results
97
+ - Better evaluation metrics for RAG system performance
98
+ - Integration with knowledge graphs for improved reasoning
99
+
100
+ Conclusion
101
+
102
+ Retrieval Augmented Generation represents a powerful paradigm for building more accurate, reliable, and controllable AI systems. By combining the flexibility of large language models with the precision of information retrieval, RAG enables applications that were previously difficult or impossible to implement. As the technology matures, we can expect RAG to become an increasingly standard component of production AI systems across industries.
i18n.py ADDED
@@ -0,0 +1,92 @@
1
+ """Internationalization support for the RAG demo"""
2
+
3
+ TRANSLATIONS = {
4
+ "en": {
5
+ # Corpus tab
6
+ "corpus_management": "Corpus Management",
7
+ "corpus_description": "Upload a PDF document or use the default corpus. The document will be split into chunks for retrieval.",
8
+ "upload_pdf": "Upload PDF",
9
+ "chunk_size": "Chunk Size (characters)",
10
+ "chunk_overlap": "Chunk Overlap (characters)",
11
+ "process_corpus": "Process Corpus",
12
+ "status": "Status",
13
+
14
+ # Retrieval tab
15
+ "retrieval_config": "Retrieval Configuration",
16
+ "embedding_model": "Embedding Model",
17
+ "top_k": "Top K (number of chunks to retrieve)",
18
+ "similarity_threshold": "Similarity Threshold (minimum score)",
19
+
20
+ # Generation tab
21
+ "generation_config": "Generation Configuration",
22
+ "llm_model": "Language Model",
23
+ "temperature": "Temperature (creativity)",
24
+ "max_tokens": "Max Tokens (response length)",
25
+
26
+ # Query tab
27
+ "ask_question": "Ask a Question",
28
+ "your_question": "Your Question",
29
+ "question_placeholder": "Enter your question here...",
30
+ "example_questions": "Example Questions",
31
+ "submit_query": "Submit Query",
32
+ "answer": "Answer",
33
+ "retrieved_chunks": "Retrieved Chunks",
34
+ "prompt_sent": "Prompt Sent to LLM",
35
+ "errors": "Errors",
36
+
37
+ # Results
38
+ "similarity_score": "Similarity Score",
39
+ "no_corpus": "Please process a corpus first in the Corpus tab.",
40
+
41
+ # Messages
42
+ "error": "Error",
43
+ "success": "Success",
44
+ "processing": "Processing...",
45
+ },
46
+ "fr": {
47
+ # Onglet Corpus
48
+ "corpus_management": "Gestion du Corpus",
49
+ "corpus_description": "Téléchargez un document PDF ou utilisez le corpus par défaut. Le document sera divisé en chunks pour la récupération.",
50
+ "upload_pdf": "Télécharger un PDF",
51
+ "chunk_size": "Taille des Chunks (caractères)",
52
+ "chunk_overlap": "Chevauchement des Chunks (caractères)",
53
+ "process_corpus": "Traiter le Corpus",
54
+ "status": "Statut",
55
+
56
+ # Onglet Retrieval
57
+ "retrieval_config": "Configuration du Retrieval",
58
+ "embedding_model": "Modèle d'Embedding",
59
+ "top_k": "Top K (nombre de chunks à récupérer)",
60
+ "similarity_threshold": "Seuil de Similarité (score minimum)",
61
+
62
+ # Onglet Génération
63
+ "generation_config": "Configuration de la Génération",
64
+ "llm_model": "Modèle de Langage",
65
+ "temperature": "Température (créativité)",
66
+ "max_tokens": "Max Tokens (longueur de la réponse)",
67
+
68
+ # Onglet Query
69
+ "ask_question": "Poser une Question",
70
+ "your_question": "Votre Question",
71
+ "question_placeholder": "Entrez votre question ici...",
72
+ "example_questions": "Questions d'Exemple",
73
+ "submit_query": "Soumettre la Question",
74
+ "answer": "Réponse",
75
+ "retrieved_chunks": "Chunks Récupérés",
76
+ "prompt_sent": "Prompt Envoyé au LLM",
77
+ "errors": "Erreurs",
78
+
79
+ # Résultats
80
+ "similarity_score": "Score de Similarité",
81
+ "no_corpus": "Veuillez d'abord traiter un corpus dans l'onglet Corpus.",
82
+
83
+ # Messages
84
+ "error": "Erreur",
85
+ "success": "Succès",
86
+ "processing": "Traitement en cours...",
87
+ }
88
+ }
89
+
90
+ def get_text(key, language="en"):
91
+ """Get translated text for a given key and language"""
92
+ return TRANSLATIONS.get(language, TRANSLATIONS["en"]).get(key, key)
rag_system.py ADDED
@@ -0,0 +1,205 @@
1
+ """Core RAG system implementation"""
2
+
3
+ import os
4
+ from typing import List, Tuple, Optional
5
+ import PyPDF2
6
+ import faiss
7
+ import numpy as np
8
+ from sentence_transformers import SentenceTransformer
9
+ from huggingface_hub import InferenceClient
10
+ import spaces
11
+
12
+ class RAGSystem:
13
+ def __init__(self):
14
+ self.chunks = []
15
+ self.embeddings = None
16
+ self.index = None
17
+ self.embedding_model = None
18
+ self.embedding_model_name = None
19
+ self.llm_client = None
20
+ self.llm_model_name = None
21
+ self.ready = False
22
+
23
+ def is_ready(self) -> bool:
24
+ """Check if the system is ready to process queries"""
25
+ return self.ready and self.index is not None
26
+
27
+ def load_default_corpus(self, chunk_size: int = 500, chunk_overlap: int = 50) -> str:
28
+ """Load the default corpus"""
29
+ default_path = "default_corpus.pdf"
30
+ if os.path.exists(default_path):
31
+ return self.process_document(default_path, chunk_size, chunk_overlap)
32
+ else:
33
+ return "Default corpus not found. Please upload a PDF."
34
+
35
+ def extract_text_from_pdf(self, pdf_path: str) -> str:
36
+ """Extract text from PDF file"""
37
+ text = ""
38
+ with open(pdf_path, 'rb') as file:
39
+ pdf_reader = PyPDF2.PdfReader(file)
40
+ for page in pdf_reader.pages:
41
+ text += page.extract_text() + "\n"
42
+ return text
43
+
44
+ def chunk_text(self, text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
45
+ """Split text into overlapping chunks"""
46
+ chunks = []
47
+ start = 0
48
+ text_length = len(text)
49
+
50
+ while start < text_length:
51
+ end = start + chunk_size
52
+ chunk = text[start:end]
53
+
54
+ # Try to break at sentence boundary
55
+ if end < text_length:
56
+ # Look for sentence endings
57
+ last_period = chunk.rfind('.')
58
+ last_newline = chunk.rfind('\n')
59
+ break_point = max(last_period, last_newline)
60
+
61
+ if break_point > chunk_size * 0.5: # Only break if we're past halfway
62
+ chunk = chunk[:break_point + 1]
63
+ end = start + break_point + 1
64
+
65
+ chunks.append(chunk.strip())
66
+ start = end - overlap
67
+
68
+ return [c for c in chunks if len(c) > 50] # Filter out very small chunks
69
+
70
+ @spaces.GPU
71
+ def create_embeddings(self, texts: List[str]) -> np.ndarray:
72
+ """Create embeddings for text chunks"""
73
+ if self.embedding_model is None:
74
+ self.set_embedding_model("sentence-transformers/all-MiniLM-L6-v2")
75
+
76
+ embeddings = self.embedding_model.encode(
77
+ texts,
78
+ show_progress_bar=True,
79
+ convert_to_numpy=True
80
+ )
81
+ return embeddings
82
+
83
+ def build_index(self, embeddings: np.ndarray):
84
+ """Build FAISS index from embeddings"""
85
+ dimension = embeddings.shape[1]
86
+ self.index = faiss.IndexFlatIP(dimension) # Inner product for cosine similarity
87
+
88
+ # Normalize embeddings for cosine similarity
89
+ faiss.normalize_L2(embeddings)
90
+ self.index.add(embeddings)
91
+
92
+ def process_document(self, pdf_path: str, chunk_size: int = 500, chunk_overlap: int = 50) -> str:
93
+ """Process a PDF document and create searchable index"""
94
+ try:
95
+ # Extract text
96
+ text = self.extract_text_from_pdf(pdf_path)
97
+
98
+ if not text.strip():
99
+ return "Error: No text could be extracted from the PDF."
100
+
101
+ # Chunk text
102
+ self.chunks = self.chunk_text(text, chunk_size, chunk_overlap)
103
+
104
+ if not self.chunks:
105
+ return "Error: No valid chunks created from the document."
106
+
107
+ # Create embeddings
108
+ self.embeddings = self.create_embeddings(self.chunks)
109
+
110
+ # Build index
111
+ self.build_index(self.embeddings)
112
+
113
+ self.ready = True
114
+ return f"Success! Processed {len(self.chunks)} chunks from the document."
115
+
116
+ except Exception as e:
117
+ self.ready = False
118
+ return f"Error processing document: {str(e)}"
119
+
120
+ def set_embedding_model(self, model_name: str):
121
+ """Set or change the embedding model"""
122
+ if self.embedding_model_name != model_name:
123
+ self.embedding_model_name = model_name
124
+ self.embedding_model = SentenceTransformer(model_name)
125
+
126
+ # If we have chunks, re-create embeddings and index
127
+ if self.chunks:
128
+ self.embeddings = self.create_embeddings(self.chunks)
129
+ self.build_index(self.embeddings)
130
+
131
+ def set_llm_model(self, model_name: str):
132
+ """Set or change the LLM model"""
133
+ if self.llm_model_name != model_name:
134
+ self.llm_model_name = model_name
135
+ self.llm_client = InferenceClient(model_name)
136
+
137
+ @spaces.GPU
138
+ def retrieve(
139
+ self,
140
+ query: str,
141
+ top_k: int = 3,
142
+ similarity_threshold: float = 0.0
143
+ ) -> List[Tuple[str, float]]:
144
+ """Retrieve relevant chunks for a query"""
145
+ if not self.is_ready():
146
+ return []
147
+
148
+ # Encode query
149
+ query_embedding = self.embedding_model.encode(
150
+ [query],
151
+ convert_to_numpy=True
152
+ )
153
+
154
+ # Normalize for cosine similarity
155
+ faiss.normalize_L2(query_embedding)
156
+
157
+ # Search
158
+ scores, indices = self.index.search(query_embedding, top_k)
159
+
160
+ # Filter by threshold and return results
161
+ results = []
162
+ for score, idx in zip(scores[0], indices[0]):
163
+ if score >= similarity_threshold:
164
+ results.append((self.chunks[idx], float(score)))
165
+
166
+ return results
167
+
168
+ @spaces.GPU
169
+ def generate(
170
+ self,
171
+ query: str,
172
+ retrieved_chunks: List[Tuple[str, float]],
173
+ temperature: float = 0.7,
174
+ max_tokens: int = 300
175
+ ) -> Tuple[str, str]:
176
+ """Generate answer using LLM"""
177
+ if self.llm_client is None:
178
+ self.set_llm_model("HuggingFaceH4/zephyr-7b-beta")
179
+
180
+ # Build context from retrieved chunks
181
+ context = "\n\n".join([chunk for chunk, _ in retrieved_chunks])
182
+
183
+ # Create prompt
184
+ prompt = f"""You are a helpful assistant. Use the following context to answer the question.
185
+ If you cannot answer based on the context, say so.
186
+
187
+ Context:
188
+ {context}
189
+
190
+ Question: {query}
191
+
192
+ Answer:"""
193
+
194
+ # Generate response
195
+ try:
196
+ response = self.llm_client.text_generation(
197
+ prompt,
198
+ max_new_tokens=max_tokens,
199
+ temperature=temperature,
200
+ return_full_text=False
201
+ )
202
+ return response, prompt
203
+
204
+ except Exception as e:
205
+ return f"Error generating response: {str(e)}", prompt
requirements.txt ADDED
@@ -0,0 +1,9 @@
1
+ gradio==4.44.0
2
+ torch==2.1.0
3
+ sentence-transformers==2.2.2
4
+ faiss-cpu==1.7.4
5
+ PyPDF2==3.0.1
6
+ huggingface_hub==0.20.0
7
+ spaces==0.29.2
8
+ transformers==4.36.0
9
+ accelerate==0.25.0