joelg committed
Commit 8a18ce0 · 1 Parent(s): 5b22af7

initial attempt

Files changed (9)
  1. .gitignore +50 -0
  2. README.md +187 -6
  3. SPACE_README.md +60 -0
  4. app.py +257 -0
  5. default_corpus.pdf +3 -0
  6. default_corpus.txt +102 -0
  7. i18n.py +92 -0
  8. rag_system.py +205 -0
  9. requirements.txt +9 -0
.gitignore ADDED
@@ -0,0 +1,50 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+ .Python
7
+ env/
8
+ venv/
9
+ ENV/
10
+ build/
11
+ develop-eggs/
12
+ dist/
13
+ downloads/
14
+ eggs/
15
+ .eggs/
16
+ lib/
17
+ lib64/
18
+ parts/
19
+ sdist/
20
+ var/
21
+ wheels/
22
+ *.egg-info/
23
+ .installed.cfg
24
+ *.egg
25
+
26
+ # Virtual environments
27
+ .venv/
28
+ venv/
29
+ ENV/
30
+
31
+ # IDE
32
+ .vscode/
33
+ .idea/
34
+ *.swp
35
+ *.swo
36
+ *~
37
+
38
+ # OS
39
+ .DS_Store
40
+ Thumbs.db
41
+
42
+ # Gradio
43
+ flagged/
44
+
45
+ # Model cache
46
+ models/
47
+ .cache/
48
+
49
+ # Logs
50
+ *.log
README.md CHANGED
@@ -1,12 +1,193 @@
1
  ---
2
- title: Discover Rag
3
- emoji: 🚀
4
- colorFrom: indigo
5
- colorTo: gray
6
  sdk: gradio
7
- sdk_version: 5.49.0
8
  app_file: app.py
9
  pinned: false
 
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
+ # 🎓 RAG Pedagogical Demo
2
+
3
+ A pedagogical web application that demonstrates Retrieval Augmented Generation (RAG) for students and learners.
4
+
5
+ ## 🌟 Features
6
+
7
+ - **Bilingual Interface** (English/French)
8
+ - **Document Processing**: Upload PDF documents or use default corpus
9
+ - **Configurable Retrieval**:
10
+ - Choose embedding models
11
+ - Adjust chunk size and overlap
12
+ - Set top-k and similarity thresholds
13
+ - **Configurable Generation**:
14
+ - Select different LLMs
15
+ - Adjust temperature and max tokens
16
+ - **Educational Visualization**:
17
+ - View retrieved chunks with similarity scores
18
+ - See the exact prompt sent to the LLM
19
+ - Understand each step of the RAG pipeline
20
+
21
+ ## 🚀 Quick Start
22
+
23
+ ### Local Installation
24
+
25
+ ```bash
26
+ # Clone the repository
27
+ git clone <your-repo-url>
28
+ cd RAG_pedago
29
+
30
+ # Install dependencies
31
+ pip install -r requirements.txt
32
+
33
+ # Run the application
34
+ python app.py
35
+ ```
36
+
37
+ ### HuggingFace Spaces
38
+
39
+ This application is designed to run on HuggingFace Spaces with ZeroGPU support.
40
+
41
+ 1. Create a new Space on HuggingFace
42
+ 2. Select "Gradio" as the SDK
43
+ 3. Enable ZeroGPU in Space settings
44
+ 4. Upload all files from this repository
45
+ 5. The app will automatically deploy
46
+
47
+ ## 📚 Usage
48
+
49
+ ### 1. Corpus Management
50
+ - Upload your own PDF document or use the included default corpus about RAG
51
+ - Configure chunk size (100-1000 characters) and overlap (0-200 characters)
52
+ - Process the corpus to create embeddings
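
The chunking step can be illustrated with a minimal sketch (simplified from `rag_system.py`, which additionally tries to break chunks at sentence boundaries):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (simplified sketch)."""
    chunks = []
    start = 0
    while start < len(text):
        chunk = text[start:start + chunk_size].strip()
        if len(chunk) > 50:                      # drop very small fragments
            chunks.append(chunk)
        start += max(chunk_size - overlap, 1)    # always advance to avoid looping forever
    return chunks

sample = "Retrieval Augmented Generation grounds answers in retrieved passages. " * 20
print(len(chunk_text(sample)), "chunks")
```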
53
+
54
+ ### 2. Retrieval Configuration
55
+ - Choose an embedding model:
56
+ - `all-MiniLM-L6-v2`: Fast, lightweight
57
+ - `all-mpnet-base-v2`: Better quality, slower
58
+ - `paraphrase-multilingual-MiniLM-L12-v2`: Multilingual support
59
+ - Set top-k (1-10): Number of chunks to retrieve
60
+ - Set similarity threshold (0.0-1.0): Minimum similarity score
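
To illustrate what top-k and the threshold control, here is a small, self-contained sketch using cosine similarity over normalized embeddings (the model name and sample texts are placeholders):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
chunks = [
    "RAG retrieves relevant chunks before generating an answer.",
    "Bananas are a good source of potassium.",
    "Embeddings map text to dense vectors that capture meaning.",
]

# Normalized embeddings: dot product == cosine similarity
chunk_emb = model.encode(chunks, convert_to_numpy=True, normalize_embeddings=True)
query_emb = model.encode(["How does RAG find relevant text?"],
                         convert_to_numpy=True, normalize_embeddings=True)[0]

scores = chunk_emb @ query_emb
top_k, threshold = 2, 0.2
best = np.argsort(-scores)[:top_k]                       # highest-scoring chunk indices
results = [(chunks[i], float(scores[i])) for i in best if scores[i] >= threshold]
print(results)
```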
61
+
62
+ ### 3. Generation Configuration
63
+ - Select a language model:
64
+ - `zephyr-7b-beta`: Fast, good quality
65
+ - `Mistral-7B-Instruct-v0.2`: High quality
66
+ - `Llama-2-7b-chat-hf`: Alternative option
67
+ - Adjust temperature (0.0-2.0): Controls creativity
68
+ - Set max tokens (50-1000): Response length
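
These two settings are passed straight to the text-generation call; a minimal sketch with `huggingface_hub` (the same client `rag_system.py` uses; depending on the model, an HF token may be required):

```python
from huggingface_hub import InferenceClient

client = InferenceClient("HuggingFaceH4/zephyr-7b-beta")

answer = client.text_generation(
    "Question: What is Retrieval Augmented Generation?\nAnswer:",
    max_new_tokens=300,      # bounds the response length
    temperature=0.7,         # higher = more creative, lower = more deterministic
    return_full_text=False,  # return only the generated continuation
)
print(answer)
```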
69
+
70
+ ### 4. Query & Results
71
+ - Enter your question
72
+ - Use example questions to get started
73
+ - View the generated answer
74
+ - Examine retrieved chunks with similarity scores
75
+ - Inspect the prompt sent to the LLM
76
+
77
+ ## 🏗️ Architecture
78
+
79
+ ```
80
+ ┌─────────────────┐
81
+ │ PDF Document │
82
+ └────────┬────────┘
83
+
84
+
85
+ ┌─────────────────┐
86
+ │ Text Chunking │
87
+ └────────┬────────┘
88
+
89
+
90
+ ┌─────────────────┐
91
+ │ Embeddings │◄──── Embedding Model
92
+ └────────┬────────┘
93
+
94
+
95
+ ┌─────────────────┐
96
+ │ FAISS Index │
97
+ └────────┬────────┘
98
+
99
+
100
+ ┌─────────────────┐
101
+ │ User Query │
102
+ └────────┬────────┘
103
+
104
+
105
+ ┌─────────────────┐
106
+ │ Retrieval │──► Top-K Chunks
107
+ └────────┬────────┘
108
+
109
+
110
+ ┌─────────────────┐
111
+ │ Generation │◄──── Language Model
112
+ └────────┬────────┘
113
+
114
+
115
+ ┌─────────────────┐
116
+ │ Answer │
117
+ └─────────────────┘
118
+ ```
119
+
120
+ ## 🛠️ Technical Stack
121
+
122
+ - **Framework**: Gradio 4.44.0
123
+ - **Embeddings**: Sentence Transformers
124
+ - **Vector Store**: FAISS
125
+ - **LLMs**: HuggingFace Inference API
126
+ - **GPU**: HuggingFace ZeroGPU
127
+ - **PDF Processing**: PyPDF2
128
+
129
+ ## 📝 Files Structure
130
+
131
+ ```
132
+ RAG_pedago/
133
+ ├── app.py # Main Gradio interface
134
+ ├── rag_system.py # Core RAG logic
135
+ ├── i18n.py # Internationalization
136
+ ├── requirements.txt # Python dependencies
137
+ ├── default_corpus.pdf # Default corpus about RAG
138
+ ├── default_corpus.txt # Source text for default corpus
139
+ └── README.md # This file
140
+ ```
141
+
142
+ ## 🎯 Educational Goals
143
+
144
+ This application helps students understand:
145
+
146
+ 1. **Document Processing**: How text is split into chunks
147
+ 2. **Embeddings**: How text is converted to vectors
148
+ 3. **Similarity Search**: How relevant information is retrieved
149
+ 4. **Prompt Engineering**: How context is provided to LLMs
150
+ 5. **Generation**: How LLMs produce answers based on retrieved context
151
+ 6. **Parameter Impact**: How different settings affect results
152
+
153
+ ## 🔧 Configuration for HuggingFace Spaces
154
+
155
+ Create a `README.md` in your Space with this header:
156
+
157
+ ```yaml
158
  ---
159
+ title: RAG Pedagogical Demo
160
+ emoji: 🎓
161
+ colorFrom: blue
162
+ colorTo: purple
163
  sdk: gradio
164
+ sdk_version: 4.44.0
165
  app_file: app.py
166
  pinned: false
167
+ license: mit
168
  ---
169
+ ```
170
+
171
+ ## 🤝 Contributing
172
+
173
+ Contributions are welcome! Feel free to:
174
+ - Add more embedding models
175
+ - Include additional LLMs
176
+ - Improve the interface
177
+ - Add more visualizations
178
+ - Enhance documentation
179
+
180
+ ## 📄 License
181
+
182
+ MIT License. Feel free to use this project for educational purposes.
183
+
184
+ ## 🙏 Acknowledgments
185
+
186
+ - HuggingFace for the Spaces platform and ZeroGPU
187
+ - Sentence Transformers for embeddings
188
+ - FAISS for efficient similarity search
189
+ - Gradio for the interface framework
190
+
191
+ ## 📧 Contact
192
 
193
+ For questions or feedback, please open an issue on GitHub.
SPACE_README.md ADDED
@@ -0,0 +1,60 @@
1
+ ---
2
+ title: RAG Pedagogical Demo
3
+ emoji: 🎓
4
+ colorFrom: blue
5
+ colorTo: purple
6
+ sdk: gradio
7
+ sdk_version: 4.44.0
8
+ app_file: app.py
9
+ pinned: false
10
+ license: mit
11
+ ---
12
+
13
+ # 🎓 RAG Pedagogical Demo
14
+
15
+ An interactive educational application to learn about Retrieval Augmented Generation (RAG) systems.
16
+
17
+ ## What is RAG?
18
+
19
+ Retrieval Augmented Generation (RAG) combines information retrieval with language generation to create more accurate and grounded AI responses. Instead of relying solely on a language model's training data, RAG systems:
20
+
21
+ 1. **Retrieve** relevant information from a document corpus
22
+ 2. **Augment** the query with this retrieved context
23
+ 3. **Generate** an answer based on both the query and the retrieved information
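
In code, the three steps reduce to a short function like the following sketch (the `index` and `llm` objects and their methods are illustrative, not part of this Space's API):

```python
def rag_answer(query: str, index, llm) -> str:
    # 1. Retrieve relevant chunks from the corpus
    context_chunks = index.retrieve(query, top_k=3)
    # 2. Augment the query with the retrieved context
    context = "\n\n".join(context_chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    # 3. Generate an answer grounded in that context
    return llm.generate(prompt)
```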
24
+
25
+ ## Features
26
+
27
+ - 📚 **Upload your own PDFs** or use the default corpus
28
+ - 🔧 **Configure retrieval parameters**: embedding models, chunk size, top-k, similarity threshold
29
+ - 🤖 **Configure generation parameters**: LLM selection, temperature, max tokens
30
+ - 📊 **Visualize the process**: see retrieved chunks, similarity scores, and prompts
31
+ - 🌍 **Bilingual interface**: English and French
32
+
33
+ ## How to Use
34
+
35
+ 1. **Corpus Tab**: Upload a PDF or use the default corpus about RAG
36
+ 2. **Retrieval Tab**: Choose embedding model and retrieval parameters
37
+ 3. **Generation Tab**: Select language model and generation settings
38
+ 4. **Query Tab**: Ask questions and see how RAG works!
39
+
40
+ ## Educational Value
41
+
42
+ This demo helps you understand:
43
+ - How documents are processed and chunked
44
+ - How semantic search retrieves relevant information
45
+ - How context is provided to language models
46
+ - How different parameters affect the results
47
+
48
+ Perfect for students, educators, and anyone curious about modern AI systems!
49
+
50
+ ## Technology
51
+
52
+ - **Framework**: Gradio
53
+ - **Embeddings**: Sentence Transformers
54
+ - **Vector Store**: FAISS
55
+ - **LLMs**: HuggingFace Inference API
56
+ - **Infrastructure**: HuggingFace ZeroGPU
57
+
58
+ ---
59
+
60
+ *Note: This application runs on ZeroGPU. Initial requests may take longer as models are loaded.*
app.py ADDED
@@ -0,0 +1,257 @@
1
+ import gradio as gr
2
+ import spaces
3
+ from rag_system import RAGSystem
4
+ from i18n import get_text
5
+
6
+ # Initialize RAG system
7
+ rag = RAGSystem()
8
+
9
+ # Language state
10
+ language = "en"
11
+
12
+ def switch_language(lang):
13
+ global language
14
+ language = lang
15
+ return update_interface()
16
+
17
+ def update_interface():
18
+ t = lambda key: get_text(key, language)
19
+ return {
20
+ # Update all interface elements with new language
21
+ }
22
+
23
+ @spaces.GPU
24
+ def process_pdf(pdf_file, chunk_size, chunk_overlap):
25
+ """Process uploaded PDF and create embeddings"""
26
+ t = lambda key: get_text(key, language)
27
+ try:
28
+ if pdf_file is None:
29
+ # Load default corpus
30
+ status = rag.load_default_corpus(chunk_size, chunk_overlap)
31
+ else:
32
+ status = rag.process_document(pdf_file.name, chunk_size, chunk_overlap)
33
+ return status
34
+ except Exception as e:
35
+ return f"{t('error')}: {str(e)}"
36
+
37
+ @spaces.GPU
38
+ def perform_query(
39
+ query,
40
+ embedding_model,
41
+ top_k,
42
+ similarity_threshold,
43
+ llm_model,
44
+ temperature,
45
+ max_tokens
46
+ ):
47
+ """Perform RAG query and return results"""
48
+ t = lambda key: get_text(key, language)
49
+
50
+ if not rag.is_ready():
51
+ return t("no_corpus"), "", "", ""
52
+
53
+ try:
54
+ # Set models and parameters
55
+ rag.set_embedding_model(embedding_model)
56
+ rag.set_llm_model(llm_model)
57
+
58
+ # Retrieve relevant chunks
59
+ results = rag.retrieve(query, top_k, similarity_threshold)
60
+
61
+ # Format retrieved chunks display
62
+ chunks_display = format_chunks(results, t)
63
+
64
+ # Generate answer
65
+ answer, prompt = rag.generate(
66
+ query,
67
+ results,
68
+ temperature,
69
+ max_tokens
70
+ )
71
+
72
+ return answer, chunks_display, prompt, ""
73
+
74
+ except Exception as e:
75
+ return "", "", "", f"{t('error')}: {str(e)}"
76
+
77
+ def format_chunks(results, t):
78
+ """Format retrieved chunks with scores for display"""
79
+ output = f"### {t('retrieved_chunks')}\n\n"
80
+ for i, (chunk, score) in enumerate(results, 1):
81
+ output += f"**Chunk {i}** - {t('similarity_score')}: {score:.4f}\n"
82
+ output += f"```\n{chunk}\n```\n\n"
83
+ return output
84
+
85
+ def create_interface():
86
+ t = lambda key: get_text(key, language)
87
+
88
+ with gr.Blocks(title="RAG Pedagogical Demo", theme=gr.themes.Soft()) as demo:
89
+
90
+ # Header with language selector
91
+ with gr.Row():
92
+ gr.Markdown("# 🎓 RAG Pedagogical Demo / Démo Pédagogique RAG")
93
+ lang_radio = gr.Radio(
94
+ choices=["en", "fr"],
95
+ value="en",
96
+ label="Language / Langue"
97
+ )
98
+
99
+ with gr.Tabs() as tabs:
100
+
101
+ # Tab 1: Corpus Management
102
+ with gr.Tab(label="📚 Corpus"):
103
+ gr.Markdown(f"## {t('corpus_management')}")
104
+ gr.Markdown(t('corpus_description'))
105
+
106
+ pdf_upload = gr.File(
107
+ label=t('upload_pdf'),
108
+ file_types=[".pdf"]
109
+ )
110
+
111
+ with gr.Row():
112
+ chunk_size = gr.Slider(
113
+ minimum=100,
114
+ maximum=1000,
115
+ value=500,
116
+ step=50,
117
+ label=t('chunk_size')
118
+ )
119
+ chunk_overlap = gr.Slider(
120
+ minimum=0,
121
+ maximum=200,
122
+ value=50,
123
+ step=10,
124
+ label=t('chunk_overlap')
125
+ )
126
+
127
+ process_btn = gr.Button(t('process_corpus'), variant="primary")
128
+ corpus_status = gr.Textbox(label=t('status'), interactive=False)
129
+
130
+ process_btn.click(
131
+ fn=process_pdf,
132
+ inputs=[pdf_upload, chunk_size, chunk_overlap],
133
+ outputs=corpus_status
134
+ )
135
+
136
+ # Tab 2: Retrieval Configuration
137
+ with gr.Tab(label="🔍 Retrieval"):
138
+ gr.Markdown(f"## {t('retrieval_config')}")
139
+
140
+ embedding_model = gr.Dropdown(
141
+ choices=[
142
+ "sentence-transformers/all-MiniLM-L6-v2",
143
+ "sentence-transformers/all-mpnet-base-v2",
144
+ "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
145
+ ],
146
+ value="sentence-transformers/all-MiniLM-L6-v2",
147
+ label=t('embedding_model')
148
+ )
149
+
150
+ with gr.Row():
151
+ top_k = gr.Slider(
152
+ minimum=1,
153
+ maximum=10,
154
+ value=3,
155
+ step=1,
156
+ label=t('top_k')
157
+ )
158
+ similarity_threshold = gr.Slider(
159
+ minimum=0.0,
160
+ maximum=1.0,
161
+ value=0.0,
162
+ step=0.05,
163
+ label=t('similarity_threshold')
164
+ )
165
+
166
+ # Tab 3: Generation Configuration
167
+ with gr.Tab(label="🤖 Generation"):
168
+ gr.Markdown(f"## {t('generation_config')}")
169
+
170
+ llm_model = gr.Dropdown(
171
+ choices=[
172
+ "HuggingFaceH4/zephyr-7b-beta",
173
+ "mistralai/Mistral-7B-Instruct-v0.2",
174
+ "meta-llama/Llama-2-7b-chat-hf",
175
+ ],
176
+ value="HuggingFaceH4/zephyr-7b-beta",
177
+ label=t('llm_model')
178
+ )
179
+
180
+ with gr.Row():
181
+ temperature = gr.Slider(
182
+ minimum=0.0,
183
+ maximum=2.0,
184
+ value=0.7,
185
+ step=0.1,
186
+ label=t('temperature')
187
+ )
188
+ max_tokens = gr.Slider(
189
+ minimum=50,
190
+ maximum=1000,
191
+ value=300,
192
+ step=50,
193
+ label=t('max_tokens')
194
+ )
195
+
196
+ # Tab 4: Query & Results
197
+ with gr.Tab(label="💬 Query"):
198
+ gr.Markdown(f"## {t('ask_question')}")
199
+
200
+ query_input = gr.Textbox(
201
+ label=t('your_question'),
202
+ placeholder=t('question_placeholder'),
203
+ lines=3
204
+ )
205
+
206
+ examples = gr.Examples(
207
+ examples=[
208
+ ["What is Retrieval Augmented Generation?"],
209
+ ["How does RAG improve language models?"],
210
+ ["What are the main components of a RAG system?"],
211
+ ],
212
+ inputs=query_input,
213
+ label=t('example_questions')
214
+ )
215
+
216
+ query_btn = gr.Button(t('submit_query'), variant="primary")
217
+
218
+ gr.Markdown(f"### {t('answer')}")
219
+ answer_output = gr.Markdown()
220
+
221
+ with gr.Accordion(t('retrieved_chunks'), open=True):
222
+ chunks_output = gr.Markdown()
223
+
224
+ with gr.Accordion(t('prompt_sent'), open=False):
225
+ prompt_output = gr.Code(language="text")
226
+
227
+ error_output = gr.Textbox(label=t('errors'), visible=False)
228
+
229
+ query_btn.click(
230
+ fn=perform_query,
231
+ inputs=[
232
+ query_input,
233
+ embedding_model,
234
+ top_k,
235
+ similarity_threshold,
236
+ llm_model,
237
+ temperature,
238
+ max_tokens
239
+ ],
240
+ outputs=[answer_output, chunks_output, prompt_output, error_output]
241
+ )
242
+
243
+ # Footer
244
+ gr.Markdown("""
245
+ ---
246
+ **Note**: This is a pedagogical demonstration of RAG systems.
247
+ Models run on HuggingFace ZeroGPU infrastructure.
248
+
249
+ **Note** : Ceci est une démonstration pédagogique des systèmes RAG.
250
+ Les modèles tournent sur l'infrastructure HuggingFace ZeroGPU.
251
+ """)
252
+
253
+ return demo
254
+
255
+ if __name__ == "__main__":
256
+ demo = create_interface()
257
+ demo.launch()
default_corpus.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d7bab80f2f89a13137478e388431523e1e1efb2e85905151c9d88b9c4171a8c9
3
+ size 8352
default_corpus.txt ADDED
@@ -0,0 +1,102 @@
1
+ Retrieval Augmented Generation (RAG): A Comprehensive Guide
2
+
3
+ Introduction to RAG
4
+
5
+ Retrieval Augmented Generation (RAG) is an advanced natural language processing technique that combines the strengths of retrieval-based and generation-based approaches. RAG systems enhance the capabilities of large language models by providing them with relevant external knowledge retrieved from a document corpus.
6
+
7
+ The fundamental principle behind RAG is straightforward: instead of relying solely on the knowledge encoded in a language model's parameters during training, RAG systems dynamically retrieve relevant information from an external knowledge base to inform their responses. This approach offers several advantages including more accurate and up-to-date information, reduced hallucinations, and the ability to cite sources.
8
+
9
+ Architecture of RAG Systems
10
+
11
+ A typical RAG system consists of three main components:
12
+
13
+ 1. Document Processing and Indexing
14
+ The first step involves processing a corpus of documents. Documents are split into smaller chunks or passages, typically ranging from 100 to 1000 tokens. These chunks are then converted into dense vector representations (embeddings) using neural embedding models such as BERT, Sentence-BERT, or other transformer-based encoders.
15
+
16
+ These embeddings capture the semantic meaning of the text and are stored in a vector database or index structure like FAISS, Pinecone, or Weaviate. This indexing allows for efficient similarity search during the retrieval phase.
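
A minimal sketch of this indexing step with Sentence Transformers and FAISS (the stack used by this demo; the model name and sample chunks are placeholders):

```python
import faiss
from sentence_transformers import SentenceTransformer

chunks = [
    "RAG retrieves relevant passages before generating an answer.",
    "Embeddings are dense vectors that capture semantic meaning.",
    "FAISS provides fast similarity search over those vectors.",
]

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode(chunks, convert_to_numpy=True)

faiss.normalize_L2(embeddings)                  # normalize so inner product == cosine similarity
index = faiss.IndexFlatIP(embeddings.shape[1])  # exact inner-product index
index.add(embeddings)
print(index.ntotal, "chunks indexed")
```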
17
+
18
+ 2. Retrieval Component
19
+ When a user submits a query, the retrieval component performs the following operations:
20
+ - The query is encoded into a vector representation using the same embedding model used for documents
21
+ - A similarity search is performed against the indexed document embeddings
22
+ - The top-k most similar document chunks are retrieved based on cosine similarity or other distance metrics
23
+ - These retrieved chunks serve as context for the generation phase
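
Continuing the indexing sketch above, the query side might look like this (reusing `model`, `index`, and `chunks` from that example):

```python
query = "How does RAG find relevant information?"
query_emb = model.encode([query], convert_to_numpy=True)
faiss.normalize_L2(query_emb)                 # same normalization as the indexed chunks

scores, indices = index.search(query_emb, 2)  # top-k = 2, inner product == cosine similarity
for score, idx in zip(scores[0], indices[0]):
    if idx != -1 and score >= 0.0:            # optional similarity threshold
        print(f"{score:.3f}  {chunks[idx]}")
```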
24
+
25
+ The retrieval component can use various strategies including dense retrieval (vector similarity), sparse retrieval (keyword-based like BM25), or hybrid approaches combining both methods.
26
+
27
+ 3. Generation Component
28
+ The generation component takes the retrieved documents along with the original query and generates a response. Modern RAG systems typically use large language models (LLMs) such as GPT-4, Claude, Llama, or other generative models.
29
+
30
+ The retrieved context is incorporated into the prompt sent to the LLM, typically following a template like:
31
+ "Given the following context: [retrieved documents], please answer this question: [user query]"
32
+
33
+ The LLM then generates a response grounded in the provided context, reducing the likelihood of hallucinations and improving factual accuracy.
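
As an illustration, assembling such a prompt from retrieved chunks might look like the following sketch (the demo's `rag_system.py` uses a very similar template; the sample chunks and scores are made up for the example):

```python
retrieved_chunks = [
    ("RAG combines retrieval with generation.", 0.82),
    ("Retrieved passages ground the model's answer.", 0.74),
]
query = "What is Retrieval Augmented Generation?"

context = "\n\n".join(chunk for chunk, _score in retrieved_chunks)
prompt = (
    "You are a helpful assistant. Use the following context to answer the question.\n"
    "If you cannot answer based on the context, say so.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}\n\n"
    "Answer:"
)
print(prompt)
```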
34
+
35
+ Key Parameters in RAG Systems
36
+
37
+ Several parameters significantly impact the performance of RAG systems:
38
+
39
+ Chunk Size and Overlap
40
+ The size of document chunks affects both retrieval accuracy and context quality. Smaller chunks provide more precise retrieval but may lack sufficient context. Larger chunks provide more context but may dilute relevance. Typical chunk sizes range from 200 to 1000 characters. Overlap between chunks (e.g., 10-20%) helps ensure important information isn't split across chunk boundaries.
41
+
42
+ Number of Retrieved Documents (top-k)
43
+ This parameter determines how many relevant chunks are retrieved for each query. More documents provide richer context but may introduce noise and increase computational costs. Common values range from 3 to 10 documents.
44
+
45
+ Similarity Threshold
46
+ Setting a minimum similarity score filters out irrelevant chunks. This helps maintain response quality but may result in insufficient context if set too high.
47
+
48
+ Temperature and Generation Parameters
49
+ These control the creativity and randomness of the generated response. Lower temperatures (0.1-0.3) produce more deterministic outputs suitable for factual queries, while higher temperatures (0.7-1.0) allow for more creative responses.
50
+
51
+ Advantages of RAG
52
+
53
+ RAG systems offer several compelling benefits:
54
+
55
+ Up-to-date Information: By retrieving from external documents, RAG systems can access current information beyond the training data cutoff date of the language model.
56
+
57
+ Domain Specialization: RAG enables language models to be specialized for specific domains by using relevant document collections without requiring expensive fine-tuning.
58
+
59
+ Reduced Hallucinations: Grounding responses in retrieved documents significantly reduces the tendency of language models to generate false or invented information.
60
+
61
+ Source Attribution: RAG systems can cite specific documents or passages, improving transparency and trustworthiness.
62
+
63
+ Cost Efficiency: RAG provides a more economical alternative to fine-tuning large models for specific knowledge domains.
64
+
65
+ Challenges and Limitations
66
+
67
+ Despite its advantages, RAG faces several challenges:
68
+
69
+ Retrieval Quality: The entire system's performance depends heavily on retrieving relevant documents. Poor retrieval leads to poor generation.
70
+
71
+ Context Window Limitations: Language models have finite context windows, limiting how much retrieved information can be included.
72
+
73
+ Latency: The retrieval step adds latency compared to pure generation approaches.
74
+
75
+ Embedding Quality: The quality of document embeddings directly impacts retrieval accuracy, and creating good embeddings requires careful model selection.
76
+
77
+ Applications of RAG
78
+
79
+ RAG technology has found applications across numerous domains:
80
+
81
+ Question Answering Systems: RAG excels at building systems that answer questions based on large document collections, technical documentation, or knowledge bases.
82
+
83
+ Customer Support: Companies deploy RAG-based chatbots that retrieve information from product manuals, FAQs, and support tickets to provide accurate assistance.
84
+
85
+ Research Assistance: RAG helps researchers quickly find and synthesize information from vast academic literature.
86
+
87
+ Legal and Compliance: Law firms use RAG to search case law and regulations to support legal research and compliance checking.
88
+
89
+ Healthcare: Medical professionals leverage RAG to access the latest research papers and clinical guidelines.
90
+
91
+ Future Directions
92
+
93
+ The field of RAG continues to evolve rapidly. Current research focuses on:
94
+ - Hybrid retrieval methods combining dense and sparse retrieval
95
+ - Multi-modal RAG incorporating images, tables, and structured data
96
+ - Iterative retrieval strategies that refine searches based on intermediate results
97
+ - Better evaluation metrics for RAG system performance
98
+ - Integration with knowledge graphs for improved reasoning
99
+
100
+ Conclusion
101
+
102
+ Retrieval Augmented Generation represents a powerful paradigm for building more accurate, reliable, and controllable AI systems. By combining the flexibility of large language models with the precision of information retrieval, RAG enables applications that were previously difficult or impossible to implement. As the technology matures, we can expect RAG to become an increasingly standard component of production AI systems across industries.
i18n.py ADDED
@@ -0,0 +1,92 @@
1
+ """Internationalization support for the RAG demo"""
2
+
3
+ TRANSLATIONS = {
4
+ "en": {
5
+ # Corpus tab
6
+ "corpus_management": "Corpus Management",
7
+ "corpus_description": "Upload a PDF document or use the default corpus. The document will be split into chunks for retrieval.",
8
+ "upload_pdf": "Upload PDF",
9
+ "chunk_size": "Chunk Size (characters)",
10
+ "chunk_overlap": "Chunk Overlap (characters)",
11
+ "process_corpus": "Process Corpus",
12
+ "status": "Status",
13
+
14
+ # Retrieval tab
15
+ "retrieval_config": "Retrieval Configuration",
16
+ "embedding_model": "Embedding Model",
17
+ "top_k": "Top K (number of chunks to retrieve)",
18
+ "similarity_threshold": "Similarity Threshold (minimum score)",
19
+
20
+ # Generation tab
21
+ "generation_config": "Generation Configuration",
22
+ "llm_model": "Language Model",
23
+ "temperature": "Temperature (creativity)",
24
+ "max_tokens": "Max Tokens (response length)",
25
+
26
+ # Query tab
27
+ "ask_question": "Ask a Question",
28
+ "your_question": "Your Question",
29
+ "question_placeholder": "Enter your question here...",
30
+ "example_questions": "Example Questions",
31
+ "submit_query": "Submit Query",
32
+ "answer": "Answer",
33
+ "retrieved_chunks": "Retrieved Chunks",
34
+ "prompt_sent": "Prompt Sent to LLM",
35
+ "errors": "Errors",
36
+
37
+ # Results
38
+ "similarity_score": "Similarity Score",
39
+ "no_corpus": "Please process a corpus first in the Corpus tab.",
40
+
41
+ # Messages
42
+ "error": "Error",
43
+ "success": "Success",
44
+ "processing": "Processing...",
45
+ },
46
+ "fr": {
47
+ # Onglet Corpus
48
+ "corpus_management": "Gestion du Corpus",
49
+ "corpus_description": "Téléchargez un document PDF ou utilisez le corpus par défaut. Le document sera divisé en chunks pour la récupération.",
50
+ "upload_pdf": "Télécharger un PDF",
51
+ "chunk_size": "Taille des Chunks (caractères)",
52
+ "chunk_overlap": "Chevauchement des Chunks (caractères)",
53
+ "process_corpus": "Traiter le Corpus",
54
+ "status": "Statut",
55
+
56
+ # Onglet Retrieval
57
+ "retrieval_config": "Configuration du Retrieval",
58
+ "embedding_model": "Modèle d'Embedding",
59
+ "top_k": "Top K (nombre de chunks à récupérer)",
60
+ "similarity_threshold": "Seuil de Similarité (score minimum)",
61
+
62
+ # Onglet Génération
63
+ "generation_config": "Configuration de la Génération",
64
+ "llm_model": "Modèle de Langage",
65
+ "temperature": "Température (créativité)",
66
+ "max_tokens": "Max Tokens (longueur de la réponse)",
67
+
68
+ # Onglet Query
69
+ "ask_question": "Poser une Question",
70
+ "your_question": "Votre Question",
71
+ "question_placeholder": "Entrez votre question ici...",
72
+ "example_questions": "Questions d'Exemple",
73
+ "submit_query": "Soumettre la Question",
74
+ "answer": "Réponse",
75
+ "retrieved_chunks": "Chunks Récupérés",
76
+ "prompt_sent": "Prompt Envoyé au LLM",
77
+ "errors": "Erreurs",
78
+
79
+ # Résultats
80
+ "similarity_score": "Score de Similarité",
81
+ "no_corpus": "Veuillez d'abord traiter un corpus dans l'onglet Corpus.",
82
+
83
+ # Messages
84
+ "error": "Erreur",
85
+ "success": "Succès",
86
+ "processing": "Traitement en cours...",
87
+ }
88
+ }
89
+
90
+ def get_text(key, language="en"):
91
+ """Get translated text for a given key and language"""
92
+ return TRANSLATIONS.get(language, TRANSLATIONS["en"]).get(key, key)
rag_system.py ADDED
@@ -0,0 +1,205 @@
1
+ """Core RAG system implementation"""
2
+
3
+ import os
4
+ from typing import List, Tuple, Optional
5
+ import PyPDF2
6
+ import faiss
7
+ import numpy as np
8
+ from sentence_transformers import SentenceTransformer
9
+ from huggingface_hub import InferenceClient
10
+ import spaces
11
+
12
+ class RAGSystem:
13
+ def __init__(self):
14
+ self.chunks = []
15
+ self.embeddings = None
16
+ self.index = None
17
+ self.embedding_model = None
18
+ self.embedding_model_name = None
19
+ self.llm_client = None
20
+ self.llm_model_name = None
21
+ self.ready = False
22
+
23
+ def is_ready(self) -> bool:
24
+ """Check if the system is ready to process queries"""
25
+ return self.ready and self.index is not None
26
+
27
+ def load_default_corpus(self, chunk_size: int = 500, chunk_overlap: int = 50) -> str:
28
+ """Load the default corpus"""
29
+ default_path = "default_corpus.pdf"
30
+ if os.path.exists(default_path):
31
+ return self.process_document(default_path, chunk_size, chunk_overlap)
32
+ else:
33
+ return "Default corpus not found. Please upload a PDF."
34
+
35
+ def extract_text_from_pdf(self, pdf_path: str) -> str:
36
+ """Extract text from PDF file"""
37
+ text = ""
38
+ with open(pdf_path, 'rb') as file:
39
+ pdf_reader = PyPDF2.PdfReader(file)
40
+ for page in pdf_reader.pages:
41
+ text += page.extract_text() + "\n"
42
+ return text
43
+
44
+ def chunk_text(self, text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
45
+ """Split text into overlapping chunks"""
46
+ chunks = []
47
+ start = 0
48
+ text_length = len(text)
49
+
50
+ while start < text_length:
51
+ end = start + chunk_size
52
+ chunk = text[start:end]
53
+
54
+ # Try to break at sentence boundary
55
+ if end < text_length:
56
+ # Look for sentence endings
57
+ last_period = chunk.rfind('.')
58
+ last_newline = chunk.rfind('\n')
59
+ break_point = max(last_period, last_newline)
60
+
61
+ if break_point > chunk_size * 0.5: # Only break if we're past halfway
62
+ chunk = chunk[:break_point + 1]
63
+ end = start + break_point + 1
64
+
65
+ chunks.append(chunk.strip())
66
+ start = end - overlap
67
+
68
+ return [c for c in chunks if len(c) > 50] # Filter out very small chunks
69
+
70
+ @spaces.GPU
71
+ def create_embeddings(self, texts: List[str]) -> np.ndarray:
72
+ """Create embeddings for text chunks"""
73
+ if self.embedding_model is None:
74
+ self.set_embedding_model("sentence-transformers/all-MiniLM-L6-v2")
75
+
76
+ embeddings = self.embedding_model.encode(
77
+ texts,
78
+ show_progress_bar=True,
79
+ convert_to_numpy=True
80
+ )
81
+ return embeddings
82
+
83
+ def build_index(self, embeddings: np.ndarray):
84
+ """Build FAISS index from embeddings"""
85
+ dimension = embeddings.shape[1]
86
+ self.index = faiss.IndexFlatIP(dimension) # Inner product for cosine similarity
87
+
88
+ # Normalize embeddings for cosine similarity
89
+ faiss.normalize_L2(embeddings)
90
+ self.index.add(embeddings)
91
+
92
+ def process_document(self, pdf_path: str, chunk_size: int = 500, chunk_overlap: int = 50) -> str:
93
+ """Process a PDF document and create searchable index"""
94
+ try:
95
+ # Extract text
96
+ text = self.extract_text_from_pdf(pdf_path)
97
+
98
+ if not text.strip():
99
+ return "Error: No text could be extracted from the PDF."
100
+
101
+ # Chunk text
102
+ self.chunks = self.chunk_text(text, chunk_size, chunk_overlap)
103
+
104
+ if not self.chunks:
105
+ return "Error: No valid chunks created from the document."
106
+
107
+ # Create embeddings
108
+ self.embeddings = self.create_embeddings(self.chunks)
109
+
110
+ # Build index
111
+ self.build_index(self.embeddings)
112
+
113
+ self.ready = True
114
+ return f"Success! Processed {len(self.chunks)} chunks from the document."
115
+
116
+ except Exception as e:
117
+ self.ready = False
118
+ return f"Error processing document: {str(e)}"
119
+
120
+ def set_embedding_model(self, model_name: str):
121
+ """Set or change the embedding model"""
122
+ if self.embedding_model_name != model_name:
123
+ self.embedding_model_name = model_name
124
+ self.embedding_model = SentenceTransformer(model_name)
125
+
126
+ # If we have chunks, re-create embeddings and index
127
+ if self.chunks:
128
+ self.embeddings = self.create_embeddings(self.chunks)
129
+ self.build_index(self.embeddings)
130
+
131
+ def set_llm_model(self, model_name: str):
132
+ """Set or change the LLM model"""
133
+ if self.llm_model_name != model_name:
134
+ self.llm_model_name = model_name
135
+ self.llm_client = InferenceClient(model_name)
136
+
137
+ @spaces.GPU
138
+ def retrieve(
139
+ self,
140
+ query: str,
141
+ top_k: int = 3,
142
+ similarity_threshold: float = 0.0
143
+ ) -> List[Tuple[str, float]]:
144
+ """Retrieve relevant chunks for a query"""
145
+ if not self.is_ready():
146
+ return []
147
+
148
+ # Encode query
149
+ query_embedding = self.embedding_model.encode(
150
+ [query],
151
+ convert_to_numpy=True
152
+ )
153
+
154
+ # Normalize for cosine similarity
155
+ faiss.normalize_L2(query_embedding)
156
+
157
+ # Search
158
+ scores, indices = self.index.search(query_embedding, top_k)
159
+
160
+ # Filter by threshold and return results
161
+ results = []
162
+ for score, idx in zip(scores[0], indices[0]):
163
+ if score >= similarity_threshold:
164
+ results.append((self.chunks[idx], float(score)))
165
+
166
+ return results
167
+
168
+ @spaces.GPU
169
+ def generate(
170
+ self,
171
+ query: str,
172
+ retrieved_chunks: List[Tuple[str, float]],
173
+ temperature: float = 0.7,
174
+ max_tokens: int = 300
175
+ ) -> Tuple[str, str]:
176
+ """Generate answer using LLM"""
177
+ if self.llm_client is None:
178
+ self.set_llm_model("HuggingFaceH4/zephyr-7b-beta")
179
+
180
+ # Build context from retrieved chunks
181
+ context = "\n\n".join([chunk for chunk, _ in retrieved_chunks])
182
+
183
+ # Create prompt
184
+ prompt = f"""You are a helpful assistant. Use the following context to answer the question.
185
+ If you cannot answer based on the context, say so.
186
+
187
+ Context:
188
+ {context}
189
+
190
+ Question: {query}
191
+
192
+ Answer:"""
193
+
194
+ # Generate response
195
+ try:
196
+ response = self.llm_client.text_generation(
197
+ prompt,
198
+ max_new_tokens=max_tokens,
199
+ temperature=temperature,
200
+ return_full_text=False
201
+ )
202
+ return response, prompt
203
+
204
+ except Exception as e:
205
+ return f"Error generating response: {str(e)}", prompt
requirements.txt ADDED
@@ -0,0 +1,9 @@
1
+ gradio==4.44.0
2
+ torch==2.1.0
3
+ sentence-transformers==2.2.2
4
+ faiss-cpu==1.7.4
5
+ PyPDF2==3.0.1
6
+ huggingface_hub==0.20.0
7
+ spaces==0.29.2
8
+ transformers==4.36.0
9
+ accelerate==0.25.0