Spaces:

mafzaal
/

lets_talk

Running

File size: 8,762 Bytes

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "b1a955e7",
   "metadata": {},
   "source": [
    "# Update Blog Data\n",
    "\n",
    "This notebook demonstrates how to update the blog data and vector store when new blog posts are published. It uses the utility functions from `utils_data_loading.ipynb`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "6ec048b4",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Adding package root to sys.path: /home/mafzaal/source/lets-talk/py-src\n"
     ]
    }
   ],
   "source": [
    "import sys\n",
    "import os\n",
    "from pathlib import Path\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "\n",
    "import sys\n",
    "import os\n",
    "\n",
    "# Add the project root to the Python path\n",
    "package_root = os.path.abspath(os.path.join(os.getcwd(), \"../\"))\n",
    "print(f\"Adding package root to sys.path: {package_root}\")\n",
    "if package_root not in sys.path:\n",
    "\tsys.path.append(package_root)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "7a7a9f3f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Current notebook directory: /home/mafzaal/source/lets-talk/py-src/notebooks\n",
      "Project root: /home/mafzaal/source/lets-talk\n"
     ]
    }
   ],
   "source": [
    "notebook_dir = os.getcwd()\n",
    "print(f\"Current notebook directory: {notebook_dir}\")\n",
    "# change to the directory to the root of the project\n",
    "project_root = os.path.abspath(os.path.join(os.getcwd(), \"../../\"))\n",
    "print(f\"Project root: {project_root}\")\n",
    "os.chdir(project_root)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc19ab4c",
   "metadata": {},
   "source": [
    "## Update Blog Data Process\n",
    "\n",
    "This process will:\n",
    "1. Load existing blog posts\n",
    "2. Process and update metadata\n",
    "3. Create or update vector embeddings"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d56f688",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|██████████| 14/14 [00:00<00:00, 4617.46it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Loaded 14 documents from data/\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "import lets_talk.utils.blog as  blog_utils\n",
    "docs = blog_utils.load_blog_posts()\n",
    "docs = blog_utils.update_document_metadata(docs)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a14c70dc",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Split 14 documents into 162 chunks\n"
     ]
    }
   ],
   "source": [
    "split_docs = blog_utils.split_documents(docs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "1c40c587",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(metadata={'source': 'data/introduction-to-ragas/index.md', 'url': 'https://thedataguy.pro/blog/introduction-to-ragas/', 'post_slug': 'introduction-to-ragas', 'post_title': '\"Part 1: Introduction to Ragas: The Essential Evaluation Framework for LLM Applications\"', 'content_length': 6994}, page_content='---\\ntitle: \"Part 1: Introduction to Ragas: The Essential Evaluation Framework for LLM Applications\"\\ndate: 2025-04-26T18:00:00-06:00\\nlayout: blog\\ndescription: \"Explore the essential evaluation framework for LLM applications with Ragas. Learn how to assess performance, ensure accuracy, and improve reliability in Retrieval-Augmented Generation systems.\"\\ncategories: [\"AI\", \"RAG\", \"Evaluation\",\"Ragas\"]\\ncoverImage: \"https://images.unsplash.com/photo-1593642634367-d91a135587b5?q=80&w=1770&auto=format&fit=crop&ixlib=rb-4.0.3\"\\nreadingTime: 7\\npublished: true\\n---\\n\\nAs Large Language Models (LLMs) become fundamental components of modern applications, effectively evaluating their performance becomes increasingly critical. Whether you\\'re building a question-answering system, a document retrieval tool, or a conversational agent, you need reliable metrics to assess how well your application performs. This is where Ragas steps in.\\n\\n## What is Ragas?')"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "split_docs[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "72dd14b5",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store = blog_utils = blog_utils.create_vector_store(split_docs,'./db/vector_store_5')"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ad3b2dca",
   "metadata": {},
   "source": [
    "## Testing the Vector Store\n",
    "\n",
    "Let's test the vector store with a few queries to make sure it's working correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "8b552e6b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Query: What is RAGAS?\n",
      "Retrieved 3 documents:\n",
      "1. \"Part 3: Evaluating RAG Systems with Ragas\" (https://thedataguy.pro/blog/evaluating-rag-systems-with-ragas/)\n",
      "2. \"Part 1: Introduction to Ragas: The Essential Evaluation Framework for LLM Applications\" (https://thedataguy.pro/blog/introduction-to-ragas/)\n",
      "3. \"Part 4: Generating Test Data with Ragas\" (https://thedataguy.pro/blog/generating-test-data-with-ragas/)\n",
      "\n",
      "Query: How to build research agents?\n",
      "Retrieved 3 documents:\n",
      "1. Building a Research Agent with RSS Feed Support (https://thedataguy.pro/blog/building-research-agent/)\n",
      "2. \"Part 1: Introduction to Ragas: The Essential Evaluation Framework for LLM Applications\" (https://thedataguy.pro/blog/introduction-to-ragas/)\n",
      "3. Building a Research Agent with RSS Feed Support (https://thedataguy.pro/blog/building-research-agent/)\n",
      "\n",
      "Query: What is metric driven development?\n",
      "Retrieved 3 documents:\n",
      "1. \"Metric-Driven Development: Make Smarter Decisions, Faster\" (https://thedataguy.pro/blog/metric-driven-development/)\n",
      "2. \"Metric-Driven Development: Make Smarter Decisions, Faster\" (https://thedataguy.pro/blog/metric-driven-development/)\n",
      "3. \"Part 5: Advanced Metrics and Customization with Ragas\" (https://thedataguy.pro/blog/advanced-metrics-and-customization-with-ragas/)\n",
      "\n",
      "Query: Who is TheDataGuy?\n",
      "Retrieved 3 documents:\n",
      "1. \"Part 2: Basic Evaluation Workflow with Ragas\" (https://thedataguy.pro/blog/basic-evaluation-workflow-with-ragas/)\n",
      "2. \"Part 2: Basic Evaluation Workflow with Ragas\" (https://thedataguy.pro/blog/basic-evaluation-workflow-with-ragas/)\n",
      "3. \"Part 6: Evaluating AI Agents: Beyond Simple Answers with Ragas\" (https://thedataguy.pro/blog/evaluating-ai-agents-with-ragas/)\n"
     ]
    }
   ],
   "source": [
    "# Create a retriever from the vector store\n",
    "retriever = vector_store.as_retriever(search_kwargs={\"k\": 3})\n",
    "\n",
    "# Test queries\n",
    "test_queries = [\n",
    "    \"What is RAGAS?\",\n",
    "    \"How to build research agents?\",\n",
    "    \"What is metric driven development?\",\n",
    "    \"Who is TheDataGuy?\"\n",
    "]\n",
    "\n",
    "for query in test_queries:\n",
    "    print(f\"\\nQuery: {query}\")\n",
    "    docs = retriever.invoke(query)\n",
    "    print(f\"Retrieved {len(docs)} documents:\")\n",
    "    for i, doc in enumerate(docs):\n",
    "        title = doc.metadata.get(\"post_title\", \"Unknown\")\n",
    "        url = doc.metadata.get(\"url\", \"No URL\")\n",
    "        print(f\"{i+1}. {title} ({url})\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "4cdd6899",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store.client.close()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}