Shrideep committed on
Commit
55fdf94
1 Parent(s): 70d84c3

Upload RAG.ipynb

Files changed (1)
  1. RAG.ipynb +226 -0
RAG.ipynb ADDED
@@ -0,0 +1,226 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "6c5cc776-c427-4bab-9f61-d0d1bc71cbed",
+ "metadata": {},
+ "source": [
+ "## Integrating ChatGPT"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "34e1b7bd-2443-4ae6-816a-002edba31897",
+ "metadata": {
+ "scrolled": true
+ },
+ "outputs": [],
+ "source": [
+ "%pip install openai"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "id": "b23d0c00-fe8c-4860-ab4b-0d08eb4da62f",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdin",
+ "output_type": "stream",
+ "text": [
+ "What type of chatbot would you like to create?\n",
+ " informational\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Your new assistant is ready! Type 'quit()' to end the session.\n"
+ ]
+ },
+ {
+ "name": "stdin",
+ "output_type": "stream",
+ "text": [
+ "You: when was stable diffusion created?\n"
+ ]
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "Assistant: Stable diffusion characterizes a phenomenon, rather than being a specific technology or invention with a single creation date. Diffusion refers to the spread of innovations, ideas, or technologies from one group or society to another. The concept of diffusion can be traced back to the work of anthropologist Franz Boas in the late 19th and early 20th centuries. Boas examined the diffusion of cultural traits and practices among indigenous groups in North America. Since then, the study of diffusion has been further developed by various social scientists and has become a fundamental concept in fields such as anthropology, sociology, and innovation studies. So, stable diffusion can be considered as an ongoing process rather than something that was created at a specific point in time.\n",
+ "\n"
+ ]
+ },
+ {
+ "name": "stdin",
+ "output_type": "stream",
+ "text": [
+ "You: quit()\n"
+ ]
+ }
+ ],
+ "source": [
+ "from openai import OpenAI\n",
+ "\n",
+ "# Create the API client. 'your-api-key' is a placeholder for your own key;\n",
+ "# if api_key is omitted, the client reads the OPENAI_API_KEY environment variable.\n",
+ "client = OpenAI(api_key='your-api-key')\n",
+ "\n",
+ "# The first input becomes the system message that sets the assistant's role.\n",
+ "messages = []\n",
+ "system_msg = input(\"What type of chatbot would you like to create?\\n\")\n",
+ "messages.append({\"role\": \"system\", \"content\": system_msg})\n",
+ "\n",
+ "print(\"Your new assistant is ready! Type 'quit()' to end the session.\")\n",
+ "\n",
+ "# Loop for chat interaction\n",
+ "while True:\n",
+ "    user_message = input(\"You: \")\n",
+ "    if user_message == \"quit()\":\n",
+ "        break\n",
+ "\n",
+ "    messages.append({\"role\": \"user\", \"content\": user_message})\n",
+ "\n",
+ "    try:\n",
+ "        response = client.chat.completions.create(\n",
+ "            model=\"gpt-3.5-turbo\",\n",
+ "            messages=messages\n",
+ "        )\n",
+ "        reply = response.choices[0].message.content\n",
+ "        messages.append({\"role\": \"assistant\", \"content\": reply})\n",
+ "        print(\"\\nAssistant: \" + reply + \"\\n\")\n",
+ "    except Exception as e:\n",
+ "        print(\"An error occurred: \", e)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ea044a1e-f1b6-4be8-bf2a-51bcb0c8dd18",
+ "metadata": {},
+ "source": [
+ "### ChatGPT fails to answer questions outside its scope\n",
+ "Above, I integrated GPT-3.5 into the Python notebook and asked it a question about Stable Diffusion.\n",
+ "Since GPT-3.5's training data extends only to 2021 and Stable Diffusion was released in 2022, the model hallucinated its answer.\n",
+ "\n",
+ "To solve this problem, we can first collect the relevant information and then supply it to ChatGPT together with the question."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f640b069-6840-46fa-85f0-b68b4772ae20",
+ "metadata": {},
+ "source": [
+ "# Implementing RAG using llama-index"
+ ]
+ },
+ {
+ "cell_type": "raw",
+ "id": "73491c65-c230-4477-a637-88b2d7e63d1a",
+ "metadata": {},
+ "source": [
+ "I scraped data about Stable Diffusion from Wikipedia and loaded it."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "id": "e049d234-b3dd-402c-a205-b7cdb0e47dce",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# pip install llama-index"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "id": "c1e4b525-e110-4889-b76a-79ee40200d15",
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "from llama_index import VectorStoreIndex, SimpleDirectoryReader\n",
+ "OPENAI_API_KEY = 'your-api-key'\n",
+ "\n",
+ "# Set the OpenAI API key\n",
+ "os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY\n",
+ "\n",
+ "documents = SimpleDirectoryReader(r\"C:\\Users\\user\\OneDrive\\Desktop\\Data\\stable_diffusion\").load_data()\n",
+ "index = VectorStoreIndex.from_documents(documents)"
+ ]
+ },
+ {
+ "cell_type": "raw",
+ "id": "f2bf6e3d-8f1a-4625-a8f2-8f4217172303",
+ "metadata": {},
+ "source": [
+ "In the code cell above, the loaded documents are converted into vector embeddings and stored in an index, so that the passages relevant to a question can be retrieved and passed to ChatGPT."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f0b5554a-5e3a-468f-8b27-72bebedb75f7",
+ "metadata": {},
+ "source": [
+ "### Final Step\n",
+ "Querying the index to retrieve the data that matches the context of the question."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "id": "63362b5f-eb86-47c8-99fa-0bfbf2cf3cd6",
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Stable Diffusion was released in 2022.\n"
+ ]
+ }
+ ],
+ "source": [
+ "query_engine = index.as_query_engine()\n",
+ "response = query_engine.query(\"When was stable diffusion released?\")\n",
+ "print(response)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9bd473b2-c6cc-4de7-a9d7-360215a689a5",
+ "metadata": {},
+ "source": [
+ "In this case, the query engine retrieves whatever the indexed data contains about Stable Diffusion and passes it to ChatGPT.\n",
+ "As the response shows, ChatGPT now gives a much better answer than before to essentially the same question."
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.3"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+ }
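
The notebook relies on llama-index to vectorize the scraped data and to fetch the passages that match a question. The toy sketch below illustrates that retrieve-then-prompt idea only; it is not how llama-index is implemented. It assumes word-count vectors in place of learned embeddings, and the function names (`embed`, `cosine`, `retrieve`, `build_prompt`) and the sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real systems use learned embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(query, chunks, top_k=1):
    """Prepend the retrieved context to the question before it is sent to the model."""
    context = "\n".join(retrieve(query, chunks, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical chunks standing in for the scraped data.
chunks = [
    "Stable Diffusion is a text-to-image model released in 2022.",
    "Diffusion of innovations describes how ideas spread through a society.",
    "Pandas is a Python library for tabular data.",
]

print(build_prompt("When was Stable Diffusion released?", chunks))
```

Because the retrieved chunk is placed in the prompt, the model can answer from the supplied context rather than from its training data alone, which is the effect the final query cell demonstrates.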