Ilayda-j committed on
Commit 28e2c84
1 Parent(s): ea12579

Upload 15 files

README.md CHANGED
@@ -1,12 +1,81 @@
1
- ---
2
- title: Gradio App
3
- emoji: 📊
4
- colorFrom: pink
5
- colorTo: blue
6
- sdk: gradio
7
- sdk_version: 3.44.3
8
- app_file: app.py
9
- pinned: false
10
- ---
11
-
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
+ # Classroom Learning and Assessment Suite (CLAS)
2
+
3
+ The Classroom Learning and Assessment Suite is a set of AI-driven solutions that complement in-class learning with interactive study and assessment. Primarily ideated by faculty interested in leveraging AI platforms to enhance student learning, the suite gives both students and faculty new mechanisms for supporting high-quality learning assessments and for streamlining the grading of student submissions using generative AI-based platforms.
4
+
5
+ ## Features
6
+ CLAS is suitable for users of diverse programming backgrounds, from no programming experience at all to seasoned programmers. The suite provides:
7
+
8
+ ### **Prompting Guides**
9
+
10
+ * **[A curated prompt dictionary for self-study](https://github.com/vanderbilt-data-science/lo-achievement/wiki/Guide-to-Learning-Objective-Prompts)**: A set of prompts that have been engineered and evaluated on OpenAI GPT-3.5 and GPT-4 to provide numerous types of assessments for students engaging in self-study with instructor-provided materials
11
+ * **[A curated prompt dictionary for student assessment](https://github.com/vanderbilt-data-science/lo-achievement/wiki/Bloom's-Taxonomy-Rubric-Prompts)**: Prompts targeted primarily towards instructors who want a generalized rubric for evaluating student submissions
12
+
13
+ ### **Interface-Driven Programs**
14
+ No programming experience? No problem.
15
+
16
+ * **[Student self-study using instructor-assigned resources](https://huggingface.co/spaces/vanderbilt-dsi/selfstudy_learningobjectives_demo)**: A point-and-click interface with OpenAI generative AI. Upload your contextual coursework (book chapters, etc.), and chat with your own customized tutor!
17
+ * **[Hosted self- and in-class oral exam app]()**: Interested in using oral exams to assess student knowledge? Students, want to prepare for oral exams? Visit our point-and-click interface with OpenAI generative AI using your own coursework. You can upload questions provided by the instructor or ask the generative AI to assist in creating questions for you.
18
+
19
+ ### **Google Colab Notebooks**
20
+ Want to evaluate students, or customize your approach to interacting with generative AI?
21
+
22
+ * **[Instructor Grading Notebook](https://github.com/vanderbilt-data-science/lo-achievement/blob/main/instructor_intr_notebook.ipynb)**: Upload a zip file of the JSON output of your students' exploration and assessment, and get insight into student strengths and weaknesses on the topic, as well as structured feedback for all students.
23
+ * **[Instructor Document Store Creation](https://github.com/vanderbilt-data-science/lo-achievement/blob/main/instructor_vector_store_creator.ipynb)**: Store all of your classroom content (PDFs, YouTube videos, website links) in a hosted location for easy access by students working with generative AI platforms.
24
+ * **[Prompting with Inline Context](https://github.com/vanderbilt-data-science/lo-achievement/blob/main/prompt_with_context.ipynb)**: A customizable, programmatic way to interface with generative AI using Google Colab through direct copy/paste of text content
25
+ * **[Prompting with Vector Stores](https://github.com/vanderbilt-data-science/lo-achievement/blob/main/prompt_with_vector_store.ipynb)**: A customizable, programmatic way to self-study with generative AI using Google Colab through the creation of vector stores (better for larger corpora of text); see the sketch below
26
+
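+ For instance, building such a vector store might look like the sketch below. This is a hedged illustration under assumed LangChain-era APIs, not the notebook's exact code; the file name and query are hypothetical:
+ 
+ ```python
+ from langchain.embeddings import OpenAIEmbeddings
+ from langchain.text_splitter import CharacterTextSplitter
+ from langchain.vectorstores import FAISS
+ 
+ # Hypothetical input: the full text of your course materials
+ corpus_text = open("course_materials.txt").read()
+ 
+ # Split the corpus into overlapping chunks, embed them, and index them with FAISS
+ chunks = CharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_text(corpus_text)
+ store = FAISS.from_texts(chunks, OpenAIEmbeddings())
+ 
+ # Retrieve the chunks most relevant to a question, to be used as prompt context
+ docs = store.similarity_search("What does the chapter say about photosynthesis?", k=4)
+ ```
+ 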
27
+ ### **Use the repo!**
28
+ You can also directly clone/use the repo itself or use it as a package for development; this is great for experienced programmers or even those who would like to learn more about development with generative AI. The repo provides:
29
+ - Generative AI and LangChain integration to process sources and create assessments and answer keys
30
+ - Runs on Google Colab, with no additional installations needed
31
+ You can also develop locally as desired, and we encourage PR contributions from the community! A minimal sketch of this kind of integration follows.
32
+
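+ The sketch below assumes the LangChain and OpenAI interfaces used elsewhere in this repo; the prompt wording, model name, and file name are illustrative rather than the repo's actual code:
+ 
+ ```python
+ from langchain.chat_models import ChatOpenAI
+ from langchain.prompts import ChatPromptTemplate
+ 
+ # Hypothetical source material uploaded by an instructor
+ source_text = open("chapter1.txt").read()
+ 
+ # Ask the model for questions plus an answer key grounded in the source
+ prompt = ChatPromptTemplate.from_template(
+     "Based on the following material, write {n} short-answer questions "
+     "with an answer key:\n\n{source}"
+ )
+ chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.0)
+ response = chat(prompt.format_messages(n=3, source=source_text))
+ print(response.content)
+ ```
+ 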
33
+ ## Getting Started
34
+
35
+ There are a variety of ways to use CLAS:
36
+ * **Navigate to the Wiki to explore prompts.** You can copy/paste/amend these in the interfaces provided by OpenAI, Google, Anthropic, etc.
37
+ * **Use Google Colab to interact with notebooks.** Click on the notebook you'd like to use in the files list above. You will see a blue Open In Colab badge on the page that opens. Click this badge to start your session in Google Colab, making sure that you're logged in with your Google Account. It will take a few minutes to spin up and automatically install the required packages.
38
+ * **Study through our hosted app.** Navigate to [CLAS on Huggingface](https://huggingface.co/spaces/vanderbilt-dsi/selfstudy_learningobjectives_demo). Follow the instructions to use the platform for self study.
39
+
40
+ ## Contributing
41
+
42
+ To contribute to the project, please fork the repository and submit a pull request. Our community is supportive, and we provide training and classes if you're new to any of the frameworks used in the project. Everyone is welcome to contribute, as we believe participating in data science and AI projects is an excellent way to learn.
43
+
44
+ ## Community Guidelines
45
+
46
+ We aim to create a welcoming and inclusive community where everyone can feel comfortable and valued, regardless of skill level, background, ability, or identity. To ensure a positive atmosphere, please adhere to our code of conduct and community guidelines.
47
+
48
+ ## Meetings
49
+
50
+ - Sprint Planning & Retrospective: Mondays and Fridays at 10:30 am
51
+ - Demos: Fridays at 3 pm
52
+
53
+ ## Additional Resources
54
+
55
+ - LangChain documentation
56
+ - Introduction to transformers and generative AI on our [YouTube channel](https://www.youtube.com/channel/UC8C2_3L5gR9qLmL7rmb2BdQ)
57
+ - AI Summer and AI Winter sessions (free and open to all)
58
+
59
+ ## Reporting Issues
60
+
61
+ If you encounter a bug, please submit an issue and label it with "Bug." To escalate the issue, email [datascience@vanderbilt.edu](mailto:datascience@vanderbilt.edu).
62
+
63
+ ## Contact Information
64
+
65
+ - Organization: Data Science Institute at Vanderbilt University
66
+ - Program: Data Science for Social Good
67
+ - Main Email: [datascience@vanderbilt.edu](mailto:datascience@vanderbilt.edu)
68
+ - Principal Investigators (PIs)
69
+ - Jesse Spencer-Smith, Ph.D., Chief Data Scientist, Data Science Institute, Vanderbilt University
70
+ - Jesse Blocher, Ph.D., Director of Graduate Studies, Data Science Institute, Vanderbilt University
71
+ - Yaa Kumah-Crystal, Ph.D., Pediatric Endocrinologist and Professor, Vanderbilt University Medical Center
72
+ - Charreau Bell, Ph.D., Senior Data Scientist, Data Science Institute, Vanderbilt University
73
+ - Staff Lead: [charreau.s.bell@vanderbilt.edu](mailto:charreau.s.bell@vanderbilt.edu)
74
+ - Code Developers:
75
+ - Katrina Rbeiz, Ph.D. Student, Psychology, Vanderbilt University
76
+ - Minwoo Sohn, Graduate Student, Data Science, Vanderbilt University
77
+ - Ricky Sun, Graduate Student, Data Science, Vanderbilt University
78
+ - Eleanor Beers, Graduate Student, Data Science, Vanderbilt University
79
+ - Kevin Chen, Undergraduate, Computer Science, Vanderbilt University
80
+ - Adam Levav, Undergraduate, University of Maryland
81
+ - Varun Koduvayur, Undergraduate
UI_design_oral_exam_baseline_functionality.ipynb ADDED
@@ -0,0 +1,1454 @@
1
+ {
2
+ "nbformat": 4,
3
+ "nbformat_minor": 0,
4
+ "metadata": {
5
+ "colab": {
6
+ "provenance": [],
7
+ "gpuType": "V100",
8
+ "include_colab_link": true
9
+ },
10
+ "kernelspec": {
11
+ "name": "python3",
12
+ "display_name": "Python 3"
13
+ },
14
+ "language_info": {
15
+ "name": "python"
16
+ }
17
+ },
18
+ "cells": [
19
+ {
20
+ "cell_type": "markdown",
21
+ "metadata": {
22
+ "id": "view-in-github",
23
+ "colab_type": "text"
24
+ },
25
+ "source": [
26
+ "<a href=\"https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/124-implement-baseline-functionality-for-oral-exam-module/UI_design_oral_exam_baseline_functionality.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
27
+ ]
28
+ },
29
+ {
30
+ "cell_type": "markdown",
31
+ "source": [
32
+ "# Project IO Achievement - UI Design (Oral Exam)"
33
+ ],
34
+ "metadata": {
35
+ "id": "PIbogPXyM0wr"
36
+ }
37
+ },
38
+ {
39
+ "cell_type": "markdown",
40
+ "source": [
41
+ "## Problem Definition\n",
42
+ "\n",
43
+ "The v1 functionality for the Oral Exam module requires the following:\n",
44
+ "\n",
45
+ "1. Upload or generation of questions: either the user should upload a set of questions or we should allow the model to generate the questions. The user should pick or it should be inherent if there is no upload of questions. Note that we must also allow for context to be uploaded (vector store, vector store link, specific documents)\n",
46
+ "2. The model should prompt the user with a question and pause.\n",
47
+ "The user should respond by audio.\n",
48
+ "3. This should continue on until some final point where the exam is over.\n",
49
+ "\n",
50
+ "Then:\n",
51
+ "\n",
52
+ "1. We should use Whisper to do the transcription, and\n",
53
+ "2. Send the transcription, questions, and context for GPT4 for evaluation\n",
54
+ "Return the evaluation.\n",
55
+ "3. This will primarily be work on a user interface."
56
+ ],
57
+ "metadata": {
58
+ "id": "x_Vp8SiKM4p1"
59
+ }
60
+ },
61
+ {
62
+ "cell_type": "markdown",
63
+ "source": [
64
+ "## Libraries\n",
65
+ "\n",
66
+ "This section will install and import some important libraries such as Langchain, openai, Gradio, and so on"
67
+ ],
68
+ "metadata": {
69
+ "id": "o_60X8H3NEne"
70
+ }
71
+ },
72
+ {
73
+ "cell_type": "code",
74
+ "source": [
75
+ "# install libraries here\n",
76
+ "# -q flag for \"quiet\" install\n",
77
+ "%%capture\n",
78
+ "!pip install -q langchain\n",
79
+ "!pip install -q openai\n",
80
+ "!pip install -q gradio\n",
81
+ "!pip install -q transformers\n",
82
+ "!pip install -q datasets\n",
83
+ "!pip install -q huggingsound\n",
84
+ "!pip install -q torchaudio\n",
85
+ "!pip install -q git+https://github.com/openai/whisper.git\n",
86
+ "!pip install -q PyPDF2\n",
87
+ "!pip install -q python-docx"
89
+ ],
90
+ "metadata": {
91
+ "id": "pxcqXgg2aAN7"
92
+ },
93
+ "execution_count": 1,
94
+ "outputs": []
95
+ },
96
+ {
97
+ "cell_type": "code",
98
+ "execution_count": 2,
99
+ "metadata": {
100
+ "id": "pEjM1tLsMZBq"
101
+ },
102
+ "outputs": [],
103
+ "source": [
104
+ "# import libraries here\n",
105
+ "from langchain.llms import OpenAI\n",
106
+ "from langchain.prompts import PromptTemplate\n",
107
+ "from langchain.document_loaders import TextLoader\n",
108
+ "from langchain.indexes import VectorstoreIndexCreator\n",
109
+ "from langchain import ConversationChain, LLMChain, PromptTemplate\n",
110
+ "from langchain.chat_models import ChatOpenAI\n",
111
+ "from langchain.memory import ConversationBufferWindowMemory\n",
112
+ "from langchain.prompts import ChatPromptTemplate\n",
113
+ "from langchain.text_splitter import CharacterTextSplitter\n",
114
+ "from langchain.embeddings import OpenAIEmbeddings\n",
115
+ "from langchain.schema import SystemMessage, HumanMessage, AIMessage\n",
116
+ "import openai\n",
117
+ "import os\n",
118
+ "from getpass import getpass\n",
119
+ "from IPython.display import display, Javascript, HTML\n",
120
+ "from google.colab.output import eval_js\n",
121
+ "from base64 import b64decode\n",
122
+ "import ipywidgets as widgets\n",
123
+ "from IPython.display import clear_output\n",
124
+ "import time\n",
125
+ "import requests\n",
126
+ "from transformers import WhisperProcessor, WhisperForConditionalGeneration\n",
127
+ "from datasets import load_dataset\n",
128
+ "# from torchaudio.transforms import Resample\n",
129
+ "import whisper\n",
130
+ "from huggingsound import SpeechRecognitionModel\n",
131
+ "import numpy as np\n",
132
+ "import torch\n",
133
+ "import librosa\n",
134
+ "from datasets import load_dataset\n",
135
+ "from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor\n",
136
+ "from jiwer import wer\n",
137
+ "import pandas as pd\n",
138
+ "from IPython.display import display, HTML\n",
139
+ "import gradio as gr\n",
140
+ "from transformers import pipeline\n",
141
+ "from docx import Document\n",
142
+ "import PyPDF2\n",
143
+ "from pydub import AudioSegment\n",
144
+ "import tempfile\n",
145
+ "import os\n"
146
+ ]
147
+ },
148
+ {
149
+ "cell_type": "markdown",
150
+ "source": [
151
+ "## API Keys\n",
152
+ "\n",
153
+ "Use these cells to load the API keys required for this notebook. The below code cell uses the `getpass` library."
154
+ ],
155
+ "metadata": {
156
+ "id": "03KLZGI_a5W5"
157
+ }
158
+ },
159
+ {
160
+ "cell_type": "code",
161
+ "source": [
162
+ "openai_api_key = getpass()\n",
163
+ "os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n",
164
+ "openai.api_key = openai_api_key"
165
+ ],
166
+ "metadata": {
167
+ "id": "5smcWj4DbFgy",
168
+ "outputId": "9a73707b-1a6a-4253-b7d8-181a82b1040f",
169
+ "colab": {
170
+ "base_uri": "https://localhost:8080/"
171
+ }
172
+ },
173
+ "execution_count": 3,
174
+ "outputs": [
175
+ {
176
+ "name": "stdout",
177
+ "output_type": "stream",
178
+ "text": [
179
+ "··········\n"
180
+ ]
181
+ }
182
+ ]
183
+ },
184
+ {
185
+ "cell_type": "markdown",
186
+ "source": [
187
+ "## Prompt Design\n",
188
+ "\n",
189
+ "To be added"
190
+ ],
191
+ "metadata": {
192
+ "id": "pMo9x8u4AEV1"
193
+ }
194
+ },
195
+ {
196
+ "cell_type": "code",
197
+ "source": [
198
+ "chat = ChatOpenAI(temperature=0.0, model_name='gpt-3.5-turbo-16k')\n",
199
+ "chat"
200
+ ],
201
+ "metadata": {
202
+ "colab": {
203
+ "base_uri": "https://localhost:8080/"
204
+ },
205
+ "id": "UgnCZRMhADvo",
206
+ "outputId": "1bd6b84d-3ea8-49ba-8156-701f4155d69c"
207
+ },
208
+ "execution_count": 4,
209
+ "outputs": [
210
+ {
211
+ "output_type": "execute_result",
212
+ "data": {
213
+ "text/plain": [
214
+ "ChatOpenAI(cache=None, verbose=False, callbacks=None, callback_manager=None, tags=None, metadata=None, client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, model_name='gpt-3.5-turbo-16k', temperature=0.0, model_kwargs={}, openai_api_key='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', openai_api_base='', openai_organization='', openai_proxy='', request_timeout=None, max_retries=6, streaming=False, n=1, max_tokens=None, tiktoken_model_name=None)"
215
+ ]
216
+ },
217
+ "metadata": {},
218
+ "execution_count": 4
219
+ }
220
+ ]
221
+ },
222
+ {
223
+ "cell_type": "code",
224
+ "source": [
225
+ "# This is what I used to test the function 'generate_questions'\n",
226
+ "template_string2 = \"\"\"\n",
227
+ "You are teacher, and you will be given a {context} that is related to the presentation topic.\n",
228
+ "\n",
229
+ "Please generate a questions based on the context above and transcript that student created . \\\n",
230
+ "\n",
231
+ "The audio file generated by student is shown below: {transcribed_text}. \\\n",
232
+ "\"\"\""
233
+ ],
234
+ "metadata": {
235
+ "id": "WmysQZAhKBli"
236
+ },
237
+ "execution_count": null,
238
+ "outputs": []
239
+ },
240
+ {
241
+ "cell_type": "code",
242
+ "source": [
243
+ "prompt_template1 = ChatPromptTemplate.from_template(template_string2)"
244
+ ],
245
+ "metadata": {
246
+ "id": "oij6W5rwAaGb"
247
+ },
248
+ "execution_count": null,
249
+ "outputs": []
250
+ },
251
+ {
252
+ "cell_type": "code",
253
+ "source": [
254
+ "# prompt_template.messages[0].prompt\n",
255
+ "prompt_template1.messages[0].prompt.input_variables"
256
+ ],
257
+ "metadata": {
258
+ "colab": {
259
+ "base_uri": "https://localhost:8080/"
260
+ },
261
+ "id": "C1YRmL46AaJA",
262
+ "outputId": "e850524a-3831-4113-c796-5a0ec8584569"
263
+ },
264
+ "execution_count": null,
265
+ "outputs": [
266
+ {
267
+ "output_type": "execute_result",
268
+ "data": {
269
+ "text/plain": [
270
+ "['context', 'transcribed_text']"
271
+ ]
272
+ },
273
+ "metadata": {},
274
+ "execution_count": 25
275
+ }
276
+ ]
277
+ },
278
+ {
279
+ "cell_type": "code",
280
+ "source": [
281
+ "# This template is used for testing the function 'ai_evaluate'\n",
282
+ "# Detailed evaluation metrics are to be added\n",
283
+ "template_string3 = \"\"\"\n",
284
+ "You are teacher, and you will be given a context that is related to the presentation topic. \\\n",
285
+ "Now, given {context}, evaluate the answer based on the accuracy\n",
286
+ "\n",
287
+ "The main answer generated by student is shown below: {transcribed_text}. \\\n",
288
+ "The questions are shown below: {questions}. \\\n",
289
+ "The questions answered by student is shown below: {transcribed_qa}. \\\n",
290
+ "\"\"\"\n",
291
+ "prompt_template2 = ChatPromptTemplate.from_template(template_string3)\n",
292
+ "prompt_template2.messages[0].prompt.input_variables\n"
293
+ ],
294
+ "metadata": {
295
+ "id": "141Cxa2MT-l7",
296
+ "colab": {
297
+ "base_uri": "https://localhost:8080/"
298
+ },
299
+ "outputId": "0374d45d-a7f6-41e7-aed0-44c61681de21"
300
+ },
301
+ "execution_count": null,
302
+ "outputs": [
303
+ {
304
+ "output_type": "execute_result",
305
+ "data": {
306
+ "text/plain": [
307
+ "['context', 'questions', 'transcribed_qa', 'transcribed_text']"
308
+ ]
309
+ },
310
+ "metadata": {},
311
+ "execution_count": 7
312
+ }
313
+ ]
314
+ },
315
+ {
316
+ "cell_type": "markdown",
317
+ "source": [
318
+ "## Integrate Prompts from LO project"
319
+ ],
320
+ "metadata": {
321
+ "id": "MJCHl1T2TPWC"
322
+ }
323
+ },
324
+ {
325
+ "cell_type": "markdown",
326
+ "source": [
327
+ "### Creating a Chain for Short Answer Generation"
328
+ ],
329
+ "metadata": {
330
+ "id": "IPTyUOl-WdiL"
331
+ }
332
+ },
333
+ {
334
+ "cell_type": "markdown",
335
+ "source": [
336
+ "In this example, the context would include the poem \"The Road Not Taken\" by Robert Frost"
337
+ ],
338
+ "metadata": {
339
+ "id": "203qBjZvmFK1"
340
+ }
341
+ },
342
+ {
343
+ "cell_type": "code",
344
+ "source": [
345
+ "# This is what I used to test the function 'generate_questions_v2'\n",
346
+ "template_string = \"\"\"\n",
347
+ "You are a world-class tutor helping students to perform better on oral and written exams though interactive experiences.\"\n",
348
+ "\n",
349
+ "The following text should be used as the basis for the instructions which follow: {context} \\\n",
350
+ "\n",
351
+ "The following is the guideline for generating the questiion: {pre_prompt}\n",
352
+ "\"\"\""
353
+ ],
354
+ "metadata": {
355
+ "id": "w1AjHwIoVnvw"
356
+ },
357
+ "execution_count": 5,
358
+ "outputs": []
359
+ },
360
+ {
361
+ "cell_type": "code",
362
+ "source": [
363
+ "prompt_template = ChatPromptTemplate.from_template(template_string)\n",
364
+ "prompt_template.messages[0].prompt.input_variables"
365
+ ],
366
+ "metadata": {
367
+ "colab": {
368
+ "base_uri": "https://localhost:8080/"
369
+ },
370
+ "id": "39-lm5I-Wlep",
371
+ "outputId": "44b39930-5258-484b-8c7d-c36ff4b5dc1a"
372
+ },
373
+ "execution_count": 6,
374
+ "outputs": [
375
+ {
376
+ "output_type": "execute_result",
377
+ "data": {
378
+ "text/plain": [
379
+ "['context', 'pre_prompt']"
380
+ ]
381
+ },
382
+ "metadata": {},
383
+ "execution_count": 6
384
+ }
385
+ ]
386
+ },
387
+ {
388
+ "cell_type": "markdown",
389
+ "source": [
390
+ "### Creating a Chain for AI Evaluation"
391
+ ],
392
+ "metadata": {
393
+ "id": "-Pfxkcdxh9nZ"
394
+ }
395
+ },
396
+ {
397
+ "cell_type": "code",
398
+ "source": [
399
+ "template_evaluation = \"\"\"\n",
400
+ "Given the follwing {context} and the {transcript}, evaluate whether or not the student answered correctly on the {question}.\n",
401
+ "\"\"\""
402
+ ],
403
+ "metadata": {
404
+ "id": "u6pH1x-gWnFF"
405
+ },
406
+ "execution_count": 7,
407
+ "outputs": []
408
+ },
409
+ {
410
+ "cell_type": "code",
411
+ "source": [
412
+ "# @title\n",
413
+ "prompt_template2 = ChatPromptTemplate.from_template(template_evaluation)\n",
414
+ "prompt_template2.messages[0].prompt.input_variables"
415
+ ],
416
+ "metadata": {
417
+ "colab": {
418
+ "base_uri": "https://localhost:8080/"
419
+ },
420
+ "id": "YPO_IE5ThC6W",
421
+ "outputId": "5361929c-cf8c-483d-901a-ed14a0db89fa"
422
+ },
423
+ "execution_count": 8,
424
+ "outputs": [
425
+ {
426
+ "output_type": "execute_result",
427
+ "data": {
428
+ "text/plain": [
429
+ "['context', 'question', 'transcript']"
430
+ ]
431
+ },
432
+ "metadata": {},
433
+ "execution_count": 8
434
+ }
435
+ ]
436
+ },
437
+ {
438
+ "cell_type": "markdown",
439
+ "source": [
440
+ "## UI Design\n",
441
+ "\n",
442
+ "https://colab.research.google.com/github/petewarden/openai-whisper-webapp/blob/main/OpenAI_Whisper_ASR_Demo.ipynb"
443
+ ],
444
+ "metadata": {
445
+ "id": "M6IzVTjz5cex"
446
+ }
447
+ },
448
+ {
449
+ "cell_type": "markdown",
450
+ "source": [
451
+ "### Functions"
452
+ ],
453
+ "metadata": {
454
+ "id": "l4o8R5eUE1n8"
455
+ }
456
+ },
457
+ {
458
+ "cell_type": "code",
459
+ "source": [
460
+ "def embed_key(openai_api_key):\n",
461
+ " os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n",
462
+ "\n",
463
+ "def transcribe(audio_file_path):\n",
464
+ " with open(audio_file_path, \"rb\") as audio_file:\n",
465
+ " # Call OpenAI's Whisper model for transcription\n",
466
+ " transcript = openai.Audio.transcribe(\"whisper-1\", audio_file)\n",
467
+ " transcribed_text = transcript[\"text\"]\n",
468
+ " return transcribed_text\n",
469
+ "\n",
470
+ "def translate(text):\n",
471
+ " # Create a prompt template (This will be changed later to fit the actual task)\n",
472
+ " # Here translation will be a filler task of GPT\n",
473
+ " test_input1 = prompt_template.format_messages(\n",
474
+ " expertise='Language Translation',\n",
475
+ " language='Japanese',\n",
476
+ " style='romantic',\n",
477
+ " transcribed_text=text)\n",
478
+ "\n",
479
+ " response = chat.predict_messages(test_input1)\n",
480
+ " return response.content\n",
481
+ "\n",
482
+ "def process_file(files):\n",
483
+ " for file in files:\n",
484
+ " try:\n",
485
+ " extension = file.name.split('.')[-1].lower()\n",
486
+ " if extension == 'docx':\n",
487
+ " doc = Document(file.name)\n",
488
+ " full_text = []\n",
489
+ " for paragraph in doc.paragraphs:\n",
490
+ " full_text.append(paragraph.text)\n",
491
+ " return '\\n'.join(full_text)\n",
492
+ "\n",
493
+ " elif extension == 'pdf':\n",
494
+ " pdf_file = open(file.name, 'rb')\n",
495
+ " reader = PyPDF2.PdfReader(pdf_file)\n",
496
+ " num_pages = len(reader.pages)\n",
497
+ " full_text = []\n",
498
+ " for page in range(num_pages):\n",
499
+ " page_obj = reader.pages[page]\n",
500
+ " full_text.append(page_obj.extract_text())\n",
501
+ " pdf_file.close()\n",
502
+ " return '\\n'.join(full_text)\n",
503
+ "\n",
504
+ " elif extension == 'txt':\n",
505
+ " with open(file.name, 'r') as txt_file:\n",
506
+ " full_text = txt_file.read()\n",
507
+ " return full_text\n",
508
+ "\n",
509
+ " else:\n",
510
+ " return \"Unsupported file type\"\n",
511
+ " except FileNotFoundError:\n",
512
+ " return \"File not found\"\n",
513
+ " except PermissionError:\n",
514
+ " return \"Permission denied\"\n",
515
+ "\n",
516
+ "def generate_questions(context, transcript):\n",
517
+ " text = process_file(context)\n",
518
+ " test_input1 = prompt_template1.format_messages(\n",
519
+ " context = text,\n",
520
+ " transcribed_text = transcript)\n",
521
+ "\n",
522
+ " response = chat(test_input1)\n",
523
+ " return response.content\n",
524
+ "\n",
525
+ "def generate_questions_v2(text, prompt):\n",
526
+ " #text = process_file(file)\n",
527
+ " test_input1 = prompt_template.format_messages(\n",
528
+ " context = text,\n",
529
+ " pre_prompt = prompt)\n",
530
+ "\n",
531
+ " response = chat(test_input1)\n",
532
+ " return response\n",
533
+ "\n",
534
+ "# def ai_evaluate(context, audio_main, audio_qa, questions):\n",
535
+ "# test_input1 = prompt_template2.format_messages(\n",
536
+ "# context = context,\n",
537
+ "# transcribed_text = audio_main,\n",
538
+ "# transcribed_qa = audio_qa,\n",
539
+ "# questions = questions)\n",
540
+ "\n",
541
+ "# response = chat(test_input1)\n",
542
+ "# return response.content\n",
543
+ "\n",
544
+ "def ai_evaluate_v2(text, audio_main, questions):\n",
545
+ " #audio = transcribe(audio_main)\n",
546
+ " test_input1 = prompt_template2.format_messages(\n",
547
+ " context = text,\n",
548
+ " transcript = audio_main,\n",
549
+ " question = questions\n",
550
+ " )\n",
551
+ "\n",
552
+ " response = chat(test_input1)\n",
553
+ " return response.content\n",
554
+ "\n",
555
+ "def upload_file(files):\n",
556
+ " file_paths = [file.name for file in files]\n",
557
+ " return file_paths"
558
+ ],
559
+ "metadata": {
560
+ "id": "ABN0X9xQHeii"
561
+ },
562
+ "execution_count": 12,
563
+ "outputs": []
564
+ },
565
+ {
566
+ "cell_type": "markdown",
567
+ "source": [
568
+ "### Test process_file"
569
+ ],
570
+ "metadata": {
571
+ "id": "a3WUL_hFyMkr"
572
+ }
573
+ },
574
+ {
575
+ "cell_type": "code",
576
+ "source": [
577
+ "from google.colab import files\n",
578
+ "def upload_syllabi():\n",
579
+ " uploaded = files.upload()\n",
580
+ " for name, data in uploaded.items():\n",
581
+ " with open(name, 'wb') as f:\n",
582
+ " f.write(data)\n",
583
+ " print('saved file', name)\n",
584
+ "upload_syllabi()"
585
+ ],
586
+ "metadata": {
587
+ "colab": {
588
+ "base_uri": "https://localhost:8080/",
589
+ "height": 90
590
+ },
591
+ "id": "nih4FXX0Pl9U",
592
+ "outputId": "ce48c70a-d52c-4267-f3fc-8b22404448d7"
593
+ },
594
+ "execution_count": 13,
595
+ "outputs": [
596
+ {
597
+ "output_type": "display_data",
598
+ "data": {
599
+ "text/plain": [
600
+ "<IPython.core.display.HTML object>"
601
+ ],
602
+ "text/html": [
603
+ "\n",
604
+ " <input type=\"file\" id=\"files-c72e8e8a-ac6f-48ab-9fb4-a84b1483268a\" name=\"files[]\" multiple disabled\n",
605
+ " style=\"border:none\" />\n",
606
+ " <output id=\"result-c72e8e8a-ac6f-48ab-9fb4-a84b1483268a\">\n",
607
+ " Upload widget is only available when the cell has been executed in the\n",
608
+ " current browser session. Please rerun this cell to enable.\n",
609
+ " </output>\n",
610
+ " <script>// Copyright 2017 Google LLC\n",
611
+ "//\n",
612
+ "// Licensed under the Apache License, Version 2.0 (the \"License\");\n",
613
+ "// you may not use this file except in compliance with the License.\n",
614
+ "// You may obtain a copy of the License at\n",
615
+ "//\n",
616
+ "// http://www.apache.org/licenses/LICENSE-2.0\n",
617
+ "//\n",
618
+ "// Unless required by applicable law or agreed to in writing, software\n",
619
+ "// distributed under the License is distributed on an \"AS IS\" BASIS,\n",
620
+ "// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
621
+ "// See the License for the specific language governing permissions and\n",
622
+ "// limitations under the License.\n",
623
+ "\n",
624
+ "/**\n",
625
+ " * @fileoverview Helpers for google.colab Python module.\n",
626
+ " */\n",
627
+ "(function(scope) {\n",
628
+ "function span(text, styleAttributes = {}) {\n",
629
+ " const element = document.createElement('span');\n",
630
+ " element.textContent = text;\n",
631
+ " for (const key of Object.keys(styleAttributes)) {\n",
632
+ " element.style[key] = styleAttributes[key];\n",
633
+ " }\n",
634
+ " return element;\n",
635
+ "}\n",
636
+ "\n",
637
+ "// Max number of bytes which will be uploaded at a time.\n",
638
+ "const MAX_PAYLOAD_SIZE = 100 * 1024;\n",
639
+ "\n",
640
+ "function _uploadFiles(inputId, outputId) {\n",
641
+ " const steps = uploadFilesStep(inputId, outputId);\n",
642
+ " const outputElement = document.getElementById(outputId);\n",
643
+ " // Cache steps on the outputElement to make it available for the next call\n",
644
+ " // to uploadFilesContinue from Python.\n",
645
+ " outputElement.steps = steps;\n",
646
+ "\n",
647
+ " return _uploadFilesContinue(outputId);\n",
648
+ "}\n",
649
+ "\n",
650
+ "// This is roughly an async generator (not supported in the browser yet),\n",
651
+ "// where there are multiple asynchronous steps and the Python side is going\n",
652
+ "// to poll for completion of each step.\n",
653
+ "// This uses a Promise to block the python side on completion of each step,\n",
654
+ "// then passes the result of the previous step as the input to the next step.\n",
655
+ "function _uploadFilesContinue(outputId) {\n",
656
+ " const outputElement = document.getElementById(outputId);\n",
657
+ " const steps = outputElement.steps;\n",
658
+ "\n",
659
+ " const next = steps.next(outputElement.lastPromiseValue);\n",
660
+ " return Promise.resolve(next.value.promise).then((value) => {\n",
661
+ " // Cache the last promise value to make it available to the next\n",
662
+ " // step of the generator.\n",
663
+ " outputElement.lastPromiseValue = value;\n",
664
+ " return next.value.response;\n",
665
+ " });\n",
666
+ "}\n",
667
+ "\n",
668
+ "/**\n",
669
+ " * Generator function which is called between each async step of the upload\n",
670
+ " * process.\n",
671
+ " * @param {string} inputId Element ID of the input file picker element.\n",
672
+ " * @param {string} outputId Element ID of the output display.\n",
673
+ " * @return {!Iterable<!Object>} Iterable of next steps.\n",
674
+ " */\n",
675
+ "function* uploadFilesStep(inputId, outputId) {\n",
676
+ " const inputElement = document.getElementById(inputId);\n",
677
+ " inputElement.disabled = false;\n",
678
+ "\n",
679
+ " const outputElement = document.getElementById(outputId);\n",
680
+ " outputElement.innerHTML = '';\n",
681
+ "\n",
682
+ " const pickedPromise = new Promise((resolve) => {\n",
683
+ " inputElement.addEventListener('change', (e) => {\n",
684
+ " resolve(e.target.files);\n",
685
+ " });\n",
686
+ " });\n",
687
+ "\n",
688
+ " const cancel = document.createElement('button');\n",
689
+ " inputElement.parentElement.appendChild(cancel);\n",
690
+ " cancel.textContent = 'Cancel upload';\n",
691
+ " const cancelPromise = new Promise((resolve) => {\n",
692
+ " cancel.onclick = () => {\n",
693
+ " resolve(null);\n",
694
+ " };\n",
695
+ " });\n",
696
+ "\n",
697
+ " // Wait for the user to pick the files.\n",
698
+ " const files = yield {\n",
699
+ " promise: Promise.race([pickedPromise, cancelPromise]),\n",
700
+ " response: {\n",
701
+ " action: 'starting',\n",
702
+ " }\n",
703
+ " };\n",
704
+ "\n",
705
+ " cancel.remove();\n",
706
+ "\n",
707
+ " // Disable the input element since further picks are not allowed.\n",
708
+ " inputElement.disabled = true;\n",
709
+ "\n",
710
+ " if (!files) {\n",
711
+ " return {\n",
712
+ " response: {\n",
713
+ " action: 'complete',\n",
714
+ " }\n",
715
+ " };\n",
716
+ " }\n",
717
+ "\n",
718
+ " for (const file of files) {\n",
719
+ " const li = document.createElement('li');\n",
720
+ " li.append(span(file.name, {fontWeight: 'bold'}));\n",
721
+ " li.append(span(\n",
722
+ " `(${file.type || 'n/a'}) - ${file.size} bytes, ` +\n",
723
+ " `last modified: ${\n",
724
+ " file.lastModifiedDate ? file.lastModifiedDate.toLocaleDateString() :\n",
725
+ " 'n/a'} - `));\n",
726
+ " const percent = span('0% done');\n",
727
+ " li.appendChild(percent);\n",
728
+ "\n",
729
+ " outputElement.appendChild(li);\n",
730
+ "\n",
731
+ " const fileDataPromise = new Promise((resolve) => {\n",
732
+ " const reader = new FileReader();\n",
733
+ " reader.onload = (e) => {\n",
734
+ " resolve(e.target.result);\n",
735
+ " };\n",
736
+ " reader.readAsArrayBuffer(file);\n",
737
+ " });\n",
738
+ " // Wait for the data to be ready.\n",
739
+ " let fileData = yield {\n",
740
+ " promise: fileDataPromise,\n",
741
+ " response: {\n",
742
+ " action: 'continue',\n",
743
+ " }\n",
744
+ " };\n",
745
+ "\n",
746
+ " // Use a chunked sending to avoid message size limits. See b/62115660.\n",
747
+ " let position = 0;\n",
748
+ " do {\n",
749
+ " const length = Math.min(fileData.byteLength - position, MAX_PAYLOAD_SIZE);\n",
750
+ " const chunk = new Uint8Array(fileData, position, length);\n",
751
+ " position += length;\n",
752
+ "\n",
753
+ " const base64 = btoa(String.fromCharCode.apply(null, chunk));\n",
754
+ " yield {\n",
755
+ " response: {\n",
756
+ " action: 'append',\n",
757
+ " file: file.name,\n",
758
+ " data: base64,\n",
759
+ " },\n",
760
+ " };\n",
761
+ "\n",
762
+ " let percentDone = fileData.byteLength === 0 ?\n",
763
+ " 100 :\n",
764
+ " Math.round((position / fileData.byteLength) * 100);\n",
765
+ " percent.textContent = `${percentDone}% done`;\n",
766
+ "\n",
767
+ " } while (position < fileData.byteLength);\n",
768
+ " }\n",
769
+ "\n",
770
+ " // All done.\n",
771
+ " yield {\n",
772
+ " response: {\n",
773
+ " action: 'complete',\n",
774
+ " }\n",
775
+ " };\n",
776
+ "}\n",
777
+ "\n",
778
+ "scope.google = scope.google || {};\n",
779
+ "scope.google.colab = scope.google.colab || {};\n",
780
+ "scope.google.colab._files = {\n",
781
+ " _uploadFiles,\n",
782
+ " _uploadFilesContinue,\n",
783
+ "};\n",
784
+ "})(self);\n",
785
+ "</script> "
786
+ ]
787
+ },
788
+ "metadata": {}
789
+ },
790
+ {
791
+ "output_type": "stream",
792
+ "name": "stdout",
793
+ "text": [
794
+ "Saving instructor_note_2.docx to instructor_note_2.docx\n",
795
+ "saved file instructor_note_2.docx\n"
796
+ ]
797
+ }
798
+ ]
799
+ },
800
+ {
801
+ "cell_type": "code",
802
+ "source": [
803
+ "# Might need some way to make pdf file to load more readable\n",
804
+ "# process_file('/content/instrutor_note.docx')\n",
805
+ "# process_file('/content/Big Data & Economics.pdf')\n",
806
+ "# process_file('/content/Big Data & Economics.pdf')"
807
+ ],
808
+ "metadata": {
809
+ "id": "LJX1AKTMyVm8"
810
+ },
811
+ "execution_count": null,
812
+ "outputs": []
813
+ },
814
+ {
815
+ "cell_type": "markdown",
816
+ "source": [
817
+ "### Gradio Interface V1"
818
+ ],
819
+ "metadata": {
820
+ "id": "c4s5o8baE6wN"
821
+ }
822
+ },
823
+ {
824
+ "cell_type": "code",
825
+ "source": [
826
+ "# @title\n",
827
+ "# with gr.Blocks() as demo:\n",
828
+ "# gr.Markdown(\"# Oral Exam App\")\n",
829
+ "# with gr.Box():\n",
830
+ "# gr.HTML(\"\"\"Embed your OpenAI API key below; if you haven't created one already, visit\n",
831
+ "# platform.openai.com/account/api-keys\n",
832
+ "# to sign up for an account and get your personal API key\"\"\",\n",
833
+ "# elem_classes=\"textbox_label\")\n",
834
+ "# input = gr.Textbox(show_label=False, type=\"password\", container=False,\n",
835
+ "# placeholder=\"●●●●●●●●●●●●●●●●●\")\n",
836
+ "# input.change(fn=embed_key, inputs=input, outputs=None)\n",
837
+ "\n",
838
+ "# with gr.Blocks():\n",
839
+ "# gr.Markdown(\"## Upload your audio file or start recording\")\n",
840
+ "\n",
841
+ "# with gr.Row():\n",
842
+ "\n",
843
+ "\n",
844
+ "# with gr.Column():\n",
845
+ "# file_input = gr.Files(label=\"Load a mp3 file\",\n",
846
+ "# file_types=['.mp3'], type=\"file\",\n",
847
+ "# elem_classes=\"short-height\")\n",
848
+ "# record_inputs = gr.Audio(source=\"microphone\", type=\"filepath\")\n",
849
+ "\n",
850
+ "# with gr.Column():\n",
851
+ "# outputs_transcribe=gr.Textbox(label=\"Transcription\")\n",
852
+ "\n",
853
+ "# with gr.Row():\n",
854
+ "# btn1 = gr.Button(value=\"Transcribe recorded audio\")\n",
855
+ "# btn1.click(transcribe, inputs=record_inputs, outputs=outputs_transcribe)\n",
856
+ "# btn2 = gr.Button(value=\"Transcribe uploaded audio\")\n",
857
+ "# btn2.click(transcribe, inputs=file_input, outputs=outputs_transcribe)\n",
858
+ "\n",
859
+ "# outputs_translate=gr.Textbox(label=\"Translation\")\n",
860
+ "# btn3 = gr.Button(value=\"Translate\")\n",
861
+ "# btn3.click(translate, inputs=outputs_transcribe, outputs=outputs_translate)\n",
862
+ "\n",
863
+ "# demo.launch()\n"
864
+ ],
865
+ "metadata": {
866
+ "id": "ZkfJXCGDFhdw",
867
+ "cellView": "form"
868
+ },
869
+ "execution_count": null,
870
+ "outputs": []
871
+ },
872
+ {
873
+ "cell_type": "markdown",
874
+ "source": [
875
+ "### baseline functionality V1"
876
+ ],
877
+ "metadata": {
878
+ "id": "AnkuosJ7Vw4z"
879
+ }
880
+ },
881
+ {
882
+ "cell_type": "code",
883
+ "source": [
884
+ "# @title\n",
885
+ "with gr.Blocks() as demo:\n",
886
+ " gr.Markdown(\"# Oral Exam App\")\n",
887
+ " gr.Markdown(\"## OpenAI API key\")\n",
888
+ " with gr.Box():\n",
889
+ " gr.HTML(\"\"\"Embed your OpenAI API key below; if you haven't created one already, visit\n",
890
+ " platform.openai.com/account/api-keys\n",
891
+ " to sign up for an account and get your personal API key\"\"\",\n",
892
+ " elem_classes=\"textbox_label\")\n",
893
+ " input = gr.Textbox(show_label=False, type=\"password\", container=False,\n",
894
+ " placeholder=\"●●●●●●●●●●●●●●●●●\")\n",
895
+ " input.change(fn=embed_key, inputs=input, outputs=None)\n",
896
+ "\n",
897
+ " with gr.Blocks():\n",
898
+ " #########################\n",
899
+ " #########Context#########\n",
900
+ " #########################\n",
901
+ " with gr.Accordion(\"Context section\"):\n",
902
+ " ### Should also allow vector stores\n",
903
+ " gr.Markdown(\"## Please upload the context document(s) for Oral exam\")\n",
904
+ " context_input = gr.File(label=\"Click to upload context file\",\n",
905
+ " file_count=\"multiple\",\n",
906
+ " file_types=[\".txt\", \".docx\", \".pdf\"])\n",
907
+ " outputs_context=gr.Textbox(label=\"Context\")\n",
908
+ " context_input.change(fn=process_file, inputs=context_input, outputs=outputs_context)\n",
909
+ " # upload_button = gr.Button(value=\"Show context\")\n",
910
+ " # upload_button.click(process_file, context_input, outputs_context)\n",
911
+ "\n",
912
+ " #########################\n",
913
+ " #######Main Audio########\n",
914
+ " #########################\n",
915
+ " with gr.Accordion(\"Main audio section\"):\n",
916
+ " gr.Markdown(\"## Upload your audio file or start recording\")\n",
917
+ " with gr.Column():\n",
918
+ " ## uploading files seem not working (don't know why)\n",
919
+ " with gr.Row():\n",
920
+ " file_input = gr.Audio(label=\"Upload Audio\", source=\"upload\", type=\"filepath\")\n",
921
+ " record_inputs = gr.Audio(label=\"Record Audio\", source=\"microphone\", type=\"filepath\")\n",
922
+ "\n",
923
+ " gr.Markdown(\"## Transcribe the audio uploaded or recorded\")\n",
924
+ " outputs_transcribe=gr.Textbox(label=\"Transcription\")\n",
925
+ "\n",
926
+ " file_input.change(fn=transcribe, inputs=file_input, outputs=outputs_transcribe)\n",
927
+ " record_inputs.change(fn=transcribe, inputs=record_inputs, outputs=outputs_transcribe)\n",
928
+ "\n",
929
+ " #########################\n",
930
+ " ###Question Generation###\n",
931
+ " #########################\n",
932
+ " with gr.Accordion(\"Question section\"):\n",
933
+ " gr.Markdown(\"## Questions\")\n",
934
+ " with gr.Row():\n",
935
+ " with gr.Column():\n",
936
+ " outputs_qa=gr.Textbox(label=\"Generate questions\")\n",
937
+ " btn3 = gr.Button(value=\"Generate questions\")\n",
938
+ " btn3.click(generate_questions, inputs=[context_input, outputs_transcribe], outputs=outputs_qa)\n",
939
+ "\n",
940
+ " ######################### Need additional work to include these questions when click button #########################\n",
941
+ " with gr.Column():\n",
942
+ " submit_question=gr.Textbox(label=\"Use existing questions\")\n",
943
+ " btn4 = gr.Button(value=\"Use these questions\")\n",
944
+ " # btn4.click(use_this_question, inputs=outputs_transcribe, outputs=None)\n",
945
+ "\n",
946
+ " #########################\n",
947
+ " #########Audio QA########\n",
948
+ " #########################\n",
949
+ " with gr.Accordion(\"Audio QA section\"):\n",
950
+ " gr.Markdown(\"## Question answering\")\n",
951
+ " ##### This may be iterative\n",
952
+ " with gr.Row():\n",
953
+ " file_input2 = gr.Audio(label=\"Upload Audio\", source=\"upload\", type=\"filepath\")\n",
954
+ " record_inputs2 = gr.Audio(label=\"Record Audio\", source=\"microphone\", type=\"filepath\")\n",
955
+ "\n",
956
+ " gr.Markdown(\"## Transcribe the audio uploaded or recorded\")\n",
957
+ " outputs_transcribe2=gr.Textbox(label=\"Transcription\")\n",
958
+ " file_input2.change(fn=transcribe, inputs=file_input2, outputs=outputs_transcribe2)\n",
959
+ " record_inputs2.change(fn=transcribe, inputs=record_inputs2, outputs=outputs_transcribe2)\n",
960
+ "\n",
961
+ " #########################\n",
962
+ " #######Evaluation########\n",
963
+ " #########################\n",
964
+ " with gr.Accordion(\"Evaluation section\"):\n",
965
+ " gr.Markdown(\"## Evaluation\")\n",
966
+ " with gr.Tab(\"General evalution\"):\n",
967
+ " evalution=gr.Textbox(label=\"AI Evaluation\")\n",
968
+ " btn5 = gr.Button(value=\"Evaluate\")\n",
969
+ " btn5.click(ai_evaluate, inputs=[context_input, record_inputs,record_inputs2, outputs_qa], outputs=evalution)\n",
970
+ " with gr.Tab(\"Quantitative evalution\"):\n",
971
+ " table_output = gr.Dataframe(label = \"Some kind of evaluation metrics?\")\n",
972
+ " btn6 = gr.Button(value=\"Evaluate\")\n",
973
+ " btn6.click(ai_evaluate, inputs=[context_input, record_inputs,record_inputs2, outputs_qa], outputs=table_output)\n",
974
+ "\n",
975
+ " demo.launch()\n",
976
+ " # demo.launch(share=True)\n",
977
+ " # demo.launch(debug=True)"
978
+ ],
979
+ "metadata": {
980
+ "colab": {
981
+ "base_uri": "https://localhost:8080/",
982
+ "height": 616
983
+ },
984
+ "id": "EAPljDMYVy3u",
985
+ "outputId": "1f347376-14e8-48ea-e531-295a4fefd6cd",
986
+ "cellView": "form"
987
+ },
988
+ "execution_count": null,
989
+ "outputs": [
990
+ {
991
+ "output_type": "stream",
992
+ "name": "stdout",
993
+ "text": [
994
+ "Colab notebook detected. To show errors in colab notebook, set debug=True in launch()\n",
995
+ "Note: opening Chrome Inspector may crash demo inside Colab notebooks.\n",
996
+ "\n",
997
+ "To create a public link, set `share=True` in `launch()`.\n"
998
+ ]
999
+ },
1000
+ {
1001
+ "output_type": "display_data",
1002
+ "data": {
1003
+ "text/plain": [
1004
+ "<IPython.core.display.Javascript object>"
1005
+ ],
1006
+ "application/javascript": [
1007
+ "(async (port, path, width, height, cache, element) => {\n",
1008
+ " if (!google.colab.kernel.accessAllowed && !cache) {\n",
1009
+ " return;\n",
1010
+ " }\n",
1011
+ " element.appendChild(document.createTextNode(''));\n",
1012
+ " const url = await google.colab.kernel.proxyPort(port, {cache});\n",
1013
+ "\n",
1014
+ " const external_link = document.createElement('div');\n",
1015
+ " external_link.innerHTML = `\n",
1016
+ " <div style=\"font-family: monospace; margin-bottom: 0.5rem\">\n",
1017
+ " Running on <a href=${new URL(path, url).toString()} target=\"_blank\">\n",
1018
+ " https://localhost:${port}${path}\n",
1019
+ " </a>\n",
1020
+ " </div>\n",
1021
+ " `;\n",
1022
+ " element.appendChild(external_link);\n",
1023
+ "\n",
1024
+ " const iframe = document.createElement('iframe');\n",
1025
+ " iframe.src = new URL(path, url).toString();\n",
1026
+ " iframe.height = height;\n",
1027
+ " iframe.allow = \"autoplay; camera; microphone; clipboard-read; clipboard-write;\"\n",
1028
+ " iframe.width = width;\n",
1029
+ " iframe.style.border = 0;\n",
1030
+ " element.appendChild(iframe);\n",
1031
+ " })(7862, \"/\", \"100%\", 500, false, window.element)"
1032
+ ]
1033
+ },
1034
+ "metadata": {}
1035
+ }
1036
+ ]
1037
+ },
1038
+ {
1039
+ "cell_type": "markdown",
1040
+ "source": [
1041
+ "### Baseline Functionality V2"
1042
+ ],
1043
+ "metadata": {
1044
+ "id": "YAKl-5P4dHEF"
1045
+ }
1046
+ },
1047
+ {
1048
+ "cell_type": "code",
1049
+ "source": [
1050
+ "# @title\n",
1051
+ "with gr.Blocks() as demo:\n",
1052
+ " gr.Markdown(\"# Oral Exam App\")\n",
1053
+ " gr.Markdown(\"## OpenAI API key\")\n",
1054
+ " with gr.Box():\n",
1055
+ " gr.HTML(\"\"\"Embed your OpenAI API key below; if you haven't created one already, visit\n",
1056
+ " platform.openai.com/account/api-keys\n",
1057
+ " to sign up for an account and get your personal API key\"\"\",\n",
1058
+ " elem_classes=\"textbox_label\")\n",
1059
+ " input = gr.Textbox(show_label=False, type=\"password\", container=False,\n",
1060
+ " placeholder=\"●●●●●●●●●●●●●●●●●\")\n",
1061
+ " input.change(fn=embed_key, inputs=input, outputs=None)\n",
1062
+ "\n",
1063
+ " with gr.Blocks():\n",
1064
+ " #########################\n",
1065
+ " #########Context#########\n",
1066
+ " #########################\n",
1067
+ " with gr.Accordion(\"Context section\"):\n",
1068
+ " ### Should also allow vector stores\n",
1069
+ " gr.Markdown(\"## Please upload the context document(s) for Oral exam\")\n",
1070
+ " context_input = gr.File(label=\"Click to upload context file\",\n",
1071
+ " file_count=\"multiple\",\n",
1072
+ " file_types=[\".txt\", \".docx\", \".pdf\"])\n",
1073
+ " outputs_context=gr.Textbox(label=\"Context\")\n",
1074
+ " context_input.change(fn=process_file, inputs=context_input, outputs=outputs_context)\n",
1075
+ " # upload_button = gr.Button(value=\"Show context\")\n",
1076
+ " # upload_button.click(process_file, context_input, outputs_context)\n",
1077
+ "\n",
1078
+ " #########################\n",
1079
+ " ###Question Generation###\n",
1080
+ " #########################\n",
1081
+ " with gr.Accordion(\"Question section\"):\n",
1082
+ " gr.Markdown(\"## Questions\")\n",
1083
+ " with gr.Row():\n",
1084
+ " with gr.Column():\n",
1085
+ " outputs_qa=gr.Textbox(label=\"Generate questions\")\n",
1086
+ " btn1 = gr.Button(value=\"Generate questions\")\n",
1087
+ " btn1.click(generate_questions_v2, inputs=outputs_context, outputs=outputs_qa)\n",
1088
+ "\n",
1089
+ " ######################### Need additional work to include these questions when click button #########################\n",
1090
+ " with gr.Column():\n",
1091
+ " submit_question=gr.Textbox(label=\"Use existing questions\")\n",
1092
+ " btn4 = gr.Button(value=\"Use these questions\")\n",
1093
+ " # btn4.click(use_this_question, inputs=outputs_transcribe, outputs=None)\n",
1094
+ "\n",
1095
+ " #########################\n",
1096
+ " #######Main Audio########\n",
1097
+ " #########################\n",
1098
+ " with gr.Accordion(\"Main audio section\"):\n",
1099
+ " gr.Markdown(\"## Upload your audio file or start recording\")\n",
1100
+ " with gr.Column():\n",
1101
+ " ## uploading files seem not working (don't know why)\n",
1102
+ " with gr.Row():\n",
1103
+ " file_input = gr.Audio(label=\"Upload Audio\", source=\"upload\", type=\"filepath\")\n",
1104
+ " record_inputs = gr.Audio(label=\"Record Audio\", source=\"microphone\", type=\"filepath\")\n",
1105
+ "\n",
1106
+ " gr.Markdown(\"## Transcribe the audio uploaded or recorded\")\n",
1107
+ " outputs_transcribe=gr.Textbox(label=\"Transcription\")\n",
1108
+ "\n",
1109
+ " file_input.change(fn=transcribe, inputs=file_input, outputs=outputs_transcribe)\n",
1110
+ " record_inputs.change(fn=transcribe, inputs=record_inputs, outputs=outputs_transcribe)\n",
1111
+ "\n",
1112
+ " #########################\n",
1113
+ " #######Evaluation########\n",
1114
+ " #########################\n",
1115
+ " with gr.Accordion(\"Evaluation section\"):\n",
1116
+ " gr.Markdown(\"## Evaluation\")\n",
1117
+ " with gr.Tab(\"General evalution\"):\n",
1118
+ " evalution=gr.Textbox(label=\"AI Evaluation\")\n",
1119
+ " btn5 = gr.Button(value=\"Evaluate\")\n",
1120
+ " btn5.click(ai_evaluate_v2, inputs=[outputs_context, outputs_transcribe, outputs_qa], outputs=evalution)\n",
1121
+ " with gr.Tab(\"Quantitative evalution\"):\n",
1122
+ " table_output = gr.Dataframe(label = \"Some kind of evaluation metrics?\")\n",
1123
+ " btn6 = gr.Button(value=\"Evaluate\")\n",
1124
+ " btn6.click(ai_evaluate_v2, inputs=[outputs_context, outputs_transcribe, outputs_qa], outputs=table_output)\n",
1125
+ "\n",
1126
+ " demo.launch()"
1127
+ ],
1128
+ "metadata": {
1129
+ "colab": {
1130
+ "base_uri": "https://localhost:8080/",
1131
+ "height": 706
1132
+ },
1133
+ "id": "04KxUQgUcTrm",
1134
+ "outputId": "66a9f8c8-36fe-4792-b7d6-3befa6f09269",
1135
+ "cellView": "form",
1136
+ "collapsed": true
1137
+ },
1138
+ "execution_count": 23,
1139
+ "outputs": [
1140
+ {
1141
+ "output_type": "stream",
1142
+ "name": "stderr",
1143
+ "text": [
1144
+ "/usr/local/lib/python3.10/dist-packages/gradio/utils.py:833: UserWarning: Expected 2 arguments for function <function generate_questions_v2 at 0x7aa8748f9bd0>, received 1.\n",
1145
+ " warnings.warn(\n",
1146
+ "/usr/local/lib/python3.10/dist-packages/gradio/utils.py:837: UserWarning: Expected at least 2 arguments for function <function generate_questions_v2 at 0x7aa8748f9bd0>, received 1.\n",
1147
+ " warnings.warn(\n"
1148
+ ]
1149
+ },
1150
+ {
1151
+ "output_type": "stream",
1152
+ "name": "stdout",
1153
+ "text": [
1154
+ "Colab notebook detected. To show errors in colab notebook, set debug=True in launch()\n",
1155
+ "Note: opening Chrome Inspector may crash demo inside Colab notebooks.\n",
1156
+ "\n",
1157
+ "To create a public link, set `share=True` in `launch()`.\n"
1158
+ ]
1159
+ },
1160
+ {
1161
+ "output_type": "display_data",
1162
+ "data": {
1163
+ "text/plain": [
1164
+ "<IPython.core.display.Javascript object>"
1165
+ ],
1166
+ "application/javascript": [
1167
+ "(async (port, path, width, height, cache, element) => {\n",
1168
+ " if (!google.colab.kernel.accessAllowed && !cache) {\n",
1169
+ " return;\n",
1170
+ " }\n",
1171
+ " element.appendChild(document.createTextNode(''));\n",
1172
+ " const url = await google.colab.kernel.proxyPort(port, {cache});\n",
1173
+ "\n",
1174
+ " const external_link = document.createElement('div');\n",
1175
+ " external_link.innerHTML = `\n",
1176
+ " <div style=\"font-family: monospace; margin-bottom: 0.5rem\">\n",
1177
+ " Running on <a href=${new URL(path, url).toString()} target=\"_blank\">\n",
1178
+ " https://localhost:${port}${path}\n",
1179
+ " </a>\n",
1180
+ " </div>\n",
1181
+ " `;\n",
1182
+ " element.appendChild(external_link);\n",
1183
+ "\n",
1184
+ " const iframe = document.createElement('iframe');\n",
1185
+ " iframe.src = new URL(path, url).toString();\n",
1186
+ " iframe.height = height;\n",
1187
+ " iframe.allow = \"autoplay; camera; microphone; clipboard-read; clipboard-write;\"\n",
1188
+ " iframe.width = width;\n",
1189
+ " iframe.style.border = 0;\n",
1190
+ " element.appendChild(iframe);\n",
1191
+ " })(7863, \"/\", \"100%\", 500, false, window.element)"
1192
+ ]
1193
+ },
1194
+ "metadata": {}
1195
+ }
1196
+ ]
1197
+ },
1198
+ {
1199
+ "cell_type": "code",
1200
+ "source": [
1201
+ "def prompt_select(selection, number, length):\n",
1202
+ " if selection == \"Random\":\n",
1203
+ " prompt = f\"Please design a {number} question quiz based on the context provided and the inputted learning objectives (if applicable). The types of questions should be randomized (including multiple choice, short answer, true/false, short answer, etc.). Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide 1 question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right.\"\n",
1204
+ " elif selection == \"Fill in the Blank\":\n",
1205
+ " prompt = f\"Create a {number} question fill in the blank quiz refrencing the context provided. The quiz should reflect the learning objectives (if inputted). The 'blank' part of the question should appear as '________'. The answers should reflect what word(s) should go in the blank an accurate statement. An example is the follow: 'The author of the article is ______.' The question should be a statement. Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect,and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right.\"\n",
1206
+ " elif selection == \"Short Answer\":\n",
1207
+ " prompt = f\"Please design a {number} question quiz about which reflects the learning objectives (if inputted). The questions should be short answer. Expect the correct answers to be {length} sentences long. Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice. Explain why the correct answer is right.\"\n",
1208
+ " else:\n",
1209
+ " prompt = f\"Please design a {number} question {selection.lower()} quiz based on the context provided and the inputted learning objectives (if applicable). Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide 1 question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right.\"\n",
1210
+ " return prompt\n",
1211
+ "\n",
1212
+ "\n",
1213
+ "# Function to save prompts (premade or custom) and return in the user input box in the chatbot`\n",
1214
+ "saved_text = \"\"\n",
1215
+ "def save_text(text):\n",
1216
+ " global saved_text\n",
1217
+ " saved_text = text\n",
1218
+ "\n",
1219
+ "def return_text():\n",
1220
+ " # Return the saved text\n",
1221
+ " return saved_text"
1222
+ ],
1223
+ "metadata": {
1224
+ "id": "wF80F1wU80rU"
1225
+ },
1226
+ "execution_count": 14,
1227
+ "outputs": []
1228
+ },
1229
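
Editor's note on the cell above: `save_text`/`return_text` ferry the generated prompt into the chat input via a module-level variable. A minimal standalone sketch of the same wiring, assuming Gradio 3.x as pinned in this Space (`prompt_box`, `insert_btn`, and `chat_input` are illustrative names, not components from this notebook):

```python
import gradio as gr

saved_text = ""  # module-level buffer shared by the two callbacks

def save_text(text):
    # stash the generated prompt so a later event can retrieve it
    global saved_text
    saved_text = text

def return_text():
    # hand the stashed prompt back to whichever component requests it
    return saved_text

with gr.Blocks() as demo:
    prompt_box = gr.Textbox(label="Generated prompt")  # illustrative
    chat_input = gr.Textbox(label="User input")        # illustrative
    insert_btn = gr.Button("Insert prompt into chat")  # illustrative
    prompt_box.change(save_text, inputs=prompt_box, outputs=None)
    insert_btn.click(return_text, inputs=None, outputs=chat_input)
```

For a multi-user Space, `gr.State` would avoid sharing one global buffer across sessions.
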
+ {
1230
+ "cell_type": "markdown",
1231
+ "source": [
1232
+ "### Baseline Functionality V3"
1233
+ ],
1234
+ "metadata": {
1235
+ "id": "F5-Ja2evCE4X"
1236
+ }
1237
+ },
1238
+ {
1239
+ "cell_type": "markdown",
1240
+ "source": [
1241
+ "Updated Question Selection and Chatbot Feature"
1242
+ ],
1243
+ "metadata": {
1244
+ "id": "rr8YlzcJCKv4"
1245
+ }
1246
+ },
1247
+ {
1248
+ "cell_type": "code",
1249
+ "source": [
1250
+ "with gr.Blocks() as demo:\n",
1251
+ " gr.Markdown(\"# Oral Exam App\")\n",
1252
+ " gr.Markdown(\"## OpenAI API key\")\n",
1253
+ " with gr.Box():\n",
1254
+ " gr.HTML(\"\"\"Embed your OpenAI API key below; if you haven't created one already, visit\n",
1255
+ " platform.openai.com/account/api-keys\n",
1256
+ " to sign up for an account and get your personal API key\"\"\",\n",
1257
+ " elem_classes=\"textbox_label\")\n",
1258
+ " input = gr.Textbox(show_label=False, type=\"password\", container=False,\n",
1259
+ " placeholder=\"●●●●●●●●●●●●●●●●●\")\n",
1260
+ " input.change(fn=embed_key, inputs=input, outputs=None)\n",
1261
+ "\n",
1262
+ " with gr.Blocks():\n",
1263
+ " #########################\n",
1264
+ " #########Context#########\n",
1265
+ " #########################\n",
1266
+ " with gr.Accordion(\"Context section\"):\n",
1267
+ " ### Should also allow vector stores\n",
1268
+ " gr.Markdown(\"## Please upload the context document(s) for Oral exam\")\n",
1269
+ " context_input = gr.File(label=\"Click to upload context file\",\n",
1270
+ " file_count=\"multiple\",\n",
1271
+ " file_types=[\".txt\", \".docx\", \".pdf\"])\n",
1272
+ " outputs_context=gr.Textbox(label=\"Context\")\n",
1273
+ " context_input.change(fn=process_file, inputs=context_input, outputs=outputs_context)\n",
1274
+ " # upload_button = gr.Button(value=\"Show context\")\n",
1275
+ " # upload_button.click(process_file, context_input, outputs_context)\n",
1276
+ "\n",
1277
+ " with gr.Blocks():\n",
1278
+ " gr.Markdown(\"\"\"\n",
1279
+ " ## Generate a Premade Prompt\n",
1280
+ " Select your type and number of desired questions. Click \"Generate Prompt\" to get your premade prompt,\n",
1281
+ " and then \"Insert Prompt into Chat\" to copy the text into the chat interface below. \\\n",
1282
+ " You can also copy the prompt using the icon in the upper right corner and paste directly into the input box when interacting with the model.\n",
1283
+ " \"\"\")\n",
1284
+ " with gr.Row():\n",
1285
+ " with gr.Column():\n",
1286
+ " question_type = gr.Dropdown([\"Multiple Choice\", \"True or False\", \"Short Answer\", \"Fill in the Blank\", \"Random\"], label=\"Question Type\")\n",
1287
+ " number_of_questions = gr.Textbox(label=\"Enter desired number of questions\")\n",
1288
+ " sa_desired_length = gr.Dropdown([\"1-2\", \"3-4\", \"5-6\", \"6 or more\"], label = \"For short answer questions only, choose the desired sentence length for answers. The default value is 1-2 sentences.\")\n",
1289
+ " with gr.Column():\n",
1290
+ " prompt_button = gr.Button(\"Generate Prompt\")\n",
1291
+ " premade_prompt_output = gr.Textbox(label=\"Generated prompt (save or copy)\", show_copy_button=True)\n",
1292
+ " prompt_button.click(prompt_select,\n",
1293
+ " inputs=[question_type, number_of_questions, sa_desired_length],\n",
1294
+ " outputs=premade_prompt_output)\n",
1295
+ " ########################\n",
1296
+ " ##Question Generation###\n",
1297
+ " ########################\n",
1298
+ " with gr.Accordion(\"Question section\"):\n",
1299
+ " gr.Markdown(\"## Questions\")\n",
1300
+ " with gr.Row():\n",
1301
+ " with gr.Column():\n",
1302
+ " outputs_qa=gr.Textbox(label=\"Generate questions\")\n",
1303
+ " btn1 = gr.Button(value=\"Generate questions\")\n",
1304
+ " btn1.click(generate_questions_v2, inputs=[outputs_context, premade_prompt_output], outputs=outputs_qa)\n",
1305
+ "\n",
1306
+ " ######################### Need additional work to include these questions when click button #########################\n",
1307
+ " with gr.Column():\n",
1308
+ " submit_question=gr.Textbox(label=\"Use existing questions\")\n",
1309
+ " btn4 = gr.Button(value=\"Use these questions\")\n",
1310
+ " # btn4.click(use_this_question, inputs=outputs_transcribe, outputs=None)\n",
1311
+ "\n",
1312
+ " #########################\n",
1313
+ " #######Main Audio########\n",
1314
+ " #########################\n",
1315
+ " with gr.Accordion(\"Main audio section\"):\n",
1316
+ " gr.Markdown(\"## Upload your audio file or start recording\")\n",
1317
+ " with gr.Column():\n",
1318
+ " ## uploading files seem not working (don't know why)\n",
1319
+ " with gr.Row():\n",
1320
+ " file_input = gr.Audio(label=\"Upload Audio\", source=\"upload\", type=\"filepath\")\n",
1321
+ " record_inputs = gr.Audio(label=\"Record Audio\", source=\"microphone\", type=\"filepath\")\n",
1322
+ "\n",
1323
+ " gr.Markdown(\"## Transcribe the audio uploaded or recorded\")\n",
1324
+ " outputs_transcribe=gr.Textbox(label=\"Transcription\")\n",
1325
+ "\n",
1326
+ " file_input.change(fn=transcribe, inputs=file_input, outputs=outputs_transcribe)\n",
1327
+ " record_inputs.change(fn=transcribe, inputs=record_inputs, outputs=outputs_transcribe)\n",
1328
+ "\n",
1329
+ " #########################\n",
1330
+ " #######Evaluation########\n",
1331
+ " #########################\n",
1332
+ " with gr.Accordion(\"Evaluation section\"):\n",
1333
+ " gr.Markdown(\"## Evaluation\")\n",
1334
+ " with gr.Tab(\"General evalution\"):\n",
1335
+ " evalution=gr.Textbox(label=\"AI Evaluation\")\n",
1336
+ " btn5 = gr.Button(value=\"Evaluate\")\n",
1337
+ " btn5.click(ai_evaluate_v2, inputs=[outputs_context, outputs_transcribe, outputs_qa], outputs=evalution)\n",
1338
+ " with gr.Tab(\"Quantitative evalution\"):\n",
1339
+ " table_output = gr.Dataframe(label = \"Some kind of evaluation metrics?\")\n",
1340
+ " btn6 = gr.Button(value=\"Evaluate\")\n",
1341
+ " btn6.click(ai_evaluate_v2, inputs=[outputs_context, outputs_transcribe, outputs_qa], outputs=table_output)\n",
1342
+ "\n",
1343
+ "\n",
1344
+ " # Chatbot (https://gradio.app/creating-a-chatbot/)\n",
1345
+ " '''\n",
1346
+ " with gr.Blocks():\n",
1347
+ " gr.Markdown(\"\"\"\n",
1348
+ " ## Chat with the Model\n",
1349
+ " Click \"Display Prompt\" to display the premade or custom prompt that you created earlier. Then, continue chatting with the model.\n",
1350
+ " \"\"\")\n",
1351
+ " with gr.Row():\n",
1352
+ " show_prompt_block = gr.Button(\"Display Prompt\")\n",
1353
+ " '''\n",
1354
+ " gr.Markdown(\"## Chat with the Model\")\n",
1355
+ " with gr.Row(equal_height=True):\n",
1356
+ " with gr.Column(scale=2):\n",
1357
+ " chatbot = gr.Chatbot()\n",
1358
+ " with gr.Row():\n",
1359
+ " user_chat_input = gr.Textbox(label=\"User input\", scale=9)\n",
1360
+ " user_chat_input.submit(return_text, inputs=None, outputs=user_chat_input)\n",
1361
+ " user_chat_submit = gr.Button(\"Ask/answer model\", scale=1)\n",
1362
+ " #show_prompt_block.click(return_text, inputs=None, outputs=user_chat_input)\n",
1363
+ "\n",
1364
+ " # TODO Move the sources so it's displayed to the right of the chat bot,\n",
1365
+ " # with the sources taking up about 1/3rd of the horizontal space\n",
1366
+ " # with gr.Box(elem_id=\"sources-container\", scale=1):\n",
1367
+ " # # TODO: Display document sources in a nicer format?\n",
1368
+ " # gr.HTML(value=\"<h3 id='sources'>Sources</h3>\")\n",
1369
+ " # sources_output = []\n",
1370
+ " # for i in range(num_sources):\n",
1371
+ " # source_elem = gr.HTML(visible=False)\n",
1372
+ " # sources_output.append(source_elem)\n",
1373
+ "\n",
1374
+ "demo.launch()"
1375
+ ],
1376
+ "metadata": {
1377
+ "colab": {
1378
+ "base_uri": "https://localhost:8080/",
1379
+ "height": 616
1380
+ },
1381
+ "id": "Y7-3JFuZ8H5k",
1382
+ "outputId": "ea99ce65-7b79-4d39-dd88-44785b0d6615"
1383
+ },
1384
+ "execution_count": 24,
1385
+ "outputs": [
1386
+ {
1387
+ "output_type": "stream",
1388
+ "name": "stdout",
1389
+ "text": [
1390
+ "Colab notebook detected. To show errors in colab notebook, set debug=True in launch()\n",
1391
+ "Note: opening Chrome Inspector may crash demo inside Colab notebooks.\n",
1392
+ "\n",
1393
+ "To create a public link, set `share=True` in `launch()`.\n"
1394
+ ]
1395
+ },
1396
+ {
1397
+ "output_type": "display_data",
1398
+ "data": {
1399
+ "text/plain": [
1400
+ "<IPython.core.display.Javascript object>"
1401
+ ],
1402
+ "application/javascript": [
1403
+ "(async (port, path, width, height, cache, element) => {\n",
1404
+ " if (!google.colab.kernel.accessAllowed && !cache) {\n",
1405
+ " return;\n",
1406
+ " }\n",
1407
+ " element.appendChild(document.createTextNode(''));\n",
1408
+ " const url = await google.colab.kernel.proxyPort(port, {cache});\n",
1409
+ "\n",
1410
+ " const external_link = document.createElement('div');\n",
1411
+ " external_link.innerHTML = `\n",
1412
+ " <div style=\"font-family: monospace; margin-bottom: 0.5rem\">\n",
1413
+ " Running on <a href=${new URL(path, url).toString()} target=\"_blank\">\n",
1414
+ " https://localhost:${port}${path}\n",
1415
+ " </a>\n",
1416
+ " </div>\n",
1417
+ " `;\n",
1418
+ " element.appendChild(external_link);\n",
1419
+ "\n",
1420
+ " const iframe = document.createElement('iframe');\n",
1421
+ " iframe.src = new URL(path, url).toString();\n",
1422
+ " iframe.height = height;\n",
1423
+ " iframe.allow = \"autoplay; camera; microphone; clipboard-read; clipboard-write;\"\n",
1424
+ " iframe.width = width;\n",
1425
+ " iframe.style.border = 0;\n",
1426
+ " element.appendChild(iframe);\n",
1427
+ " })(7864, \"/\", \"100%\", 500, false, window.element)"
1428
+ ]
1429
+ },
1430
+ "metadata": {}
1431
+ },
1432
+ {
1433
+ "output_type": "execute_result",
1434
+ "data": {
1435
+ "text/plain": []
1436
+ },
1437
+ "metadata": {},
1438
+ "execution_count": 24
1439
+ }
1440
+ ]
1441
+ },
1442
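
Regarding the TODO in the cell above about moving the sources panel to the right of the chatbot at roughly one third of the width: one possible arrangement with the same Gradio 3.x layout primitives already used in this file (the `num_sources` value is a stand-in):

```python
with gr.Row(equal_height=True):
    with gr.Column(scale=2):  # chat occupies roughly two thirds of the row
        chatbot = gr.Chatbot()
    with gr.Column(scale=1):  # sources occupy roughly one third
        gr.HTML("<h3 id='sources'>Sources</h3>")
        num_sources = 3  # stand-in; use however many sources are returned
        sources_output = [gr.HTML(visible=False) for _ in range(num_sources)]
```
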
+ {
1443
+ "cell_type": "markdown",
1444
+ "source": [
1445
+ "### What's left\n",
1446
+ "- vector store (link) upload\n",
1447
+ "- submit question section need to be linked with ai_evaluate function"
1448
+ ],
1449
+ "metadata": {
1450
+ "id": "g2EVIogW69Fd"
1451
+ }
1452
+ }
1453
+ ]
1454
+ }
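
The Blocks UI in this notebook calls `generate_questions_v2` and `ai_evaluate_v2`, which are defined in cells outside this diff excerpt. Judging from the `.click()` wiring, their signatures would have to look roughly like the sketch below; the bodies are assumptions modeled on the `generate_questions`/`ai_evaluate` pair in `UI_design_oral_exam_chatbot.ipynb`, not this notebook's actual code:

```python
def generate_questions_v2(context, prompt):
    # wired as btn1.click(..., inputs=[outputs_context, premade_prompt_output], outputs=outputs_qa)
    messages = question_template.format_messages(context=context, pre_prompt=prompt)
    return chat(messages).content

def ai_evaluate_v2(context, transcript, questions):
    # wired as btn5.click(..., inputs=[outputs_context, outputs_transcribe, outputs_qa], outputs=evaluation)
    messages = evaluate_template.format_messages(
        context=context, transcript=transcript, QA=questions, instructions="")
    return chat(messages).content
```
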
UI_design_oral_exam_chatbot.ipynb ADDED
@@ -0,0 +1,1004 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "id": "view-in-github",
7
+ "colab_type": "text"
8
+ },
9
+ "source": [
10
+ "<a href=\"https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/124-implement-baseline-functionality-for-oral-exam-module/UI_design_oral_exam_chatbot.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "metadata": {
16
+ "id": "PIbogPXyM0wr"
17
+ },
18
+ "source": [
19
+ "# Project IO Achievement - UI Design (Oral Exam)"
20
+ ]
21
+ },
22
+ {
23
+ "cell_type": "markdown",
24
+ "metadata": {
25
+ "id": "x_Vp8SiKM4p1"
26
+ },
27
+ "source": [
28
+ "## Problem Definition\n",
29
+ "\n",
30
+ "The v1 functionality for the Oral Exam module requires the following:\n",
31
+ "\n",
32
+ "1. Upload or generation of questions: either the user should upload a set of questions or we should allow the model to generate the questions. The user should pick or it should be inherent if there is no upload of questions. Note that we must also allow for context to be uploaded (vector store, vector store link, specific documents)\n",
33
+ "2. The model should prompt the user with a question and pause.\n",
34
+ "The user should respond by audio.\n",
35
+ "3. This should continue on until some final point where the exam is over.\n",
36
+ "\n",
37
+ "Then:\n",
38
+ "\n",
39
+ "1. We should use Whisper to do the transcription, and\n",
40
+ "2. Send the transcription, questions, and context for GPT4 for evaluation\n",
41
+ "Return the evaluation.\n",
42
+ "3. This will primarily be work on a user interface."
43
+ ]
44
+ },
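
As promised in the cell above, a rough sketch of the Whisper-then-GPT-4 loop using the pre-1.0 `openai` SDK that this notebook installs; the prompt wording is an assumption, not the notebook's actual template:

```python
import openai

def grade_oral_answer(audio_path, questions, context):
    # 1. transcribe the student's audio with OpenAI's hosted whisper-1 model
    with open(audio_path, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)["text"]
    # 2. send transcription, questions, and context to GPT-4 for evaluation
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (f"Context:\n{context}\n\nQuestions:\n{questions}\n\n"
                        f"Student answer:\n{transcript}\n\nPlease evaluate the answer."),
        }],
    )
    # 3. return the evaluation
    return response["choices"][0]["message"]["content"]
```
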
45
+ {
46
+ "cell_type": "markdown",
47
+ "metadata": {
48
+ "id": "o_60X8H3NEne"
49
+ },
50
+ "source": [
51
+ "## Libraries\n",
52
+ "\n",
53
+ "This section will install and import some important libraries such as Langchain, openai, Gradio, and so on"
54
+ ]
55
+ },
56
+ {
57
+ "cell_type": "code",
58
+ "execution_count": 1,
59
+ "metadata": {
60
+ "id": "pxcqXgg2aAN7"
61
+ },
62
+ "outputs": [],
63
+ "source": [
64
+ "# install libraries here\n",
65
+ "# -q flag for \"quiet\" install\n",
66
+ "%%capture\n",
67
+ "!pip install -q langchain\n",
68
+ "!pip install -q openai\n",
69
+ "!pip install -q gradio\n",
70
+ "# !pip install -q datasets\n",
71
+ "!pip install -q torchaudio\n",
72
+ "!pip install -q git+https://github.com/openai/whisper.git\n",
73
+ "!pip install -q docx\n",
74
+ "!pip install -q PyPDF2\n",
75
+ "!pip install -q python-docx"
76
+ ]
77
+ },
78
+ {
79
+ "cell_type": "code",
80
+ "execution_count": 2,
81
+ "metadata": {
82
+ "id": "pEjM1tLsMZBq"
83
+ },
84
+ "outputs": [],
85
+ "source": [
86
+ "# import libraries here\n",
87
+ "from langchain.llms import OpenAI\n",
88
+ "from langchain.prompts import PromptTemplate\n",
89
+ "from langchain.document_loaders import TextLoader\n",
90
+ "from langchain.indexes import VectorstoreIndexCreator\n",
91
+ "from langchain import ConversationChain, LLMChain, PromptTemplate\n",
92
+ "from langchain.chat_models import ChatOpenAI\n",
93
+ "from langchain.memory import ConversationBufferWindowMemory\n",
94
+ "from langchain.prompts import ChatPromptTemplate\n",
95
+ "from langchain.text_splitter import CharacterTextSplitter\n",
96
+ "from langchain.embeddings import OpenAIEmbeddings\n",
97
+ "import openai\n",
98
+ "import os\n",
99
+ "from getpass import getpass\n",
100
+ "# from IPython.display import display, Javascript, HTML\n",
101
+ "# from google.colab.output import eval_js\n",
102
+ "# from base64 import b64decode\n",
103
+ "# import ipywidgets as widgets\n",
104
+ "# from IPython.display import clear_output\n",
105
+ "import time\n",
106
+ "import requests\n",
107
+ "# from datasets import load_dataset\n",
108
+ "# from torchaudio.transforms import Resample\n",
109
+ "import whisper\n",
110
+ "import numpy as np\n",
111
+ "import torch\n",
112
+ "import librosa\n",
113
+ "# from datasets import load_dataset\n",
114
+ "#from jiwer import wer\n",
115
+ "import pandas as pd\n",
116
+ "import gradio as gr\n",
117
+ "from docx import Document\n",
118
+ "import PyPDF2\n",
119
+ "from pydub import AudioSegment\n",
120
+ "import tempfile"
121
+ ]
122
+ },
123
+ {
124
+ "cell_type": "markdown",
125
+ "metadata": {
126
+ "id": "03KLZGI_a5W5"
127
+ },
128
+ "source": [
129
+ "## API Keys\n",
130
+ "\n",
131
+ "Use these cells to load the API keys required for this notebook. The below code cell uses the `getpass` library."
132
+ ]
133
+ },
134
+ {
135
+ "cell_type": "code",
136
+ "execution_count": 3,
137
+ "metadata": {
138
+ "colab": {
139
+ "base_uri": "https://localhost:8080/"
140
+ },
141
+ "id": "5smcWj4DbFgy",
142
+ "outputId": "6bc91507-cd3c-4808-8976-811d7fc7cb29"
143
+ },
144
+ "outputs": [
145
+ {
146
+ "name": "stdout",
147
+ "output_type": "stream",
148
+ "text": [
149
+ "··········\n"
150
+ ]
151
+ }
152
+ ],
153
+ "source": [
154
+ "openai_api_key = getpass()\n",
155
+ "os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n",
156
+ "openai.api_key = openai_api_key"
157
+ ]
158
+ },
159
+ {
160
+ "cell_type": "markdown",
161
+ "metadata": {
162
+ "id": "pMo9x8u4AEV1"
163
+ },
164
+ "source": [
165
+ "## Prompt Design"
166
+ ]
167
+ },
168
+ {
169
+ "cell_type": "code",
170
+ "execution_count": 4,
171
+ "metadata": {
172
+ "colab": {
173
+ "base_uri": "https://localhost:8080/"
174
+ },
175
+ "id": "UgnCZRMhADvo",
176
+ "outputId": "462e62c7-a618-4549-e651-858514757235"
177
+ },
178
+ "outputs": [
179
+ {
180
+ "output_type": "execute_result",
181
+ "data": {
182
+ "text/plain": [
183
+ "ChatOpenAI(cache=None, verbose=False, callbacks=None, callback_manager=None, tags=None, metadata=None, client=<class 'openai.api_resources.chat_completion.ChatCompletion'>, model_name='gpt-4', temperature=0.0, model_kwargs={}, openai_api_key='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', openai_api_base='', openai_organization='', openai_proxy='', request_timeout=None, max_retries=6, streaming=False, n=1, max_tokens=None, tiktoken_model_name=None)"
184
+ ]
185
+ },
186
+ "metadata": {},
187
+ "execution_count": 4
188
+ }
189
+ ],
190
+ "source": [
191
+ "chat = ChatOpenAI(temperature=0.0, model_name='gpt-4')\n",
192
+ "chat"
193
+ ]
194
+ },
195
+ {
196
+ "cell_type": "markdown",
197
+ "source": [
198
+ "### Chatbot Prompts"
199
+ ],
200
+ "metadata": {
201
+ "id": "2tTNiyU-ZcDU"
202
+ }
203
+ },
204
+ {
205
+ "cell_type": "code",
206
+ "execution_count": 15,
207
+ "metadata": {
208
+ "colab": {
209
+ "base_uri": "https://localhost:8080/"
210
+ },
211
+ "id": "r-VmK_7vHrmw",
212
+ "outputId": "8c314a8f-dad5-47b9-ddba-f6009a73b80d"
213
+ },
214
+ "outputs": [
215
+ {
216
+ "output_type": "execute_result",
217
+ "data": {
218
+ "text/plain": [
219
+ "['history', 'input', 'instruction', 'questions']"
220
+ ]
221
+ },
222
+ "metadata": {},
223
+ "execution_count": 15
224
+ }
225
+ ],
226
+ "source": [
227
+ "template_string3 = \"\"\"\n",
228
+ "Please ask me the following questions in sequence, and after I provide the answer, \\\n",
229
+ "please give me some feedback. Here is the instruction for feedback: {instruction}. If no instruction is provided, please provide feedback based on your judgement. \\\n",
230
+ "Just ask me the question, and please do not show any other text (no need for greetings for example) \\\n",
231
+ "Here are the questions that you can will me: {questions}. \\\n",
232
+ "Here are the chat history: {history}. \\\n",
233
+ "{input}\n",
234
+ "\n",
235
+ "Once all questions are answered, thank the user and give overall feedback for the question answering part.\n",
236
+ "\"\"\"\n",
237
+ "prompt_template3 = ChatPromptTemplate.from_template(template_string3)\n",
238
+ "prompt_template3.messages[0].prompt.input_variables"
239
+ ]
240
+ },
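
For reference, rendering `prompt_template3` into concrete chat messages uses the four variables listed in the output above; the argument values below are invented for illustration:

```python
messages = prompt_template3.format_messages(
    instruction="Grade each answer on accuracy and completeness.",  # invented
    questions="Question 1: ...\nQuestion 2: ...",                   # invented
    history=[],          # no prior turns yet
    input="I am ready.",
)
response = chat(messages)  # `chat` is the ChatOpenAI instance created above
print(response.content)
```
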
241
+ {
242
+ "cell_type": "code",
243
+ "execution_count": 16,
244
+ "metadata": {
245
+ "colab": {
246
+ "base_uri": "https://localhost:8080/",
247
+ "height": 72
248
+ },
249
+ "id": "XaK7D5B4bMYv",
250
+ "outputId": "55f495b5-d8d0-4a80-b5c4-128dba34eebe"
251
+ },
252
+ "outputs": [
253
+ {
254
+ "output_type": "execute_result",
255
+ "data": {
256
+ "text/plain": [
257
+ "'\\nPlease ask me the following questions in sequence, and after I provide the answer, please give me some feedback. Here is the instruction for feedback: {instruction}. If no instruction is provided, please provide feedback based on your judgement. Just ask me the question, and please do not show any other text (no need for greetings for example) Here are the questions that you can will me: {questions}. Here are the chat history: {history}. {input}\\n\\nOnce all questions are answered, thank the user and give overall feedback for the question answering part.\\n'"
258
+ ],
259
+ "application/vnd.google.colaboratory.intrinsic+json": {
260
+ "type": "string"
261
+ }
262
+ },
263
+ "metadata": {},
264
+ "execution_count": 16
265
+ }
266
+ ],
267
+ "source": [
268
+ "prompt_template3.messages[0].prompt.template"
269
+ ]
270
+ },
271
+ {
272
+ "cell_type": "markdown",
273
+ "metadata": {
274
+ "id": "l4o8R5eUE1n8"
275
+ },
276
+ "source": [
277
+ "### Functions"
278
+ ]
279
+ },
280
+ {
281
+ "cell_type": "code",
282
+ "execution_count": 7,
283
+ "metadata": {
284
+ "id": "ABN0X9xQHeii"
285
+ },
286
+ "outputs": [],
287
+ "source": [
288
+ "def embed_key(openai_api_key):\n",
289
+ " os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n",
290
+ "\n",
291
+ "def transcribe(audio_file_path):\n",
292
+ " try:\n",
293
+ " with open(audio_file_path, \"rb\") as audio_file:\n",
294
+ " # Call OpenAI's Whisper model for transcription\n",
295
+ " transcript = openai.Audio.transcribe(\"whisper-1\", audio_file)\n",
296
+ " transcribed_text = transcript[\"text\"]\n",
297
+ " return transcribed_text\n",
298
+ " except:\n",
299
+ " return \"Your answer will be transcribed here\"\n",
300
+ "\n",
301
+ "def process_file(files):\n",
302
+ " for file in files:\n",
303
+ " try:\n",
304
+ " extension = file.name.split('.')[-1].lower()\n",
305
+ " if extension == 'docx':\n",
306
+ " doc = Document(file.name)\n",
307
+ " full_text = []\n",
308
+ " for paragraph in doc.paragraphs:\n",
309
+ " full_text.append(paragraph.text)\n",
310
+ " return '\\n'.join(full_text)\n",
311
+ "\n",
312
+ " elif extension == 'pdf':\n",
313
+ " pdf_file = open(file.name, 'rb')\n",
314
+ " reader = PyPDF2.PdfReader(pdf_file)\n",
315
+ " num_pages = len(reader.pages)\n",
316
+ " full_text = []\n",
317
+ " for page in range(num_pages):\n",
318
+ " page_obj = reader.pages[page]\n",
319
+ " full_text.append(page_obj.extract_text())\n",
320
+ " pdf_file.close()\n",
321
+ " return '\\n'.join(full_text)\n",
322
+ "\n",
323
+ " elif extension == 'txt':\n",
324
+ " with open(file.name, 'r') as txt_file:\n",
325
+ " full_text = txt_file.read()\n",
326
+ " return full_text\n",
327
+ "\n",
328
+ " else:\n",
329
+ " return \"Unsupported file type\"\n",
330
+ " except FileNotFoundError:\n",
331
+ " return \"File not found\"\n",
332
+ " except PermissionError:\n",
333
+ " return \"Permission denied\"\n",
334
+ "\n",
335
+ "def generate_questions(text, prompt):\n",
336
+ " test_input1 = question_template.format_messages(\n",
337
+ " context = text,\n",
338
+ " pre_prompt = prompt)\n",
339
+ "\n",
340
+ " response = chat(test_input1)\n",
341
+ " return response.content\n",
342
+ "\n",
343
+ "\n",
344
+ "def ai_evaluate(context, audio_transcript, QA, instructions):\n",
345
+ " test_input1 = evaluate_template.format_messages(\n",
346
+ " context = context,\n",
347
+ " transcript = audio_transcript,\n",
348
+ " QA = QA,\n",
349
+ " instructions = instructions)\n",
350
+ "\n",
351
+ " response = chat(test_input1)\n",
352
+ " return response.content\n",
353
+ "\n",
354
+ "def upload_file(files):\n",
355
+ " file_paths = [file.name for file in files]\n",
356
+ " return file_paths\n",
357
+ "\n",
358
+ "def use_these_questions(input):\n",
359
+ " return input\n",
360
+ "\n",
361
+ "################################\n",
362
+ "\n",
363
+ "def add_text(history, text, prompt = template_string3):\n",
364
+ " new_history = [(prompt, None)] + history + [(text, None)]\n",
365
+ " return new_history, gr.update(value=\"\", interactive=False)\n",
366
+ "\n",
367
+ "# def add_file(history, file):\n",
368
+ "# history = history + [((file.name,), None)]\n",
369
+ "# return history\n",
370
+ "\n",
371
+ "\n",
372
+ "def bot_initialize(input, instruction_feedback, questions_used, history):\n",
373
+ "\n",
374
+ " template_string3 = \"\"\"\n",
375
+ " Please ask me the following questions in sequence, and after I provide the answer, \\\n",
376
+ " please give me some feedback. Here is the instruction for feedback: {instruction}. If no instruction is provided, please provide feedback based on your judgement. \\\n",
377
+ " Here are the questions that you can ask me: {questions}. \\\n",
378
+ " Here are the chat history: {history}. \\\n",
379
+ " {input} \\\n",
380
+ "\n",
381
+ " *** Remember, just ask me the question, give feedbacks, and ask the next questions. Do not forget to ask the next question after feedbacks. \\\n",
382
+ " \"\"\"\n",
383
+ " prompt_template3 = ChatPromptTemplate.from_template(template_string3)\n",
384
+ "\n",
385
+ " test_input1 = prompt_template3.format_messages(\n",
386
+ " instruction = instruction_feedback,\n",
387
+ " history = history,\n",
388
+ " questions = questions_used,\n",
389
+ " input = input)\n",
390
+ "\n",
391
+ " response = chat(test_input1)\n",
392
+ " return response.content\n",
393
+ "\n",
394
+ "# def initialize(instruction_feedback, questions_used, chat_history, ready):\n",
395
+ "# test_input1 = prompt_template3.format_messages(\n",
396
+ "# instruction = instruction_feedback,\n",
397
+ "# chat_history = chat_history,\n",
398
+ "# questions = questions_used,\n",
399
+ "# ready = ready)\n",
400
+ "# response = chat(test_input1)\n",
401
+ "# return response.content\n",
402
+ "\n",
403
+ "# def bot(history):\n",
404
+ "# response = \"**That's cool!**\"\n",
405
+ "# history[-1][1] = \"\"\n",
406
+ "# for character in response:\n",
407
+ "# history[-1][1] += character\n",
408
+ "# time.sleep(0.05)\n",
409
+ "# yield history\n",
410
+ "\n",
411
+ "def message_and_history(input, instruction_feedback, questions_used, history):\n",
412
+ " history = history or []\n",
413
+ " s = list(sum(history, ()))\n",
414
+ " s.append(input)\n",
415
+ " inp = ' '.join(s)\n",
416
+ " output = bot_initialize(inp, instruction_feedback, questions_used, history)\n",
417
+ " history.append((input, output))\n",
418
+ " return history, history\n",
419
+ "\n",
420
+ "def prompt_select(selection, number, length):\n",
421
+ " if selection == \"Random\":\n",
422
+ " prompt = f\"Please design a {number} question quiz based on the context provided and the inputted learning objectives (if applicable).\"\n",
423
+ " elif selection == \"Fill in the Blank\":\n",
424
+ " prompt = f\"Create a {number} question fill in the blank quiz refrencing the context provided. The quiz should reflect the learning objectives (if inputted). The 'blank' part of the question should appear as '________'. The answers should reflect what word(s) should go in the blank an accurate statement. An example is the follow: 'The author of the article is ______.' The question should be a statement.\"\n",
425
+ " elif selection == \"Short Answer\":\n",
426
+ " prompt = f\"Please design a {number} question quiz about which reflects the learning objectives (if inputted). The questions should be short answer. Expect the correct answers to be {length} sentences long.\"\n",
427
+ " else:\n",
428
+ " prompt = f\"Please design a {number} question {selection.lower()} quiz based on the context provided and the inputted learning objectives (if applicable).\"\n",
429
+ " return prompt\n",
430
+ "\n",
431
+ "# def prompt_select(selection, number, length):\n",
432
+ "# if selection == \"Random\":\n",
433
+ "# prompt = f\"Please design a {number} question quiz based on the context provided and the inputted learning objectives (if applicable). The types of questions should be randomized (including multiple choice, short answer, true/false, short answer, etc.). Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide 1 question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right.\"\n",
434
+ "# elif selection == \"Fill in the Blank\":\n",
435
+ "# prompt = f\"Create a {number} question fill in the blank quiz refrencing the context provided. The quiz should reflect the learning objectives (if inputted). The 'blank' part of the question should appear as '________'. The answers should reflect what word(s) should go in the blank an accurate statement. An example is the follow: 'The author of the article is ______.' The question should be a statement. Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect,and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right.\"\n",
436
+ "# elif selection == \"Short Answer\":\n",
437
+ "# prompt = f\"Please design a {number} question quiz about which reflects the learning objectives (if inputted). The questions should be short answer. Expect the correct answers to be {length} sentences long. Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice. Explain why the correct answer is right.\"\n",
438
+ "# else:\n",
439
+ "# prompt = f\"Please design a {number} question {selection.lower()} quiz based on the context provided and the inputted learning objectives (if applicable). Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide 1 question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right.\"\n",
440
+ "# return prompt"
441
+ ]
442
+ },
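
Two details in the cell above are easy to miss: `transcribe` falls back to a placeholder string on any error, and `message_and_history` flattens the chatbot's list of `(user, bot)` tuples into a single space-joined string before calling the model. The flattening step, shown on made-up data:

```python
history = [("Question 1?", "Answer 1."), ("Question 2?", "Answer 2.")]

# list(sum(history, ())) concatenates the tuples into one flat list of strings
flat = list(sum(history, ()))
# -> ['Question 1?', 'Answer 1.', 'Question 2?', 'Answer 2.']

flat.append("my new answer")
inp = " ".join(flat)  # the single string handed to bot_initialize
```
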
443
+ {
444
+ "cell_type": "markdown",
445
+ "metadata": {
446
+ "id": "8PzIpcfg4-X0"
447
+ },
448
+ "source": [
449
+ "## Integrate Prompts from LO project"
450
+ ]
451
+ },
452
+ {
453
+ "cell_type": "markdown",
454
+ "metadata": {
455
+ "id": "E5vWxcm25EAC"
456
+ },
457
+ "source": [
458
+ "### Creating a Chain for Short Answer Generation"
459
+ ]
460
+ },
461
+ {
462
+ "cell_type": "markdown",
463
+ "metadata": {
464
+ "id": "y95FExV-5IqI"
465
+ },
466
+ "source": [
467
+ "In this example, the context would include the poem \"The Road Not Taken\" by Robert Frost"
468
+ ]
469
+ },
470
+ {
471
+ "cell_type": "code",
472
+ "execution_count": 8,
473
+ "metadata": {
474
+ "id": "j_qEXDWQ5RSW"
475
+ },
476
+ "outputs": [],
477
+ "source": [
478
+ "# This is what I used to test the function 'generate_questions_v2'\n",
479
+ "template_string = \"\"\"\n",
480
+ "You are a world-class tutor helping students to perform better on oral and written exams though interactive experiences.\"\n",
481
+ "\n",
482
+ "The following text should be used as the basis for the instructions which follow: {context} \\\n",
483
+ "\n",
484
+ "The following is the guideline for generating the questiion: {pre_prompt} \\\n",
485
+ "\n",
486
+ "The output should be formatted as following:\n",
487
+ "\n",
488
+ "Question 1: ...\n",
489
+ "Question 2: ...\n",
490
+ "Question 3: ...\n",
491
+ "...\n",
492
+ "\"\"\""
493
+ ]
494
+ },
495
+ {
496
+ "cell_type": "code",
497
+ "execution_count": 9,
498
+ "metadata": {
499
+ "colab": {
500
+ "base_uri": "https://localhost:8080/"
501
+ },
502
+ "id": "yWOM1XdC5UhQ",
503
+ "outputId": "69d781f7-fb0c-4dde-9085-ddb9908b82af"
504
+ },
505
+ "outputs": [
506
+ {
507
+ "output_type": "execute_result",
508
+ "data": {
509
+ "text/plain": [
510
+ "['context', 'pre_prompt']"
511
+ ]
512
+ },
513
+ "metadata": {},
514
+ "execution_count": 9
515
+ }
516
+ ],
517
+ "source": [
518
+ "question_template = ChatPromptTemplate.from_template(template_string)\n",
519
+ "question_template.messages[0].prompt.input_variables"
520
+ ]
521
+ },
522
+ {
523
+ "cell_type": "code",
524
+ "execution_count": 10,
525
+ "metadata": {
526
+ "id": "4Mc1ZC3jaydQ"
527
+ },
528
+ "outputs": [],
529
+ "source": [
530
+ "# @title\n",
531
+ "con = \"\"\" Two roads diverged in a yellow wood,\n",
532
+ "And sorry I could not travel both\n",
533
+ "And be one traveler, long I stood\n",
534
+ "And looked down one as far as I could\n",
535
+ "To where it bent in the undergrowth;\n",
536
+ "Then took the other, as just as fair,\n",
537
+ "And having perhaps the better claim,\n",
538
+ "Because it was grassy and wanted wear;\n",
539
+ "Though as for that the passing there\n",
540
+ "Had worn them really about the same,\n",
541
+ "And both that morning equally lay\n",
542
+ "In leaves no step had trodden black.\n",
543
+ "Oh, I kept the first for another day!\n",
544
+ "Yet knowing how way leads on to way,\n",
545
+ "I doubted if I should ever come back.\n",
546
+ "I shall be telling this with a sigh\n",
547
+ "Somewhere ages and ages hence:\n",
548
+ "Two roads diverged in a wood, and I—\n",
549
+ "I took the one less traveled by,\n",
550
+ "And that has made all the difference.\n",
551
+ "—-Robert Frost—-\n",
552
+ "Education Place: http://www.eduplace.com \"\"\"\n",
553
+ "\n",
554
+ "pre = \"Please design a 3 question quiz about which reflects the learning objectives (if inputted). The questions should be short answer. Expect the correct answers to be sentences long. Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice. Explain why the correct answer is right.\"\n"
555
+ ]
556
+ },
557
+ {
558
+ "cell_type": "code",
559
+ "execution_count": 11,
560
+ "metadata": {
561
+ "colab": {
562
+ "base_uri": "https://localhost:8080/",
563
+ "height": 36
564
+ },
565
+ "id": "KXssPFyEbG3f",
566
+ "outputId": "8a5389f7-3ec6-4332-d9ae-f45ff5781caf"
567
+ },
568
+ "outputs": [
569
+ {
570
+ "output_type": "execute_result",
571
+ "data": {
572
+ "text/plain": [
573
+ "'Question 1: What is the main theme of Robert Frost\\'s poem \"The Road Not Taken\"?'"
574
+ ],
575
+ "application/vnd.google.colaboratory.intrinsic+json": {
576
+ "type": "string"
577
+ }
578
+ },
579
+ "metadata": {},
580
+ "execution_count": 11
581
+ }
582
+ ],
583
+ "source": [
584
+ "generate_questions(con,pre)"
585
+ ]
586
+ },
587
+ {
588
+ "cell_type": "markdown",
589
+ "metadata": {
590
+ "id": "DMTybR3PVuoC"
591
+ },
592
+ "source": [
593
+ "### Creating a Chain for AI Evaluation"
594
+ ]
595
+ },
596
+ {
597
+ "cell_type": "code",
598
+ "execution_count": 12,
599
+ "metadata": {
600
+ "id": "Wc-3XAFQVxO_"
601
+ },
602
+ "outputs": [],
603
+ "source": [
604
+ "template_evaluation = \"\"\"\n",
605
+ "Given\n",
606
+ "1. The follwing context of the oral exam/presentation: {context} \\\n",
607
+ "\n",
608
+ "2. The answer from the student: {transcript} \\\n",
609
+ "\n",
610
+ "3. The Questions asked to the student and student answers {QA} \\\n",
611
+ "\n",
612
+ "Please evaluate the students performance based on {instructions} \\\n",
613
+ "\n",
614
+ "If no instruction is provided, you can evaluate based on your judgement of the students performance.\n",
615
+ "\n",
616
+ "\"\"\""
617
+ ]
618
+ },
619
+ {
620
+ "cell_type": "code",
621
+ "execution_count": 13,
622
+ "metadata": {
623
+ "colab": {
624
+ "base_uri": "https://localhost:8080/"
625
+ },
626
+ "id": "FZXeYNSVVy5g",
627
+ "outputId": "94a1a4be-03a6-4c20-97b8-4333910991fa"
628
+ },
629
+ "outputs": [
630
+ {
631
+ "output_type": "execute_result",
632
+ "data": {
633
+ "text/plain": [
634
+ "['QA', 'context', 'instructions', 'transcript']"
635
+ ]
636
+ },
637
+ "metadata": {},
638
+ "execution_count": 13
639
+ }
640
+ ],
641
+ "source": [
642
+ "# @title\n",
643
+ "evaluate_template = ChatPromptTemplate.from_template(template_evaluation)\n",
644
+ "evaluate_template.messages[0].prompt.input_variables"
645
+ ]
646
+ },
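
Putting the evaluation chain together, a call to `ai_evaluate` would look like the following; the transcript and Q&A values are invented for illustration:

```python
feedback = ai_evaluate(
    context=con,  # the Frost poem defined earlier in this notebook
    audio_transcript="I think the poem is about choices...",          # invented
    QA="Q1: What is the main theme? A: Choices and their outcomes.",  # invented
    instructions="",  # empty -> the model falls back to its own judgement
)
print(feedback)
```
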
647
+ {
648
+ "cell_type": "markdown",
649
+ "metadata": {
650
+ "id": "a3WUL_hFyMkr"
651
+ },
652
+ "source": [
653
+ "### Test process_file"
654
+ ]
655
+ },
656
+ {
657
+ "cell_type": "code",
658
+ "execution_count": null,
659
+ "metadata": {
660
+ "colab": {
661
+ "base_uri": "https://localhost:8080/",
662
+ "height": 157
663
+ },
664
+ "id": "LJX1AKTMyVm8",
665
+ "outputId": "24f8de7a-f456-47bc-b3b5-0284bbd7076f"
666
+ },
667
+ "outputs": [
668
+ {
669
+ "data": {
670
+ "application/vnd.google.colaboratory.intrinsic+json": {
671
+ "type": "string"
672
+ },
673
+ "text/plain": [
674
+ "\"\\ufeffHello,\\n\\n\\nWe are so excited for this semester’s partnership with Data Science Institute and Next Steps at Vanderbilt. Jonathan Wade will be interning Mondays and Wednesdays 2-5 pm and Fridays 12-4 pm starting Monday, January 31st. Jessica will be job coaching on Mondays and Wednesdays from 2-5 pm. It also used to be on Fridays from 2-4 pm but not anymore.\\n\\n\\nBelow is important information and reminders:\\n\\n\\nLocation: 1400 18th Ave S Building, Suite 2000, Nashville, TN 37212\\n\\n\\nDress: Business casual attire (This includes items such as dress pants, khakis, polos and dress shirts)\\n\\n\\nImportant Dates:\\n\\n\\nVanderbilt University is off for Spring Break March 5th - March 13th. No internships will take place these days.\\n\\n\\nAll internships end by Thursday, April 28th.\\n\\n\\nImportant COVID-19 Information: Attached is a document outlining the COVID-19 guidelines all Vanderbilt University students must follow, including Next Steps interns and job coaches while at internship sites. Please note these may change given the evolving nature of the pandemic and any changes will be communicated to internship sites, interns, and job coaches as needed.\\n\\n\\nCareer Development Resource Guide: I am also attaching the Next Steps Career Development Resource guide that outlines student expectations and provides helpful information and resources for site supervisors.\\n\\n\\nInternship Coordinator: Lynda Tricia is the coordinator for this internship and is the main point of contact for any questions or concerns that arise.\\n\\n\\nFinally, below you will find everyone's contact information for your convenience.\\n\\n\\nContacts:\\n\\n\\nIntern: Jonathan Wade, jonathan.j.wade@vanderbilt.edu, 613-472-3867\\n\\n\\nSupervisor: Ruben Miller, ruben.k.miller@vanderbilt.edu, 216-574-3176\\n\\n\\nJob Coach: Jessica Cho, jessica.cho@vanderbilt.edu, 615-999-1134\\n\\n\\nInternship Coordinator: Lynda Tricia, lynda.z.tricia@vanderbilt.edu, 606-415-9999\\n\\n\\nPlease let us know if there are any questions. Thank you!\\n\\n\\n\\n\\nNext Steps at Vanderbilt - Safety Guidelines for Internships\\nMore information on Vanderbilt’s Health and Safety Protocols can be found here: https://www.vanderbilt.edu/coronavirus/community/undergraduate-students/\\nMasks: All Next Steps interns and job coaches must wear masks indoors at internships even if the jobsite does not require masks.\\nInterns and job coaches should have a well-fitted mask that completely covers your nose and mouth, preferably a KN95, KF94 or FFP2 version.\\nLunch Breaks: If an intern or job coach needs a lunch break, they can remove their mask just when they eat or drink. They must be physically distanced from other co-workers when eating. \\nSymptom Monitoring: All interns and job coaches must be free of ANY symptoms related to COVID-19 to go to the internship. If an intern or job coach has symptoms, they should stay home, notify the Next Steps staff and internship supervisor, and get tested at Vanderbilt Student Health or with their medical provider.\\nAccording to the CDC, symptoms may appear 2 to 14 days after exposure to the virus. 
These include:\\n* Fever or chills\\n* Cough\\n* Shortness of breath or difficulty breathing\\n* Fatigue\\n* Muscle or body aches\\n* Headache\\n* New loss of taste or smell\\n* Sore throat\\n* Congestion or runny nose\\n* Nausea or vomiting\\n* Diarrhea\\n\\n\\nIf an intern or job coach tests positive: If an intern or job coach receives a COVID-19 positive test result, regardless of vaccination status, they should complete the following webform. The webform goes directly to the Vanderbilt Command Center.\\nThe intern or job coach will receive direct communication from the Command Center about their isolation (if they tested positive) or quarantine period (if considered a close contact) and will be instructed to contact Student Health or Occupational Health if they develop symptoms.\\nClose Contact/Quarantine: Interns or job coaches who are a close contact of someone who tests positive should complete the following webform. The webform goes directly to the Command Center.\\n* Close contacts who are unvaccinated will quarantine for 10 days based on CDC guidance.\\no Additional requirements are in place for days 10 to 14 following exposure including:\\n* For days 10 to 14 after last exposure, unvaccinated Vanderbilt community members identified as close contacts must not unmask at any time in public.\\n* Individuals should eat alone or complete any activities alone that require removing a mask in a private space during those four days between day 10-14.\\n* Close contacts who are vaccinated and asymptomatic will not have to quarantine but are recommended to monitor their symptoms and to get a COVID-19 test 5-7 days after last exposure. If asymptomatic, testing can be done at the VU testing center. If individuals develop symptoms, they should test at Student Health, Occupational Health or VUMC or other testing location in the community.\\n* Close contacts who are vaccinated and symptomatic may have to quarantine based on severity of symptoms and specific living situations. This determination will be made by their medical provider in consultation with the Command Center.\\nJob Coaching Supports: If a job coach tests positive and is unable to provide on-site supports, Next Steps staff will follow the procedures outlined below.\\n1. Identify another job coach or Next Steps staff member who can provide job coaching on-site during some of the student’s internship hours.\\n2. Identify another job coach or staff member who can check-in virtually with the student and supervisor during their shift. \\n3. Or, work with the student and supervisor to ensure natural supports are in place if the student must work without support of a job coach during that time period.\""
675
+ ]
676
+ },
677
+ "execution_count": 69,
678
+ "metadata": {},
679
+ "output_type": "execute_result"
680
+ }
681
+ ],
682
+ "source": [
683
+ "# Might need some way to make pdf file to load more readable\n",
684
+ "# process_file('/content/instrutor_note.docx')\n",
685
+ "# process_file('/content/Big Data & Economics.pdf')\n",
686
+ "# process_file('/content/Big Data & Economics.pdf')\n",
687
+ "process_file('/content/Anonymized Job Coach QA Test Doc (1).txt')"
688
+ ]
689
+ },
690
+ {
691
+ "cell_type": "markdown",
692
+ "metadata": {
693
+ "id": "M6IzVTjz5cex"
694
+ },
695
+ "source": [
696
+ "## UI Design\n"
697
+ ]
698
+ },
699
+ {
700
+ "cell_type": "markdown",
701
+ "metadata": {
702
+ "id": "u2SY4Akt_t8h"
703
+ },
704
+ "source": [
705
+ "### Chatbot V2"
706
+ ]
707
+ },
708
+ {
709
+ "cell_type": "code",
710
+ "execution_count": 17,
711
+ "metadata": {
712
+ "colab": {
713
+ "base_uri": "https://localhost:8080/",
714
+ "height": 853
715
+ },
716
+ "id": "6ENnsKlD_uOC",
717
+ "outputId": "7b694aa8-e314-4502-a590-3245ebc3e1e0"
718
+ },
719
+ "outputs": [
720
+ {
721
+ "output_type": "stream",
722
+ "name": "stdout",
723
+ "text": [
724
+ "Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().\n",
725
+ "Note: opening Chrome Inspector may crash demo inside Colab notebooks.\n",
726
+ "\n",
727
+ "To create a public link, set `share=True` in `launch()`.\n"
728
+ ]
729
+ },
730
+ {
731
+ "output_type": "display_data",
732
+ "data": {
733
+ "text/plain": [
734
+ "<IPython.core.display.Javascript object>"
735
+ ],
736
+ "application/javascript": [
737
+ "(async (port, path, width, height, cache, element) => {\n",
738
+ " if (!google.colab.kernel.accessAllowed && !cache) {\n",
739
+ " return;\n",
740
+ " }\n",
741
+ " element.appendChild(document.createTextNode(''));\n",
742
+ " const url = await google.colab.kernel.proxyPort(port, {cache});\n",
743
+ "\n",
744
+ " const external_link = document.createElement('div');\n",
745
+ " external_link.innerHTML = `\n",
746
+ " <div style=\"font-family: monospace; margin-bottom: 0.5rem\">\n",
747
+ " Running on <a href=${new URL(path, url).toString()} target=\"_blank\">\n",
748
+ " https://localhost:${port}${path}\n",
749
+ " </a>\n",
750
+ " </div>\n",
751
+ " `;\n",
752
+ " element.appendChild(external_link);\n",
753
+ "\n",
754
+ " const iframe = document.createElement('iframe');\n",
755
+ " iframe.src = new URL(path, url).toString();\n",
756
+ " iframe.height = height;\n",
757
+ " iframe.allow = \"autoplay; camera; microphone; clipboard-read; clipboard-write;\"\n",
758
+ " iframe.width = width;\n",
759
+ " iframe.style.border = 0;\n",
760
+ " element.appendChild(iframe);\n",
761
+ " })(7860, \"/\", \"100%\", 500, false, window.element)"
762
+ ]
763
+ },
764
+ "metadata": {}
765
+ },
766
+ {
767
+ "output_type": "stream",
768
+ "name": "stderr",
769
+ "text": [
770
+ "/usr/local/lib/python3.10/dist-packages/gradio/processing_utils.py:188: UserWarning: Trying to convert audio automatically from int32 to 16-bit int format.\n",
771
+ " warnings.warn(warning.format(data.dtype))\n",
772
+ "/usr/local/lib/python3.10/dist-packages/gradio/processing_utils.py:188: UserWarning: Trying to convert audio automatically from int32 to 16-bit int format.\n",
773
+ " warnings.warn(warning.format(data.dtype))\n",
774
+ "/usr/local/lib/python3.10/dist-packages/gradio/processing_utils.py:188: UserWarning: Trying to convert audio automatically from int32 to 16-bit int format.\n",
775
+ " warnings.warn(warning.format(data.dtype))\n",
776
+ "/usr/local/lib/python3.10/dist-packages/gradio/processing_utils.py:188: UserWarning: Trying to convert audio automatically from int32 to 16-bit int format.\n",
777
+ " warnings.warn(warning.format(data.dtype))\n",
778
+ "/usr/local/lib/python3.10/dist-packages/gradio/processing_utils.py:188: UserWarning: Trying to convert audio automatically from int32 to 16-bit int format.\n",
779
+ " warnings.warn(warning.format(data.dtype))\n"
780
+ ]
781
+ },
782
+ {
783
+ "output_type": "stream",
784
+ "name": "stdout",
785
+ "text": [
786
+ "Keyboard interruption in main thread... closing server.\n"
787
+ ]
788
+ },
789
+ {
790
+ "output_type": "execute_result",
791
+ "data": {
792
+ "text/plain": [
793
+ "'\\nWhat Are the Different Types of Machine Learning?\\nHow Do You Handle Missing or Corrupted Data in a Dataset?\\nHow Can You Choose a Classifier Based on a Training Set Data Size?\\nExplain the Confusion Matrix with Respect to Machine Learning Algorithms.\\nWhat Are the Differences Between Machine Learning and Deep Learning\\n'"
794
+ ],
795
+ "application/vnd.google.colaboratory.intrinsic+json": {
796
+ "type": "string"
797
+ }
798
+ },
799
+ "metadata": {},
800
+ "execution_count": 17
801
+ }
802
+ ],
803
+ "source": [
804
+ "with gr.Blocks(theme=gr.themes.Monochrome()) as demo:\n",
805
+ " gr.Markdown(\"# Oral Exam App\")\n",
806
+ " gr.Markdown(\"## OpenAI API key\")\n",
807
+ " with gr.Box():\n",
808
+ " gr.HTML(\"\"\"Embed your OpenAI API key below; if you haven't created one already, visit\n",
809
+ " platform.openai.com/account/api-keys\n",
810
+ " to sign up for an account and get your personal API key\"\"\",\n",
811
+ " elem_classes=\"textbox_label\")\n",
812
+ " input = gr.Textbox(show_label=False, type=\"password\", container=False,\n",
813
+ " placeholder=\"●●●●●●●●●●●●●●●●●\")\n",
814
+ " input.change(fn=embed_key, inputs=input, outputs=None)\n",
815
+ "\n",
816
+ " with gr.Blocks():\n",
817
+ " #########################\n",
818
+ " #########Context#########\n",
819
+ " #########################\n",
820
+ " with gr.Accordion(\"Context section\"):\n",
821
+ " ### Should also allow vector stores\n",
822
+ " gr.Markdown(\"## Please upload the context document(s) for Oral exam\")\n",
823
+ " context_input = gr.File(label=\"Click to upload context file\",\n",
824
+ " file_count=\"multiple\",\n",
825
+ " file_types=[\".txt\", \".docx\", \".pdf\"])\n",
826
+ " outputs_context=gr.Textbox(label=\"Context\")\n",
827
+ " context_input.change(fn=process_file, inputs=context_input, outputs=outputs_context)\n",
828
+ " # upload_button = gr.Button(value=\"Show context\")\n",
829
+ " # upload_button.click(process_file, context_input, outputs_context)\n",
830
+ "\n",
831
+ " with gr.Blocks():\n",
832
+ " gr.Markdown(\"\"\"\n",
833
+ " ## Generate a Premade Prompt\n",
834
+ " Select your type and number of desired questions. Click \"Generate Prompt\" to get your premade prompt,\n",
835
+ " and then \"Insert Prompt into Chat\" to copy the text into the chat interface below. \\\n",
836
+ " You can also copy the prompt using the icon in the upper right corner and paste directly into the input box when interacting with the model.\n",
837
+ " \"\"\")\n",
838
+ " with gr.Row():\n",
839
+ " with gr.Column():\n",
840
+ " question_type = gr.Dropdown([\"Multiple Choice\", \"True or False\", \"Short Answer\", \"Fill in the Blank\", \"Random\"], label=\"Question Type\")\n",
841
+ " number_of_questions = gr.Textbox(label=\"Enter desired number of questions\")\n",
842
+ " sa_desired_length = gr.Dropdown([\"1-2\", \"3-4\", \"5-6\", \"6 or more\"], label = \"For short answer questions only, choose the desired sentence length for answers. The default value is 1-2 sentences.\")\n",
843
+ " with gr.Column():\n",
844
+ " prompt_button = gr.Button(\"Generate Prompt\")\n",
845
+ " premade_prompt_output = gr.Textbox(label=\"Generated prompt (save or copy)\", show_copy_button=True)\n",
846
+ " prompt_button.click(prompt_select,\n",
847
+ " inputs=[question_type, number_of_questions, sa_desired_length],\n",
848
+ " outputs=premade_prompt_output)\n",
849
+ "\n",
850
+ " #########################\n",
851
+ " #######Main Audio########\n",
852
+ " #########################\n",
853
+ " with gr.Accordion(\"Main audio section\"):\n",
854
+ " gr.Markdown(\"## Upload your audio file or start recording\")\n",
855
+ "\n",
856
+ " with gr.Column():\n",
857
+ " with gr.Row():\n",
858
+ " file_input = gr.Audio(label=\"Upload Audio\", source=\"upload\", type=\"filepath\")\n",
859
+ " record_inputs = gr.Audio(label=\"Record Audio\", source=\"microphone\", type=\"filepath\")\n",
860
+ "\n",
861
+ " gr.Markdown(\"## Transcribe the audio uploaded or recorded\")\n",
862
+ " outputs_transcribe=gr.Textbox(label=\"Transcription\")\n",
863
+ "\n",
864
+ " file_input.change(fn=transcribe, inputs=file_input, outputs=outputs_transcribe)\n",
865
+ " record_inputs.change(fn=transcribe, inputs=record_inputs, outputs=outputs_transcribe)\n",
866
+ "\n",
867
+ " # #########################\n",
868
+ " # ###Question Generation###\n",
869
+ " # #########################\n",
870
+ " # with gr.Accordion(\"Question section\"):\n",
871
+ " # gr.Markdown(\"## Questions\")\n",
872
+ " # with gr.Column():\n",
873
+ " # outputs_qa=gr.Textbox(label=\"Generate questions or Use your own questions\")\n",
874
+ " # btn3 = gr.Button(value=\"Generate questions\")\n",
875
+ " # btn3.click(generate_questions, inputs=context_input, outputs=outputs_qa)\n",
876
+ "\n",
877
+ "\n",
878
+ " ########################\n",
879
+ " ##Question Generation###\n",
880
+ " ########################\n",
881
+ " with gr.Accordion(\"Question section\"):\n",
882
+ " gr.Markdown(\"## Questions\")\n",
883
+ " with gr.Row():\n",
884
+ " with gr.Column():\n",
885
+ " outputs_qa=gr.Textbox(label=\"Generate questions or Use your own questions\")\n",
886
+ " btn1 = gr.Button(value=\"Generate questions\")\n",
887
+ " btn1.click(generate_questions, inputs=[outputs_context, premade_prompt_output], outputs=outputs_qa)\n",
888
+ "\n",
889
+ " # with gr.Column():\n",
890
+ " # submit_question=gr.Textbox(label=\"Use existing questions\")\n",
891
+ " # btn4 = gr.Button(value=\"Use these questions\")\n",
892
+ " # btn4.click(use_this_question, inputs=outputs_transcribe, outputs=None)\n",
893
+ "\n",
894
+ "\n",
895
+ " #########################\n",
896
+ " #######Instruction#######\n",
897
+ " #########################\n",
898
+ " instruction_qa_input = gr.File(label=\"Click to upload instruction file\",\n",
899
+ " file_count=\"multiple\",\n",
900
+ " file_types=[\".txt\", \".docx\", \".pdf\"])\n",
901
+ " instruction_qa=gr.Textbox(label=\"Or please enter the instruction for question/answering section\")\n",
902
+ " instruction_qa.change(fn=process_file, inputs=context_input, outputs=outputs_context)\n",
903
+ "\n",
904
+ "\n",
905
+ " #########################\n",
906
+ " #########Audio QA########\n",
907
+ " #########################\n",
908
+ " with gr.Accordion(\"Audio QA section\"):\n",
909
+ " gr.Markdown(\"## Question answering\")\n",
910
+ " gr.Markdown(\"### When you are ready to answer questions, press the 'I am ready' button\")\n",
911
+ " ##### This may be iterative\n",
912
+ " chatbot = gr.Chatbot([],\n",
913
+ " elem_id=\"chatbot\",\n",
914
+ " height=300)\n",
915
+ " state = gr.State()\n",
916
+ " message = gr.Textbox(show_label=False,\n",
917
+ " placeholder=\"Your answer will be transcribed here\",\n",
918
+ " container=False)\n",
919
+ " ready_button = gr.Button(value=\"I am ready\")\n",
920
+ " ready_button.click(message_and_history, inputs=[message, instruction_qa, outputs_qa, state], outputs=[chatbot, state])\n",
921
+ "\n",
922
+ " hidden = gr.Textbox(visible = False)\n",
923
+ " btn_record = gr.Audio(label=\"Record Audio\", source=\"microphone\", type=\"filepath\")\n",
924
+ " btn_record.change(fn=transcribe, inputs=btn_record, outputs=message)\n",
925
+ " btn_record.clear(use_these_questions, inputs = hidden, outputs = message)\n",
926
+ "\n",
927
+ " submit = gr.Button(\"Submit\")\n",
928
+ " submit.click(message_and_history,\n",
929
+ " inputs=[message, instruction_qa, outputs_qa, state],\n",
930
+ " outputs=[chatbot, state])\n",
931
+ "\n",
932
+ " message_records = gr.Textbox(show_label=False,\n",
933
+ " container=False)\n",
934
+ " show_records = gr.Button(\"Show QA history\")\n",
935
+ " show_records.click(use_these_questions,\n",
936
+ " inputs=state,\n",
937
+ " outputs=message_records)\n",
938
+ "\n",
939
+ " #########################\n",
940
+ " #######Evaluation########\n",
941
+ " #########################\n",
942
+ " with gr.Accordion(\"Evaluation section\"):\n",
943
+ " gr.Markdown(\"## Evaluation\")\n",
944
+ " with gr.Tab(\"General evalution\"):\n",
945
+ " evalution=gr.Textbox(label=\"AI Evaluation\")\n",
946
+ " btn5 = gr.Button(value=\"Evaluate\")\n",
947
+ " btn5.click(ai_evaluate, inputs=[outputs_context, outputs_transcribe, message_records, instruction_qa], outputs=evalution)\n",
948
+ " with gr.Tab(\"Quantitative evalution\"):\n",
949
+ " table_output = gr.Dataframe(label = \"Some kind of evaluation metrics?\")\n",
950
+ " btn6 = gr.Button(value=\"Evaluate\")\n",
951
+ " # btn6.click(ai_evaluate, inputs=[outputs_context, message_records, outputs_qa], outputs=table_output)\n",
952
+ "\n",
953
+ " # demo.launch()\n",
954
+ " # demo.launch(share=True)\n",
955
+ " demo.launch(debug=True)\n",
956
+ "\n",
957
+ "'''\n",
958
+ "What Are the Different Types of Machine Learning?\n",
959
+ "How Do You Handle Missing or Corrupted Data in a Dataset?\n",
960
+ "How Can You Choose a Classifier Based on a Training Set Data Size?\n",
961
+ "Explain the Confusion Matrix with Respect to Machine Learning Algorithms.\n",
962
+ "What Are the Differences Between Machine Learning and Deep Learning\n",
963
+ "'''"
964
+ ]
965
+ },
966
+ {
967
+ "cell_type": "markdown",
968
+ "metadata": {
969
+ "id": "g2EVIogW69Fd"
970
+ },
971
+ "source": [
972
+ "## What's left\n",
973
+ "- vector store (link) upload\n",
974
+ "- how to not show the warning when transcribing\n",
975
+ "- better prompt for evaluation\n",
976
+ "- try ChatInterface of Gradio"
977
+ ]
978
+ },
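One way to tackle the last item on this list: Gradio's `gr.ChatInterface` (available from Gradio 3.35) bundles the chatbot, textbox, state, and submit wiring that the Blocks layout above assembles by hand. A minimal sketch, with `answer_question` as a hypothetical stand-in for the notebook's `message_and_history` logic:

```python
import gradio as gr

def answer_question(message, history):
    # `history` is a list of (user, assistant) pairs that ChatInterface
    # manages for you; a real implementation would call the LLM here with
    # the instruction and question context.
    return f"You answered: {message}"

demo = gr.ChatInterface(fn=answer_question)
demo.launch(debug=True)
```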
979
+ {
980
+ "cell_type": "code",
981
+ "source": [],
982
+ "metadata": {
983
+ "id": "-YwOAtNANrx_"
984
+ },
985
+ "execution_count": null,
986
+ "outputs": []
987
+ }
988
+ ],
989
+ "metadata": {
990
+ "colab": {
991
+ "provenance": [],
992
+ "include_colab_link": true
993
+ },
994
+ "kernelspec": {
995
+ "display_name": "Python 3",
996
+ "name": "python3"
997
+ },
998
+ "language_info": {
999
+ "name": "python"
1000
+ }
1001
+ },
1002
+ "nbformat": 4,
1003
+ "nbformat_minor": 0
1004
+ }
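The Audio QA section above follows a record → transcribe → answer loop. Below is a self-contained sketch of that wiring, assuming Gradio 3.x (`gr.Audio` with `source="microphone"`) and the pre-1.0 `openai` SDK with `OPENAI_API_KEY` set in the environment; `transcribe_audio` is illustrative, not the notebook's own `transcribe` helper:

```python
import gradio as gr
import openai  # pre-1.0 SDK; reads OPENAI_API_KEY from the environment

def transcribe_audio(audio_path):
    # gr.Audio(type="filepath") hands over a temp-file path; Whisper wants a file handle
    if audio_path is None:
        return ""
    with open(audio_path, "rb") as audio_file:
        result = openai.Audio.transcribe("whisper-1", audio_file)
    return result["text"]

with gr.Blocks() as demo:
    answer_box = gr.Textbox(label="Transcribed answer")
    mic = gr.Audio(label="Record Audio", source="microphone", type="filepath")
    mic.change(fn=transcribe_audio, inputs=mic, outputs=answer_box)

demo.launch(debug=True)
```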
basic_UI_design_oral_exam.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
exploring_other_media_document_sources.ipynb ADDED
@@ -0,0 +1,459 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "colab_type": "text",
7
+ "id": "view-in-github"
8
+ },
9
+ "source": [
10
+ "<a href=\"https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/main/exploring_other_media_document_sources.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "metadata": {
16
+ "id": "x_Vp8SiKM4p1"
17
+ },
18
+ "source": [
19
+ "# Exploring Alternative Media Document Sources\n",
20
+ "Test how one could get YouTube videos or websites as sources for documents in a vector store.\n",
21
+ "\n",
22
+ "- YouTube: https://python.langchain.com/docs/modules/data_connection/document_loaders/integrations/youtube_audio\n",
23
+ "- Websites:\n",
24
+ " - https://js.langchain.com/docs/modules/indexes/document_loaders/examples/web_loaders/\n",
25
+ " - https://python.langchain.com/docs/modules/data_connection/document_loaders/integrations/web_base\n",
26
+ " - Extracting relevant information from website: https://www.oncrawl.com/technical-seo/extract-relevant-text-content-from-html-page/\n",
27
+ "\n"
28
+ ]
29
+ },
30
+ {
31
+ "cell_type": "markdown",
32
+ "metadata": {
33
+ "id": "o_60X8H3NEne"
34
+ },
35
+ "source": [
36
+ "## Libraries"
37
+ ]
38
+ },
39
+ {
40
+ "cell_type": "code",
41
+ "execution_count": null,
42
+ "metadata": {
43
+ "colab": {
44
+ "base_uri": "https://localhost:8080/"
45
+ },
46
+ "id": "pxcqXgg2aAN7",
47
+ "outputId": "0bb1c0aa-99f7-4d8d-a66f-992ea54eb5ff"
48
+ },
49
+ "outputs": [],
50
+ "source": [
51
+ "# install libraries here\n",
52
+ "# -q flag for \"quiet\" install\n",
53
+ "!pip install -q langchain\n",
54
+ "!pip install -q openai\n",
55
+ "!pip install -q unstructured\n",
56
+ "!pip install -q tiktoken\n",
57
+ "!pip install typing_extensions==4.5.0"
58
+ ]
59
+ },
60
+ {
61
+ "cell_type": "code",
62
+ "execution_count": null,
63
+ "metadata": {
64
+ "colab": {
65
+ "base_uri": "https://localhost:8080/",
66
+ "height": 784
67
+ },
68
+ "id": "mwpl3jYJoGo7",
69
+ "outputId": "cbae7a1a-b4a4-4d32-f837-dcafb3d01746"
70
+ },
71
+ "outputs": [],
72
+ "source": [
73
+ "%pip install -q trafilatura\n",
74
+ "%pip install -q justext"
75
+ ]
76
+ },
77
+ {
78
+ "cell_type": "code",
79
+ "execution_count": null,
80
+ "metadata": {
81
+ "colab": {
82
+ "base_uri": "https://localhost:8080/"
83
+ },
84
+ "id": "NU-7ynWHvwfM",
85
+ "outputId": "cdd9a4db-1bd0-471c-d31e-e0c16aff0c98"
86
+ },
87
+ "outputs": [],
88
+ "source": [
89
+ "%pip install yt_dlp\n",
90
+ "%pip install pydub"
91
+ ]
92
+ },
93
+ {
94
+ "cell_type": "code",
95
+ "execution_count": 2,
96
+ "metadata": {
97
+ "id": "pEjM1tLsMZBq"
98
+ },
99
+ "outputs": [],
100
+ "source": [
101
+ "# import libraries here\n",
102
+ "import os\n",
103
+ "import time\n",
104
+ "import pprint\n",
105
+ "from getpass import getpass\n",
106
+ "\n",
107
+ "from langchain.docstore.document import Document\n",
108
+ "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
109
+ "from langchain.embeddings import OpenAIEmbeddings\n",
110
+ "\n",
111
+ "from langchain.document_loaders.unstructured import UnstructuredFileLoader"
112
+ ]
113
+ },
114
+ {
115
+ "cell_type": "code",
116
+ "execution_count": 5,
117
+ "metadata": {
118
+ "id": "0U6N_9xFsOcw"
119
+ },
120
+ "outputs": [],
121
+ "source": [
122
+ "from langchain.document_loaders.generic import GenericLoader\n",
123
+ "from langchain.document_loaders.parsers import OpenAIWhisperParser\n",
124
+ "from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader"
125
+ ]
126
+ },
127
+ {
128
+ "cell_type": "code",
129
+ "execution_count": 40,
130
+ "metadata": {
131
+ "id": "JRw367IwryWd"
132
+ },
133
+ "outputs": [],
134
+ "source": [
135
+ "from langchain.document_loaders import WebBaseLoader\n",
136
+ "import trafilatura\n",
137
+ "import requests\n",
138
+ "import justext"
139
+ ]
140
+ },
141
+ {
142
+ "cell_type": "markdown",
143
+ "metadata": {
144
+ "id": "n0BTyPI_srMg"
145
+ },
146
+ "source": []
147
+ },
148
+ {
149
+ "cell_type": "code",
150
+ "execution_count": 3,
151
+ "metadata": {
152
+ "id": "NOX639OA2pOh"
153
+ },
154
+ "outputs": [],
155
+ "source": [
156
+ "# Export requirements.txt (if needed)\n",
157
+ "%pip freeze > requirements.txt"
158
+ ]
159
+ },
160
+ {
161
+ "cell_type": "markdown",
162
+ "metadata": {
163
+ "id": "03KLZGI_a5W5"
164
+ },
165
+ "source": [
166
+ "## API Keys\n",
167
+ "\n",
168
+ "Use these cells to load the API keys required for this notebook. The below code cell uses the `getpass` library."
169
+ ]
170
+ },
171
+ {
172
+ "cell_type": "code",
173
+ "execution_count": 4,
174
+ "metadata": {
175
+ "colab": {
176
+ "base_uri": "https://localhost:8080/"
177
+ },
178
+ "id": "5smcWj4DbFgy",
179
+ "outputId": "969c13c3-2b77-4d7b-aa69-e24629894e6a"
180
+ },
181
+ "outputs": [
182
+ {
183
+ "name": "stdout",
184
+ "output_type": "stream",
185
+ "text": [
186
+ "··········\n"
187
+ ]
188
+ }
189
+ ],
190
+ "source": [
191
+ "openai_api_key = getpass()\n",
192
+ "os.environ[\"OPENAI_API_KEY\"] = openai_api_key"
193
+ ]
194
+ },
195
+ {
196
+ "cell_type": "code",
197
+ "execution_count": 35,
198
+ "metadata": {
199
+ "id": "Jgh9igPesX3F"
200
+ },
201
+ "outputs": [],
202
+ "source": [
203
+ "def splitter(text):\n",
204
+ " # Split input text\n",
205
+ " text_splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=150)\n",
206
+ " splits = text_splitter.split_text(text)\n",
207
+ " return splits"
208
+ ]
209
+ },
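A quick sanity check of the helper above: a few thousand characters of input should come back as several chunks of at most 1,500 characters each, with 150 characters of overlap between neighbors.

```python
chunks = splitter("lorem ipsum dolor sit amet " * 200)  # ~5,400 characters
print(len(chunks), [len(c) for c in chunks])
```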
210
+ {
211
+ "cell_type": "markdown",
212
+ "metadata": {
213
+ "id": "F2W_fMfRUJj2"
214
+ },
215
+ "source": [
216
+ "## YouTube"
217
+ ]
218
+ },
219
+ {
220
+ "cell_type": "code",
221
+ "execution_count": 36,
222
+ "metadata": {
223
+ "id": "xm2aHrWdvztG"
224
+ },
225
+ "outputs": [],
226
+ "source": [
227
+ "def youtube_transcript(urls, save_dir = \"content\"):\n",
228
+ " # Transcribe the videos to text\n",
229
+ " # save_dir: directory to save audio files\n",
230
+ " youtube_loader = GenericLoader(YoutubeAudioLoader(urls, save_dir), OpenAIWhisperParser())\n",
231
+ " youtube_docs = youtube_loader.load()\n",
232
+ " # Combine doc\n",
233
+ " combined_docs = [doc.page_content for doc in youtube_docs]\n",
234
+ " text = \" \".join(combined_docs)\n",
235
+ " return text"
236
+ ]
237
+ },
238
+ {
239
+ "cell_type": "code",
240
+ "execution_count": null,
241
+ "metadata": {
242
+ "id": "u3pRTWrBv_oJ"
243
+ },
244
+ "outputs": [],
245
+ "source": [
246
+ "# Two Karpathy lecture videos\n",
247
+ "urls = [\"https://youtu.be/kCc8FmEb1nY\", \"https://youtu.be/VMj-3S1tku0\"]\n",
248
+ "youtube_text = youtube_transcript(urls)\n",
249
+ "youtube_text"
250
+ ]
251
+ },
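The stated goal is to feed these sources into a vector store. A sketch of that final step for the transcript above, using Chroma as one option (it requires `pip install chromadb`, which this notebook does not install; any LangChain vector store would do):

```python
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

splits = splitter(youtube_text)  # chunk the transcript with the helper defined earlier
db = Chroma.from_texts(splits, OpenAIEmbeddings())
docs = db.similarity_search("What is a transformer?", k=3)
print(docs[0].page_content)
```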
252
+ {
253
+ "cell_type": "markdown",
254
+ "metadata": {
255
+ "id": "wsMTgKjnUmql"
256
+ },
257
+ "source": [
258
+ "## Websites"
259
+ ]
260
+ },
261
+ {
262
+ "cell_type": "code",
263
+ "execution_count": 25,
264
+ "metadata": {
265
+ "id": "aaIuM970pupK"
266
+ },
267
+ "outputs": [],
268
+ "source": [
269
+ "url = \"https://www.espn.com/\""
270
+ ]
271
+ },
272
+ {
273
+ "cell_type": "markdown",
274
+ "metadata": {
275
+ "id": "B2CW1oIgp5w3"
276
+ },
277
+ "source": [
278
+ "### WebBaseLoader"
279
+ ]
280
+ },
281
+ {
282
+ "cell_type": "code",
283
+ "execution_count": 42,
284
+ "metadata": {
285
+ "id": "MYw1qpovlnxe"
286
+ },
287
+ "outputs": [],
288
+ "source": [
289
+ "def website_webbase(url):\n",
290
+ " website_loader = WebBaseLoader(url)\n",
291
+ " website_data = website_loader.load()\n",
292
+ " # Combine doc\n",
293
+ " combined_docs = [doc.page_content for doc in website_data]\n",
294
+ " text = \" \".join(combined_docs)\n",
295
+ " return text"
296
+ ]
297
+ },
298
+ {
299
+ "cell_type": "code",
300
+ "execution_count": 43,
301
+ "metadata": {
302
+ "colab": {
303
+ "base_uri": "https://localhost:8080/",
304
+ "height": 139
305
+ },
306
+ "id": "M2A-lEpasbVo",
307
+ "outputId": "0936e800-b254-47a8-b865-ed10bec191b1"
308
+ },
309
+ "outputs": [
310
+ {
311
+ "data": {
312
+ "application/vnd.google.colaboratory.intrinsic+json": {
313
+ "type": "string"
314
+ },
315
+ "text/plain": [
316
+ "\"\\n\\n\\n\\n\\n\\n\\n\\n\\nESPN - Serving Sports Fans. Anytime. Anywhere.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n Skip to main content\\n \\n\\n Skip to navigation\\n \\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n<\\n\\n>\\n\\n\\n\\n\\n\\n\\n\\n\\n\\nMenuESPN\\n\\n\\nSearch\\n\\n\\n\\nscores\\n\\n\\n\\nNFLNBANHLMLBSoccerTennis…NCAAFNCAAMNCAAWSports BettingBoxingCFLNCAACricketF1GolfHorseMMANASCARNBA G LeagueOlympic SportsPLLRacingRN BBRN FBRugbyWNBAWWEX GamesXFLMore ESPNFantasyListenWatchESPN+\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n \\n\\nSUBSCRIBE NOW\\n\\n\\n\\n\\n\\nThe Ultimate Fighter: Season 31\\n\\n\\n\\n\\n\\n\\n\\nWimbledon: Select Courts\\n\\n\\n\\n\\n\\n\\n\\nNBA Summer League: Select Games\\n\\n\\n\\n\\n\\n\\n\\nProjecting Messi's Performance In MLS\\n\\n\\nQuick Links\\n\\n\\n\\n\\nNBA Summer League\\n\\n\\n\\n\\n\\n\\n\\nNBA Free Agency Buzz\\n\\n\\n\\n\\n\\n\\n\\nNBA Trade Machine\\n\\n\\n\\n\\n\\n\\n\\n2023 MLB Draft\\n\\n\\n\\n\\n\\n\\n\\n2023 MLB All-Star Weekend\\n\\n\\n\\n\\n\\n\\n\\nNHL Free Agency\\n\\n\\n\\n\\n\\n\\n\\nWomen's World Cup\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\n\\n\\n\\n\\nFavorites\\n\\n\\n\\n\\n\\n\\n Manage Favorites\\n \\n\\n\\n\\nCustomize ESPNSign UpLog InESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nMeet all 23 USWNT players going to the World Cup: Fun facts, insightful stats and moreFrom Alex Morgan to Megan Rapinoe, get to know everyone on the Women's World Cup roster on and off the field.7hCaitlin MurrayIllustration by ESPNNotable absentees: From Becky Sauerbrunn to Beth MeadThere are over 60 players who won't play at the Women's World Cup because of injury.11hSophie LawsonHow Mia Hamm ended up playing as a goalie at a World CupWomen's World Cup 2023: Schedule, teams, venues, moreTOP HEADLINESKamara agrees to plea deal in Vegas assault caseMLBPA wants tweak to pitch timer before playoffsNorthwestern to keep assistant coaches for 2023Surfing star Jones dies after accident with boardMessi lands in U.S. ahead of Inter Miami unveilingDiet to discipline: Zion knows he can do moreDjokovic ties Federer with 46th Slam semifinalTour-PIF talks eyed Norman exit, Tiger LIV teamWho are the NFL's best cornerbacks?MLB All-Star GameMeet the All-Star Game first-timersFrom rookies to breakouts to one guy finally getting his due, here's what to expect from MLB's most notable new All-Stars.10hJesse RogersAP Photo/Ted WarrenAll-Star Game: Predictions and much moreBaseball's best are in Seattle. 
Here are the matchups we want to see and what our experts think will happen.3hESPNTHE TALK OF SUMMER LEAGUEChet Holmgren, Chris Paul and more buzz from VegasOur NBA insiders have the latest from Vegas, including Holmgren's return, the in-season tournament and Paul's fit on the Warriors.7hNBA insidersPhoto by Chris Gardner/Getty ImagesNOT BUYING INTO ZION'S WORDSZion's diet comments not well received by Kendrick Perkins and Richard Jefferson49m1:47WIMBLEDON SCOREBOARDTUESDAY'S MATCHESSee AllBIG UPSET AT WIMBLEDONElina Svitolina takes down No. 1 seed Iga Swiatek5h0:55Takeaways from Swiatek's surprising loss to SvitolinaIga Swiatek came into Wimbledon as a favorite to win -- but grass once again proved to be her downfall.3hAlyssa RoenigkSURVEYING EVERY POSITIONWho are the NFL's best cornerbacks? Execs, coaches and scouts help rank 2023's top 10Who are the best corners in the NFL? Execs, coaches and scouts from around the league ranked their top 10 in our annual summer series.10hJeremy FowlerPhoto by Ethan Miller/Getty ImagesRanking the best players at every position: Execs make their picksFor the fourth straight year, we asked execs, coaches, scouts and players to name their top 10 at all positions.10hJeremy FowlerFITZGERALD OUT AT NORTHWESTERNCOLLEGE FOOTBALLRece Davis 'shocked' by Northwestern hazing claims under Fitzgerald19h3:22BIG 12 MEDIA DAYSCOLLEGE FOOTBALLLast stand for Texas and Oklahoma, newcomers to watch and expansion talkThe Big 12 will start a busy month of media days on Wednesday, July 12. Here are the biggest questions facing the conference ahead of the 2023 season.1mBill Connelly and Dave WilsonRaymond Carlin/Icon Sportswire Top HeadlinesKamara agrees to plea deal in Vegas assault caseMLBPA wants tweak to pitch timer before playoffsNorthwestern to keep assistant coaches for 2023Surfing star Jones dies after accident with boardMessi lands in U.S. ahead of Inter Miami unveilingDiet to discipline: Zion knows he can do moreDjokovic ties Federer with 46th Slam semifinalTour-PIF talks eyed Norman exit, Tiger LIV teamWho are the NFL's best cornerbacks?Favorites FantasyManage FavoritesFantasy HomeCustomize ESPNSign UpLog InICYMI0:38J-Rod puts on a show with record 41 HRs in Round 1Mariners star Julio Rodriguez electrifies the Seattle crowd with a record 41 home runs in Round 1 of the Home Run Derby. Best of ESPN+Illustration by ESPNRanking the NFL's best players at every position for 2023: Execs, coaches, scouts pick their top 10 at every positionBest corners in the NFL? Edge rushers? Linebackers? For the fourth straight year, we asked execs, coaches, scouts and players to name their top 10.AP Photo/John LocherThree takeaways from Victor Wembanyama's second gameKevin Pelton looks at what worked for Wembanyama on Sunday compared to Friday and where the 19-year-old can continue to improve.Illustration by ESPN2024 NFL mock draft: Jordan Reid's early first-round predictionsThree QBs in the top 10? A run on offensive linemen? Impact defenders galore? Here are Jordan Reid's early projections for next year's 32 first-round picks. Trending NowBettmann/Getty ImagesInside the worst team in NBA history, the 1972-73 SixersThe Philadelphia 76ers started the 1972-73 season by losing 21 of 23 games. They'd finish with the worst record in NBA history, 9-73. This is the story of what the team learned about themselves through turmoil.Ron Chenoy-USA TODAY SportsCups of coffee: Seven former NFL players remember their one and only gameTheir NFL careers lasted a single game. 
Seven members of a unique professional club discuss their journeys.Illustration by MASAThe FC 100 for 2023: Haaland, Mbappe lead our list of best men's soccer playersAfter a brief hiatus thanks to the winter World Cup in Qatar, ESPN presents its seventh annual ranking of the best men's players and coaches in world soccer! Welcome to FC 100. How to Watch on ESPN+(AP Photo/Koji Sasahara, File)How to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN, ESPN+Here's everything you need to know about how to watch the PGA Tour, Masters, PGA Championship and FedEx Cup playoffs on ESPN and ESPN+. Sign up for FREE!Create A LeagueJoin a Public LeagueReactivate a LeaguePractice with a Mock DraftSign up to play the #1 Fantasy game!Create A LeagueJoin Public LeagueReactivateMock Draft NowSign up for FREE!Create A LeagueJoin a Public LeagueReactivate a LeaguePractice With a Mock DraftSign up for FREE!Create A LeagueJoin a Public LeaguePractice With a Mock Draft\\n\\nESPN+\\n\\n\\n\\n\\nThe Ultimate Fighter: Season 31\\n\\n\\n\\n\\n\\n\\n\\nWimbledon: Select Courts\\n\\n\\n\\n\\n\\n\\n\\nNBA Summer League: Select Games\\n\\n\\n\\n\\n\\n\\n\\nProjecting Messi's Performance In MLS\\n\\n\\nQuick Links\\n\\n\\n\\n\\nNBA Summer League\\n\\n\\n\\n\\n\\n\\n\\nNBA Free Agency Buzz\\n\\n\\n\\n\\n\\n\\n\\nNBA Trade Machine\\n\\n\\n\\n\\n\\n\\n\\n2023 MLB Draft\\n\\n\\n\\n\\n\\n\\n\\n2023 MLB All-Star Weekend\\n\\n\\n\\n\\n\\n\\n\\nNHL Free Agency\\n\\n\\n\\n\\n\\n\\n\\nWomen's World Cup\\n\\n\\n\\n\\n\\n\\n\\nHow To Watch PGA TOUR\\n\\n\\nESPN Sites\\n\\n\\n\\n\\nESPN Deportes\\n\\n\\n\\n\\n\\n\\n\\nAndscape\\n\\n\\n\\n\\n\\n\\n\\nespnW\\n\\n\\n\\n\\n\\n\\n\\nESPNFC\\n\\n\\n\\n\\n\\n\\n\\nX Games\\n\\n\\n\\n\\n\\n\\n\\nSEC Network\\n\\n\\nESPN Apps\\n\\n\\n\\n\\nESPN\\n\\n\\n\\n\\n\\n\\n\\nESPN Fantasy\\n\\n\\nFollow ESPN\\n\\n\\n\\n\\nFacebook\\n\\n\\n\\n\\n\\n\\n\\nTwitter\\n\\n\\n\\n\\n\\n\\n\\nInstagram\\n\\n\\n\\n\\n\\n\\n\\nSnapchat\\n\\n\\n\\n\\n\\n\\n\\nYouTube\\n\\n\\n\\n\\n\\n\\n\\nThe ESPN Daily Podcast\\n\\n\\nTerms of UsePrivacy PolicyYour US State Privacy RightsChildren's Online Privacy PolicyInterest-Based AdsAbout Nielsen MeasurementDo Not Sell or Share My Personal InformationContact UsDisney Ad Sales SiteWork for ESPNCopyright: © ESPN Enterprises, Inc. All rights reserved.\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\\n\""
317
+ ]
318
+ },
319
+ "execution_count": 43,
320
+ "metadata": {},
321
+ "output_type": "execute_result"
322
+ }
323
+ ],
324
+ "source": [
325
+ "webbase_text = website_webbase(url)\n",
326
+ "webbase_text"
327
+ ]
328
+ },
329
+ {
330
+ "cell_type": "markdown",
331
+ "metadata": {
332
+ "id": "fLFNd6S8oAlD"
333
+ },
334
+ "source": [
335
+ "### Trafilatura Parsing\n",
336
+ "\n",
337
+ "[Tralifatura](https://trafilatura.readthedocs.io/en/latest/) is a Python and command-line utility which attempts to extracts the most relevant information from a given website. "
338
+ ]
339
+ },
340
+ {
341
+ "cell_type": "code",
342
+ "execution_count": 44,
343
+ "metadata": {
344
+ "id": "H3gtJjSfoK5C"
345
+ },
346
+ "outputs": [],
347
+ "source": [
348
+ "def website_trafilatura(url):\n",
349
+ " downloaded = trafilatura.fetch_url(url)\n",
350
+ " return trafilatura.extract(downloaded)"
351
+ ]
352
+ },
353
+ {
354
+ "cell_type": "code",
355
+ "execution_count": 45,
356
+ "metadata": {
357
+ "colab": {
358
+ "base_uri": "https://localhost:8080/",
359
+ "height": 52
360
+ },
361
+ "id": "ft8QDKTTsekG",
362
+ "outputId": "7eefb402-470d-4ff6-df72-4834a610f17a"
363
+ },
364
+ "outputs": [
365
+ {
366
+ "data": {
367
+ "application/vnd.google.colaboratory.intrinsic+json": {
368
+ "type": "string"
369
+ },
370
+ "text/plain": [
371
+ "'|Sports|\\n|scores||News|\\n|© 2023 ESPN Internet Ventures. Terms of Use and Privacy Policy and Safety Information / Your California Privacy Rights are applicable to you. All rights reserved.|\\n|\\nMore From ESPN:\\n|\\nESPN en Español | Andscape | FiveThirtyEight | ESPN FC | ESPNCricinfo'"
372
+ ]
373
+ },
374
+ "execution_count": 45,
375
+ "metadata": {},
376
+ "output_type": "execute_result"
377
+ }
378
+ ],
379
+ "source": [
380
+ "trafilatura_text = website_trafilatura(url)\n",
381
+ "trafilatura_text"
382
+ ]
383
+ },
384
+ {
385
+ "cell_type": "markdown",
386
+ "metadata": {
387
+ "id": "evXJEtZtobn0"
388
+ },
389
+ "source": [
390
+ "### jusText\n",
391
+ "\n",
392
+ "[jusText](https://pypi.org/project/jusText/) is another Python library for extracting content from a website."
393
+ ]
394
+ },
395
+ {
396
+ "cell_type": "code",
397
+ "execution_count": 46,
398
+ "metadata": {
399
+ "id": "AfahISIvph_Y"
400
+ },
401
+ "outputs": [],
402
+ "source": [
403
+ "def website_justext(url):\n",
404
+ " response = requests.get(url)\n",
405
+ " paragraphs = justext.justext(response.content, justext.get_stoplist(\"English\"))\n",
406
+ " content = [paragraph.text for paragraph in paragraphs \\\n",
407
+ " if not paragraph.is_boilerplate]\n",
408
+ " text = \" \".join(content)\n",
409
+ " return text"
410
+ ]
411
+ },
412
+ {
413
+ "cell_type": "code",
414
+ "execution_count": 47,
415
+ "metadata": {
416
+ "colab": {
417
+ "base_uri": "https://localhost:8080/",
418
+ "height": 52
419
+ },
420
+ "id": "eB_BdMjXsg3T",
421
+ "outputId": "b65bfc3f-2538-4b0a-99da-926855f9e21d"
422
+ },
423
+ "outputs": [
424
+ {
425
+ "data": {
426
+ "application/vnd.google.colaboratory.intrinsic+json": {
427
+ "type": "string"
428
+ },
429
+ "text/plain": [
430
+ "\"Trending Now The Philadelphia 76ers started the 1972-73 season by losing 21 of 23 games. They'd finish with the worst record in NBA history, 9-73. This is the story of what the team learned about themselves through turmoil.\""
431
+ ]
432
+ },
433
+ "execution_count": 47,
434
+ "metadata": {},
435
+ "output_type": "execute_result"
436
+ }
437
+ ],
438
+ "source": [
439
+ "justext_text = website_justext(url)\n",
440
+ "justext_text"
441
+ ]
442
+ }
443
+ ],
444
+ "metadata": {
445
+ "colab": {
446
+ "include_colab_link": true,
447
+ "provenance": []
448
+ },
449
+ "kernelspec": {
450
+ "display_name": "Python 3",
451
+ "name": "python3"
452
+ },
453
+ "language_info": {
454
+ "name": "python"
455
+ }
456
+ },
457
+ "nbformat": 4,
458
+ "nbformat_minor": 0
459
+ }
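The three website routes above trade recall for precision: WebBaseLoader keeps nearly everything (navigation and footers included), while trafilatura and jusText strip most boilerplate but can also drop real content, as the ESPN outputs show. A crude first comparison is simply the length of each extraction:

```python
for name, extract in [("WebBaseLoader", website_webbase),
                      ("trafilatura", website_trafilatura),
                      ("jusText", website_justext)]:
    text = extract(url) or ""  # trafilatura.extract can return None
    print(f"{name}: {len(text):,} characters")
```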
grading_from_json.ipynb ADDED
@@ -0,0 +1,606 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "colab_type": "text",
7
+ "id": "view-in-github"
8
+ },
9
+ "source": [
10
+ "<a href=\"https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/main/grading_from_json.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "code",
15
+ "execution_count": null,
16
+ "metadata": {
17
+ "id": "kfO7rE64ZTI_"
18
+ },
19
+ "outputs": [],
20
+ "source": [
21
+ "!pip install openai"
22
+ ]
23
+ },
24
+ {
25
+ "cell_type": "code",
26
+ "execution_count": 2,
27
+ "metadata": {
28
+ "id": "f26sZpe-MCCj"
29
+ },
30
+ "outputs": [],
31
+ "source": [
32
+ "import json\n",
33
+ "import openai\n",
34
+ "import os\n",
35
+ "import pandas as pd"
36
+ ]
37
+ },
38
+ {
39
+ "cell_type": "code",
40
+ "execution_count": 4,
41
+ "metadata": {
42
+ "colab": {
43
+ "base_uri": "https://localhost:8080/",
44
+ "height": 614
45
+ },
46
+ "id": "BVTr_mR0XIJI",
47
+ "outputId": "897e41a0-d5e1-4b5f-d254-0a6e0f6aa3fa"
48
+ },
49
+ "outputs": [
50
+ {
51
+ "data": {
52
+ "text/html": [
53
+ "\n",
54
+ " <div id=\"df-e24b7014-4d98-4fc5-9ff1-07fa5c26ba5e\">\n",
55
+ " <div class=\"colab-df-container\">\n",
56
+ " <div>\n",
57
+ "<style scoped>\n",
58
+ " .dataframe tbody tr th:only-of-type {\n",
59
+ " vertical-align: middle;\n",
60
+ " }\n",
61
+ "\n",
62
+ " .dataframe tbody tr th {\n",
63
+ " vertical-align: top;\n",
64
+ " }\n",
65
+ "\n",
66
+ " .dataframe thead th {\n",
67
+ " text-align: right;\n",
68
+ " }\n",
69
+ "</style>\n",
70
+ "<table border=\"1\" class=\"dataframe\">\n",
71
+ " <thead>\n",
72
+ " <tr style=\"text-align: right;\">\n",
73
+ " <th></th>\n",
74
+ " <th>timestamp</th>\n",
75
+ " <th>author</th>\n",
76
+ " <th>message</th>\n",
77
+ " </tr>\n",
78
+ " </thead>\n",
79
+ " <tbody>\n",
80
+ " <tr>\n",
81
+ " <th>0</th>\n",
82
+ " <td>2023-06-07 08:16:00+00:00</td>\n",
83
+ " <td>assistant</td>\n",
84
+ " <td>Question 1:\\nWhich of the following statements...</td>\n",
85
+ " </tr>\n",
86
+ " <tr>\n",
87
+ " <th>1</th>\n",
88
+ " <td>2023-06-07 08:16:30+00:00</td>\n",
89
+ " <td>user</td>\n",
90
+ " <td>C</td>\n",
91
+ " </tr>\n",
92
+ " <tr>\n",
93
+ " <th>2</th>\n",
94
+ " <td>2023-06-07 08:17:00+00:00</td>\n",
95
+ " <td>assistant</td>\n",
96
+ " <td>Correct! Option C is the correct answer...</td>\n",
97
+ " </tr>\n",
98
+ " <tr>\n",
99
+ " <th>3</th>\n",
100
+ " <td>2023-06-07 08:17:30+00:00</td>\n",
101
+ " <td>assistant</td>\n",
102
+ " <td>Question 2:\\nWhich of the following expenses a...</td>\n",
103
+ " </tr>\n",
104
+ " <tr>\n",
105
+ " <th>4</th>\n",
106
+ " <td>2023-06-07 08:18:00+00:00</td>\n",
107
+ " <td>user</td>\n",
108
+ " <td>A</td>\n",
109
+ " </tr>\n",
110
+ " <tr>\n",
111
+ " <th>5</th>\n",
112
+ " <td>2023-06-07 08:18:30+00:00</td>\n",
113
+ " <td>assistant</td>\n",
114
+ " <td>I'm sorry, but your answer is incorrect...</td>\n",
115
+ " </tr>\n",
116
+ " <tr>\n",
117
+ " <th>6</th>\n",
118
+ " <td>2023-06-07 08:19:00+00:00</td>\n",
119
+ " <td>assistant</td>\n",
120
+ " <td>Question 2 (Revised):\\nWhich of the following ...</td>\n",
121
+ " </tr>\n",
122
+ " <tr>\n",
123
+ " <th>7</th>\n",
124
+ " <td>2023-06-07 08:19:30+00:00</td>\n",
125
+ " <td>user</td>\n",
126
+ " <td>D</td>\n",
127
+ " </tr>\n",
128
+ " <tr>\n",
129
+ " <th>8</th>\n",
130
+ " <td>2023-06-07 08:20:00+00:00</td>\n",
131
+ " <td>assistant</td>\n",
132
+ " <td>Correct! Option D is the correct answer...</td>\n",
133
+ " </tr>\n",
134
+ " <tr>\n",
135
+ " <th>9</th>\n",
136
+ " <td>2023-06-07 08:20:30+00:00</td>\n",
137
+ " <td>assistant</td>\n",
138
+ " <td>Question 3:\\nWhat is the purpose of capitalizi...</td>\n",
139
+ " </tr>\n",
140
+ " <tr>\n",
141
+ " <th>10</th>\n",
142
+ " <td>2023-06-07 08:21:00+00:00</td>\n",
143
+ " <td>user</td>\n",
144
+ " <td>C</td>\n",
145
+ " </tr>\n",
146
+ " <tr>\n",
147
+ " <th>11</th>\n",
148
+ " <td>2023-06-07 08:21:30+00:00</td>\n",
149
+ " <td>assistant</td>\n",
150
+ " <td>Correct! Option C is the correct answer...</td>\n",
151
+ " </tr>\n",
152
+ " <tr>\n",
153
+ " <th>12</th>\n",
154
+ " <td>2023-06-07 08:22:00+00:00</td>\n",
155
+ " <td>assistant</td>\n",
156
+ " <td>Question 4:\\nWhich financial statement provide...</td>\n",
157
+ " </tr>\n",
158
+ " <tr>\n",
159
+ " <th>13</th>\n",
160
+ " <td>2023-06-07 08:22:30+00:00</td>\n",
161
+ " <td>user</td>\n",
162
+ " <td>C</td>\n",
163
+ " </tr>\n",
164
+ " <tr>\n",
165
+ " <th>14</th>\n",
166
+ " <td>2023-06-07 08:23:00+00:00</td>\n",
167
+ " <td>assistant</td>\n",
168
+ " <td>Correct! Option C is the correct answer...</td>\n",
169
+ " </tr>\n",
170
+ " <tr>\n",
171
+ " <th>15</th>\n",
172
+ " <td>2023-06-07 08:23:30+00:00</td>\n",
173
+ " <td>assistant</td>\n",
174
+ " <td>Question 5:\\nWhat is the purpose of the matchi...</td>\n",
175
+ " </tr>\n",
176
+ " <tr>\n",
177
+ " <th>16</th>\n",
178
+ " <td>2023-06-07 08:24:00+00:00</td>\n",
179
+ " <td>user</td>\n",
180
+ " <td>B</td>\n",
181
+ " </tr>\n",
182
+ " <tr>\n",
183
+ " <th>17</th>\n",
184
+ " <td>2023-06-07 08:24:30+00:00</td>\n",
185
+ " <td>assistant</td>\n",
186
+ " <td>Correct! Option B is the correct answer...</td>\n",
187
+ " </tr>\n",
188
+ " </tbody>\n",
189
+ "</table>\n",
190
+ "</div>\n",
191
+ " <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-e24b7014-4d98-4fc5-9ff1-07fa5c26ba5e')\"\n",
192
+ " title=\"Convert this dataframe to an interactive table.\"\n",
193
+ " style=\"display:none;\">\n",
194
+ " \n",
195
+ " <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
196
+ " width=\"24px\">\n",
197
+ " <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
198
+ " <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
199
+ " </svg>\n",
200
+ " </button>\n",
201
+ " \n",
202
+ " <style>\n",
203
+ " .colab-df-container {\n",
204
+ " display:flex;\n",
205
+ " flex-wrap:wrap;\n",
206
+ " gap: 12px;\n",
207
+ " }\n",
208
+ "\n",
209
+ " .colab-df-convert {\n",
210
+ " background-color: #E8F0FE;\n",
211
+ " border: none;\n",
212
+ " border-radius: 50%;\n",
213
+ " cursor: pointer;\n",
214
+ " display: none;\n",
215
+ " fill: #1967D2;\n",
216
+ " height: 32px;\n",
217
+ " padding: 0 0 0 0;\n",
218
+ " width: 32px;\n",
219
+ " }\n",
220
+ "\n",
221
+ " .colab-df-convert:hover {\n",
222
+ " background-color: #E2EBFA;\n",
223
+ " box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
224
+ " fill: #174EA6;\n",
225
+ " }\n",
226
+ "\n",
227
+ " [theme=dark] .colab-df-convert {\n",
228
+ " background-color: #3B4455;\n",
229
+ " fill: #D2E3FC;\n",
230
+ " }\n",
231
+ "\n",
232
+ " [theme=dark] .colab-df-convert:hover {\n",
233
+ " background-color: #434B5C;\n",
234
+ " box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
235
+ " filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
236
+ " fill: #FFFFFF;\n",
237
+ " }\n",
238
+ " </style>\n",
239
+ "\n",
240
+ " <script>\n",
241
+ " const buttonEl =\n",
242
+ " document.querySelector('#df-e24b7014-4d98-4fc5-9ff1-07fa5c26ba5e button.colab-df-convert');\n",
243
+ " buttonEl.style.display =\n",
244
+ " google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
245
+ "\n",
246
+ " async function convertToInteractive(key) {\n",
247
+ " const element = document.querySelector('#df-e24b7014-4d98-4fc5-9ff1-07fa5c26ba5e');\n",
248
+ " const dataTable =\n",
249
+ " await google.colab.kernel.invokeFunction('convertToInteractive',\n",
250
+ " [key], {});\n",
251
+ " if (!dataTable) return;\n",
252
+ "\n",
253
+ " const docLinkHtml = 'Like what you see? Visit the ' +\n",
254
+ " '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
255
+ " + ' to learn more about interactive tables.';\n",
256
+ " element.innerHTML = '';\n",
257
+ " dataTable['output_type'] = 'display_data';\n",
258
+ " await google.colab.output.renderOutput(dataTable, element);\n",
259
+ " const docLink = document.createElement('div');\n",
260
+ " docLink.innerHTML = docLinkHtml;\n",
261
+ " element.appendChild(docLink);\n",
262
+ " }\n",
263
+ " </script>\n",
264
+ " </div>\n",
265
+ " </div>\n",
266
+ " "
267
+ ],
268
+ "text/plain": [
269
+ " timestamp author \\\n",
270
+ "0 2023-06-07 08:16:00+00:00 assistant \n",
271
+ "1 2023-06-07 08:16:30+00:00 user \n",
272
+ "2 2023-06-07 08:17:00+00:00 assistant \n",
273
+ "3 2023-06-07 08:17:30+00:00 assistant \n",
274
+ "4 2023-06-07 08:18:00+00:00 user \n",
275
+ "5 2023-06-07 08:18:30+00:00 assistant \n",
276
+ "6 2023-06-07 08:19:00+00:00 assistant \n",
277
+ "7 2023-06-07 08:19:30+00:00 user \n",
278
+ "8 2023-06-07 08:20:00+00:00 assistant \n",
279
+ "9 2023-06-07 08:20:30+00:00 assistant \n",
280
+ "10 2023-06-07 08:21:00+00:00 user \n",
281
+ "11 2023-06-07 08:21:30+00:00 assistant \n",
282
+ "12 2023-06-07 08:22:00+00:00 assistant \n",
283
+ "13 2023-06-07 08:22:30+00:00 user \n",
284
+ "14 2023-06-07 08:23:00+00:00 assistant \n",
285
+ "15 2023-06-07 08:23:30+00:00 assistant \n",
286
+ "16 2023-06-07 08:24:00+00:00 user \n",
287
+ "17 2023-06-07 08:24:30+00:00 assistant \n",
288
+ "\n",
289
+ " message \n",
290
+ "0 Question 1:\\nWhich of the following statements... \n",
291
+ "1 C \n",
292
+ "2 Correct! Option C is the correct answer... \n",
293
+ "3 Question 2:\\nWhich of the following expenses a... \n",
294
+ "4 A \n",
295
+ "5 I'm sorry, but your answer is incorrect... \n",
296
+ "6 Question 2 (Revised):\\nWhich of the following ... \n",
297
+ "7 D \n",
298
+ "8 Correct! Option D is the correct answer... \n",
299
+ "9 Question 3:\\nWhat is the purpose of capitalizi... \n",
300
+ "10 C \n",
301
+ "11 Correct! Option C is the correct answer... \n",
302
+ "12 Question 4:\\nWhich financial statement provide... \n",
303
+ "13 C \n",
304
+ "14 Correct! Option C is the correct answer... \n",
305
+ "15 Question 5:\\nWhat is the purpose of the matchi... \n",
306
+ "16 B \n",
307
+ "17 Correct! Option B is the correct answer... "
308
+ ]
309
+ },
310
+ "execution_count": 4,
311
+ "metadata": {},
312
+ "output_type": "execute_result"
313
+ }
314
+ ],
315
+ "source": [
316
+ "df = pd.read_json('demo_json.json')\n",
317
+ "pd.read_json('demo_json.json')"
318
+ ]
319
+ },
320
+ {
321
+ "cell_type": "code",
322
+ "execution_count": 5,
323
+ "metadata": {
324
+ "id": "anSNlvqlXh6i"
325
+ },
326
+ "outputs": [],
327
+ "source": [
328
+ "openai.api_key = \"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\""
329
+ ]
330
+ },
331
+ {
332
+ "cell_type": "code",
333
+ "execution_count": 8,
334
+ "metadata": {
335
+ "colab": {
336
+ "base_uri": "https://localhost:8080/",
337
+ "height": 627
338
+ },
339
+ "id": "udujJrX6SryU",
340
+ "outputId": "9b182162-7c1c-4d5a-be56-16947ddcda33"
341
+ },
342
+ "outputs": [
343
+ {
344
+ "data": {
345
+ "text/html": [
346
+ "\n",
347
+ " <div id=\"df-5123f950-1dca-46a6-be4d-dab5de1f8899\">\n",
348
+ " <div class=\"colab-df-container\">\n",
349
+ " <div>\n",
350
+ "<style scoped>\n",
351
+ " .dataframe tbody tr th:only-of-type {\n",
352
+ " vertical-align: middle;\n",
353
+ " }\n",
354
+ "\n",
355
+ " .dataframe tbody tr th {\n",
356
+ " vertical-align: top;\n",
357
+ " }\n",
358
+ "\n",
359
+ " .dataframe thead th {\n",
360
+ " text-align: right;\n",
361
+ " }\n",
362
+ "</style>\n",
363
+ "<table border=\"1\" class=\"dataframe\">\n",
364
+ " <thead>\n",
365
+ " <tr style=\"text-align: right;\">\n",
366
+ " <th></th>\n",
367
+ " <th>Question</th>\n",
368
+ " <th>Correct Answer</th>\n",
369
+ " <th>User Answer</th>\n",
370
+ " <th>Evaluation</th>\n",
371
+ " <th>Score</th>\n",
372
+ " </tr>\n",
373
+ " </thead>\n",
374
+ " <tbody>\n",
375
+ " <tr>\n",
376
+ " <th>0</th>\n",
377
+ " <td>Question 1:\\nWhich of the following statements...</td>\n",
378
+ " <td>C</td>\n",
379
+ " <td>C</td>\n",
380
+ " <td>correct.</td>\n",
381
+ " <td>1</td>\n",
382
+ " </tr>\n",
383
+ " <tr>\n",
384
+ " <th>1</th>\n",
385
+ " <td>Question 2 (Revised):\\nWhich of the following ...</td>\n",
386
+ " <td>D</td>\n",
387
+ " <td>D</td>\n",
388
+ " <td>incorrect. the correct answer is d, software d...</td>\n",
389
+ " <td>1</td>\n",
390
+ " </tr>\n",
391
+ " <tr>\n",
392
+ " <th>2</th>\n",
393
+ " <td>Question 3:\\nWhat is the purpose of capitalizi...</td>\n",
394
+ " <td>C</td>\n",
395
+ " <td>C</td>\n",
396
+ " <td>incorrect. the correct answer is b.</td>\n",
397
+ " <td>1</td>\n",
398
+ " </tr>\n",
399
+ " <tr>\n",
400
+ " <th>3</th>\n",
401
+ " <td>Question 4:\\nWhich financial statement provide...</td>\n",
402
+ " <td>C</td>\n",
403
+ " <td>C</td>\n",
404
+ " <td>correct</td>\n",
405
+ " <td>2</td>\n",
406
+ " </tr>\n",
407
+ " <tr>\n",
408
+ " <th>4</th>\n",
409
+ " <td>Question 5:\\nWhat is the purpose of the matchi...</td>\n",
410
+ " <td>B</td>\n",
411
+ " <td>B</td>\n",
412
+ " <td>correct</td>\n",
413
+ " <td>3</td>\n",
414
+ " </tr>\n",
415
+ " </tbody>\n",
416
+ "</table>\n",
417
+ "</div>\n",
418
+ " <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-5123f950-1dca-46a6-be4d-dab5de1f8899')\"\n",
419
+ " title=\"Convert this dataframe to an interactive table.\"\n",
420
+ " style=\"display:none;\">\n",
421
+ " \n",
422
+ " <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
423
+ " width=\"24px\">\n",
424
+ " <path d=\"M0 0h24v24H0V0z\" fill=\"none\"/>\n",
425
+ " <path d=\"M18.56 5.44l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94zm-11 1L8.5 8.5l.94-2.06 2.06-.94-2.06-.94L8.5 2.5l-.94 2.06-2.06.94zm10 10l.94 2.06.94-2.06 2.06-.94-2.06-.94-.94-2.06-.94 2.06-2.06.94z\"/><path d=\"M17.41 7.96l-1.37-1.37c-.4-.4-.92-.59-1.43-.59-.52 0-1.04.2-1.43.59L10.3 9.45l-7.72 7.72c-.78.78-.78 2.05 0 2.83L4 21.41c.39.39.9.59 1.41.59.51 0 1.02-.2 1.41-.59l7.78-7.78 2.81-2.81c.8-.78.8-2.07 0-2.86zM5.41 20L4 18.59l7.72-7.72 1.47 1.35L5.41 20z\"/>\n",
426
+ " </svg>\n",
427
+ " </button>\n",
428
+ " \n",
429
+ " <style>\n",
430
+ " .colab-df-container {\n",
431
+ " display:flex;\n",
432
+ " flex-wrap:wrap;\n",
433
+ " gap: 12px;\n",
434
+ " }\n",
435
+ "\n",
436
+ " .colab-df-convert {\n",
437
+ " background-color: #E8F0FE;\n",
438
+ " border: none;\n",
439
+ " border-radius: 50%;\n",
440
+ " cursor: pointer;\n",
441
+ " display: none;\n",
442
+ " fill: #1967D2;\n",
443
+ " height: 32px;\n",
444
+ " padding: 0 0 0 0;\n",
445
+ " width: 32px;\n",
446
+ " }\n",
447
+ "\n",
448
+ " .colab-df-convert:hover {\n",
449
+ " background-color: #E2EBFA;\n",
450
+ " box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
451
+ " fill: #174EA6;\n",
452
+ " }\n",
453
+ "\n",
454
+ " [theme=dark] .colab-df-convert {\n",
455
+ " background-color: #3B4455;\n",
456
+ " fill: #D2E3FC;\n",
457
+ " }\n",
458
+ "\n",
459
+ " [theme=dark] .colab-df-convert:hover {\n",
460
+ " background-color: #434B5C;\n",
461
+ " box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
462
+ " filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
463
+ " fill: #FFFFFF;\n",
464
+ " }\n",
465
+ " </style>\n",
466
+ "\n",
467
+ " <script>\n",
468
+ " const buttonEl =\n",
469
+ " document.querySelector('#df-5123f950-1dca-46a6-be4d-dab5de1f8899 button.colab-df-convert');\n",
470
+ " buttonEl.style.display =\n",
471
+ " google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
472
+ "\n",
473
+ " async function convertToInteractive(key) {\n",
474
+ " const element = document.querySelector('#df-5123f950-1dca-46a6-be4d-dab5de1f8899');\n",
475
+ " const dataTable =\n",
476
+ " await google.colab.kernel.invokeFunction('convertToInteractive',\n",
477
+ " [key], {});\n",
478
+ " if (!dataTable) return;\n",
479
+ "\n",
480
+ " const docLinkHtml = 'Like what you see? Visit the ' +\n",
481
+ " '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
482
+ " + ' to learn more about interactive tables.';\n",
483
+ " element.innerHTML = '';\n",
484
+ " dataTable['output_type'] = 'display_data';\n",
485
+ " await google.colab.output.renderOutput(dataTable, element);\n",
486
+ " const docLink = document.createElement('div');\n",
487
+ " docLink.innerHTML = docLinkHtml;\n",
488
+ " element.appendChild(docLink);\n",
489
+ " }\n",
490
+ " </script>\n",
491
+ " </div>\n",
492
+ " </div>\n",
493
+ " "
494
+ ],
495
+ "text/plain": [
496
+ " Question Correct Answer \\\n",
497
+ "0 Question 1:\\nWhich of the following statements... C \n",
498
+ "1 Question 2 (Revised):\\nWhich of the following ... D \n",
499
+ "2 Question 3:\\nWhat is the purpose of capitalizi... C \n",
500
+ "3 Question 4:\\nWhich financial statement provide... C \n",
501
+ "4 Question 5:\\nWhat is the purpose of the matchi... B \n",
502
+ "\n",
503
+ " User Answer Evaluation Score \n",
504
+ "0 C correct. 1 \n",
505
+ "1 D incorrect. the correct answer is d, software d... 1 \n",
506
+ "2 C incorrect. the correct answer is b. 1 \n",
507
+ "3 C correct 2 \n",
508
+ "4 B correct 3 "
509
+ ]
510
+ },
511
+ "execution_count": 8,
512
+ "metadata": {},
513
+ "output_type": "execute_result"
514
+ }
515
+ ],
516
+ "source": [
517
+ "# Initialize necessary variables\n",
518
+ "prompt = \"\"\n",
519
+ "question = \"\"\n",
520
+ "correct_answer = \"\"\n",
521
+ "user_answer = \"\"\n",
522
+ "\n",
523
+ "# Initialize score\n",
524
+ "score = 0\n",
525
+ "\n",
526
+ "# Initialize an empty list to hold row data\n",
527
+ "row_data = []\n",
528
+ "\n",
529
+ "for index, row in df.iterrows():\n",
530
+ " author = row['author']\n",
531
+ " message = row['message']\n",
532
+ "\n",
533
+ " # Choose the appropriate prompt based on the author\n",
534
+ " if author == 'assistant':\n",
535
+ " if 'Question' in message:\n",
536
+ " question = message\n",
537
+ " user_answer = '' # Reset user_answer after a new question\n",
538
+ " elif 'Correct! Option' in message:\n",
539
+ " correct_answer = message.split('Option ')[1][0]\n",
540
+ " if user_answer: # If user_answer exists, make the API call\n",
541
+ " prompt = f\"Given the following question:\\n{question}\\nThe student responded with: {user_answer}\\nIs the student's response correct or incorrect?\"\n",
542
+ "\n",
543
+ " # Make an API call to OpenAI\n",
544
+ " api_response = openai.Completion.create(\n",
545
+ " engine='text-davinci-003',\n",
546
+ " prompt=prompt,\n",
547
+ " max_tokens=100,\n",
548
+ " temperature=0.7,\n",
549
+ " n=1,\n",
550
+ " stop=None\n",
551
+ " )\n",
552
+ "\n",
553
+ " # Extract and evaluate the generated response\n",
554
+ " generated_response = api_response.choices[0].text.strip().lower()\n",
555
+ "\n",
556
+ " # Update score based on generated_response\n",
557
+ " if 'correct' in generated_response and 'incorrect' not in generated_response:\n",
558
+ " score += 1\n",
559
+ "\n",
560
+ " # Create a dictionary for the current row\n",
561
+ " row_dict = {\n",
562
+ " 'Question': question,\n",
563
+ " 'Correct Answer': correct_answer,\n",
564
+ " 'User Answer': user_answer,\n",
565
+ " 'Evaluation': generated_response,\n",
566
+ " 'Score': score\n",
567
+ " }\n",
568
+ " # Append the row dictionary to row_data\n",
569
+ " row_data.append(row_dict)\n",
570
+ "\n",
571
+ " elif author == 'user':\n",
572
+ " user_answer = message\n",
573
+ "\n",
574
+ "# Create a DataFrame from row_data\n",
575
+ "output_df = pd.DataFrame(row_data)\n",
576
+ "output_df\n"
577
+ ]
578
+ }
579
+ ],
580
+ "metadata": {
581
+ "colab": {
582
+ "authorship_tag": "ABX9TyOn+FniXzrkHNKH5uAKgyUD",
583
+ "include_colab_link": true,
584
+ "provenance": []
585
+ },
586
+ "kernelspec": {
587
+ "display_name": "Python 3 (ipykernel)",
588
+ "language": "python",
589
+ "name": "python3"
590
+ },
591
+ "language_info": {
592
+ "codemirror_mode": {
593
+ "name": "ipython",
594
+ "version": 3
595
+ },
596
+ "file_extension": ".py",
597
+ "mimetype": "text/x-python",
598
+ "name": "python",
599
+ "nbconvert_exporter": "python",
600
+ "pygments_lexer": "ipython3",
601
+ "version": "3.8.16"
602
+ }
603
+ },
604
+ "nbformat": 4,
605
+ "nbformat_minor": 4
606
+ }
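The grading cell above calls the legacy completions endpoint with `text-davinci-003`; note also that its `Score` column is a running total across questions, not a per-question mark. A sketch of the same grading call against the chat endpoint (still the pre-1.0 `openai` SDK style this notebook uses), with temperature 0 so grading is deterministic:

```python
import openai

def grade_response(question, user_answer):
    # Hypothetical helper mirroring the prompt built in the loop above
    prompt = (f"Given the following question:\n{question}\n"
              f"The student responded with: {user_answer}\n"
              "Is the student's response correct or incorrect?")
    api_response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        max_tokens=100,
        temperature=0,  # deterministic output is preferable for grading
    )
    return api_response.choices[0].message["content"].strip().lower()
```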
gradio_application.ipynb ADDED
@@ -0,0 +1,855 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "colab_type": "text",
7
+ "id": "view-in-github"
8
+ },
9
+ "source": [
10
+ "<a href=\"https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/update_UI_with_prompts/gradio_application.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "metadata": {
16
+ "id": "x_Vp8SiKM4p1"
17
+ },
18
+ "source": [
19
+ "# Gradio Interface Draft\n",
20
+ "\n",
21
+ "The goal of this notebook is to show a draft of a comprehensive Gradio interface though which students can interface with an LLM and self-study."
22
+ ]
23
+ },
24
+ {
25
+ "cell_type": "markdown",
26
+ "metadata": {
27
+ "id": "o_60X8H3NEne"
28
+ },
29
+ "source": [
30
+ "## Libraries"
31
+ ]
32
+ },
33
+ {
34
+ "cell_type": "code",
35
+ "execution_count": 1,
36
+ "metadata": {
37
+ "id": "pxcqXgg2aAN7"
38
+ },
39
+ "outputs": [
40
+ {
41
+ "name": "stdout",
42
+ "output_type": "stream",
43
+ "text": [
44
+ "\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
45
+ "\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
46
+ "\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
47
+ "\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
48
+ "\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
49
+ "\u001b[0m\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
50
+ "anaconda-project 0.10.1 requires ruamel-yaml, which is not installed.\n",
51
+ "conda-repo-cli 1.0.4 requires pathlib, which is not installed.\n",
52
+ "daal4py 2021.3.0 requires daal==2021.2.3, which is not installed.\n",
53
+ "spyder 5.1.5 requires pyqt5<5.13, which is not installed.\n",
54
+ "spyder 5.1.5 requires pyqtwebengine<5.13, which is not installed.\n",
55
+ "cookiecutter 1.7.2 requires MarkupSafe<2.0.0, but you have markupsafe 2.1.3 which is incompatible.\n",
56
+ "numba 0.54.1 requires numpy<1.21,>=1.17, but you have numpy 1.26.0 which is incompatible.\n",
57
+ "pyppeteer 1.0.2 requires websockets<11.0,>=10.0, but you have websockets 11.0.3 which is incompatible.\n",
58
+ "scipy 1.7.1 requires numpy<1.23.0,>=1.16.5, but you have numpy 1.26.0 which is incompatible.\n",
59
+ "transformers 4.25.1 requires tokenizers!=0.11.3,<0.14,>=0.11.1, but you have tokenizers 0.14.0 which is incompatible.\u001b[0m\u001b[31m\n",
60
+ "\u001b[0m\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
61
+ "\u001b[0mCollecting Pillow==9.0.0\n",
62
+ " Downloading Pillow-9.0.0-cp39-cp39-macosx_10_10_x86_64.whl (3.0 MB)\n",
63
+ "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━��━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.0/3.0 MB\u001b[0m \u001b[31m9.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m:00:01\u001b[0m00:01\u001b[0m\n",
64
+ "\u001b[?25h\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
65
+ "\u001b[0mInstalling collected packages: Pillow\n",
66
+ " Attempting uninstall: Pillow\n",
67
+ " Found existing installation: Pillow 8.4.0\n",
68
+ " Uninstalling Pillow-8.4.0:\n",
69
+ " Successfully uninstalled Pillow-8.4.0\n",
70
+ "Successfully installed Pillow-9.0.0\n",
71
+ "\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
72
+ "\u001b[0m"
73
+ ]
74
+ }
75
+ ],
76
+ "source": [
77
+ "# install libraries here\n",
78
+ "# -q flag for \"quiet\" install\n",
79
+ "!pip install -q langchain\n",
80
+ "!pip install -q openai\n",
81
+ "!pip install -q gradio\n",
82
+ "!pip install -q unstructured\n",
83
+ "!pip install -q chromadb\n",
84
+ "!pip install -q tiktoken\n",
85
+ "!pip install Pillow==9.0.0\n",
86
+ "!pip install -q reportlab"
87
+ ]
88
+ },
89
+ {
90
+ "cell_type": "code",
91
+ "execution_count": 2,
92
+ "metadata": {
93
+ "id": "pEjM1tLsMZBq"
94
+ },
95
+ "outputs": [
96
+ {
97
+ "ename": "ModuleNotFoundError",
98
+ "evalue": "No module named 'langchain'",
99
+ "output_type": "error",
100
+ "traceback": [
101
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
102
+ "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)",
103
+ "Cell \u001b[0;32mIn[2], line 6\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mtime\u001b[39;00m\n\u001b[1;32m 4\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mgetpass\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m getpass\n\u001b[0;32m----> 6\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocstore\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m Document\n\u001b[1;32m 7\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mprompts\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m PromptTemplate\n\u001b[1;32m 8\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mlangchain\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mdocument_loaders\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m TextLoader\n",
104
+ "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'langchain'"
105
+ ]
106
+ }
107
+ ],
108
+ "source": [
109
+ "# import libraries here\n",
110
+ "import os\n",
111
+ "import time\n",
112
+ "from getpass import getpass\n",
113
+ "\n",
114
+ "from langchain.docstore.document import Document\n",
115
+ "from langchain.prompts import PromptTemplate\n",
116
+ "from langchain.document_loaders import TextLoader\n",
117
+ "from langchain.indexes import VectorstoreIndexCreator\n",
118
+ "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
119
+ "from langchain.embeddings import OpenAIEmbeddings\n",
120
+ "\n",
121
+ "from langchain.document_loaders.unstructured import UnstructuredFileLoader\n",
122
+ "from langchain.vectorstores import Chroma\n",
123
+ "from langchain.chains import RetrievalQA, RetrievalQAWithSourcesChain\n",
124
+ "from langchain.chains.qa_with_sources import load_qa_with_sources_chain\n",
125
+ "#from langchain.chains import ConversationalRetrievalChain\n",
126
+ "\n",
127
+ "from langchain.llms import OpenAI\n",
128
+ "from langchain.chat_models import ChatOpenAI\n",
129
+ "\n",
130
+ "import gradio as gr\n",
131
+ "from sqlalchemy import TEXT # TODO Why is sqlalchemy imported\n",
132
+ "\n",
133
+ "import pprint\n",
134
+ "\n",
135
+ "import json\n",
136
+ "from google.colab import files\n",
137
+ "from reportlab.pdfgen.canvas import Canvas"
138
+ ]
139
+ },
140
+ {
141
+ "cell_type": "markdown",
142
+ "metadata": {
143
+ "id": "n0BTyPI_srMg"
144
+ },
145
+ "source": [
146
+ "# Export requirements.txt (if needed)"
147
+ ]
148
+ },
149
+ {
150
+ "cell_type": "code",
151
+ "execution_count": 4,
152
+ "metadata": {
153
+ "id": "NOX639OA2pOh"
154
+ },
155
+ "outputs": [],
156
+ "source": [
157
+ "!pip freeze > requirements.txt"
158
+ ]
159
+ },
160
+ {
161
+ "cell_type": "markdown",
162
+ "metadata": {
163
+ "id": "03KLZGI_a5W5"
164
+ },
165
+ "source": [
166
+ "## API Keys\n",
167
+ "\n",
168
+ "Use these cells to load the API keys required for this notebook. The below code cell uses the `getpass` library."
169
+ ]
170
+ },
171
+ {
172
+ "cell_type": "code",
173
+ "execution_count": 3,
174
+ "metadata": {
175
+ "colab": {
176
+ "base_uri": "https://localhost:8080/"
177
+ },
178
+ "id": "5smcWj4DbFgy",
179
+ "outputId": "0aec236a-dfc8-4afb-d97c-ff324b85bb70"
180
+ },
181
+ "outputs": [
182
+ {
183
+ "name": "stdout",
184
+ "output_type": "stream",
185
+ "text": [
186
+ "··········\n"
187
+ ]
188
+ }
189
+ ],
190
+ "source": [
191
+ "openai_api_key = getpass()\n",
192
+ "os.environ[\"OPENAI_API_KEY\"] = openai_api_key"
193
+ ]
194
+ },
195
+ {
196
+ "cell_type": "code",
197
+ "execution_count": 4,
198
+ "metadata": {
199
+ "id": "fck1RVxD8xSX"
200
+ },
201
+ "outputs": [],
202
+ "source": [
203
+ "llm = ChatOpenAI(model_name = 'gpt-3.5-turbo-16k')\n",
204
+ "# TODO OpenAI() or ChatOpenAI()?"
205
+ ]
206
+ },
207
+ {
208
+ "cell_type": "markdown",
209
+ "metadata": {
210
+ "id": "UzaYNFFT4AwX"
211
+ },
212
+ "source": [
213
+ "# Interface"
214
+ ]
215
+ },
216
+ {
217
+ "cell_type": "code",
218
+ "execution_count": 5,
219
+ "metadata": {
220
+ "id": "N9i8zsMmbcd3"
221
+ },
222
+ "outputs": [],
223
+ "source": [
224
+ "global db # Document-storing object (vector store / index)\n",
225
+ "global qa # Question-answer object; retrieves from `db`\n",
226
+ "global srcs # List of source documents fragments referenced by vector store\n",
227
+ "num_sources = 100 # Maximum number of source documents which can be shown\n",
228
+ "\n",
229
+ "srcs = []\n",
230
+ "# See https://github.com/hwchase17/langchain/discussions/3786 for discussion\n",
231
+ "# of which splitter to use\n",
232
+ "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)"
233
+ ]
234
+ },
235
+ {
236
+ "cell_type": "code",
237
+ "execution_count": 6,
238
+ "metadata": {
239
+ "id": "fmFg2_iyvRKI"
240
+ },
241
+ "outputs": [],
242
+ "source": [
243
+ "### Source Display ###\n",
244
+ "\n",
245
+ "def format_source_reference(document, index):\n",
246
+ " \"\"\"Return a HTML element which contains the `document` info and can be\n",
247
+ " referenced by the `index`.\"\"\"\n",
248
+ " if 'source' in document.metadata:\n",
249
+ " source_filepath, source_name = os.path.split(document.metadata['source'])\n",
250
+ " else:\n",
251
+ " source_name = \"text box\"\n",
252
+ " return f\"<p id='source{index}'>[{index+1}] <em>{source_name}</em><br>...{document.page_content}...</p>\"\n",
253
+ "\n",
254
+ "def format_source_citation(index):\n",
255
+ " \"\"\"Return a Markdown element which references the element with the given\n",
256
+ " `index`.\"\"\"\n",
257
+ " return f\"[{index+1}](#source{index})\"\n",
258
+ "\n",
259
+ "def update_source_list(added_sources):\n",
260
+ " \"\"\"Add any new sources to the list `srcs`; return a list of integers\n",
261
+ " containing the index references for each of the given sources in `sources`.\"\"\"\n",
262
+ " source_indices = []\n",
263
+ " for source in added_sources:\n",
264
+ " if source not in srcs: srcs.append(source)\n",
265
+ " source_indices.append(srcs.index(source))\n",
266
+ " return source_indices\n",
267
+ "\n",
268
+ "def get_source_display_updates(sources=srcs):\n",
269
+ " \"\"\"Dynamically update the output for the given list of components to show\n",
270
+ " the items as contained within the documents in `sources`.\n",
271
+ " See https://github.com/gradio-app/gradio/issues/2066 for what I'm copying.\"\"\"\n",
272
+ " # TODO Currently displays the first 100 sources only, which could be an issue\n",
273
+ " # TODO Display an alert when the source display is being cut off (over 100\n",
274
+ " # sources are stored)\n",
275
+ " update_list = []\n",
276
+ " for i in range(num_sources):\n",
277
+ " if i < len(sources):\n",
278
+ " source_text = format_source_reference(sources[i], i)\n",
279
+ " update_list.append(gr.update(visible=True, value=source_text))\n",
280
+ " else:\n",
281
+ " update_list.append(gr.update(visible=False))\n",
282
+ " return update_list"
283
+ ]
284
+ },
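A quick illustration of how the two formatters above pair up: `format_source_reference` emits an HTML element with an `id`, and `format_source_citation` emits a Markdown link targeting that `id`. The `SimpleNamespace` document below is a hypothetical stand-in for a LangChain `Document`, used only for this sketch:

```python
from types import SimpleNamespace

# Hypothetical stand-in for a langchain Document with a file-backed source
doc = SimpleNamespace(metadata={'source': '/tmp/chapter1.txt'},
                      page_content='Photosynthesis converts light energy')

print(format_source_reference(doc, 0))
# <p id='source0'>[1] <em>chapter1.txt</em><br>...Photosynthesis converts light energy...</p>
print(format_source_citation(0))
# [1](#source0)
```

Clicking the rendered `[1]` link scrolls to the matching `<p id='source0'>` element in the sources panel.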
285
+ {
286
+ "cell_type": "code",
287
+ "execution_count": 7,
288
+ "metadata": {
289
+ "id": "2HnXKhgPccPV"
290
+ },
291
+ "outputs": [],
292
+ "source": [
293
+ "def embed_key(openai_api_key):\n",
294
+ " os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n",
295
+ "\n",
296
+ "### Helper Functions ###\n",
297
+ "\n",
298
+ "def create_vector_store_from_document_segments(document_segments):\n",
299
+ " \"\"\"Updates the global `db` and `qa` variables to contain the given sections\n",
300
+ " of text (generated from the create_vector_store_from_file and\n",
301
+ " create_vector_store_from_text_input functions).\"\"\"\n",
302
+ " global db, qa\n",
303
+ "\n",
304
+ " # TODO: Add useful metadata that indicates from *where* in the document\n",
305
+ " # the segment is pulled (ideally, show what page it's from, if possible)\n",
306
+ " # This requires changing how the info is displayed in the UI, but it\n",
307
+ " # shouldn't be a big change\n",
308
+ " embeddings = OpenAIEmbeddings()\n",
309
+ " db = Chroma.from_documents(document_segments, embeddings)\n",
310
+ " retriever = db.as_retriever()\n",
311
+ "\n",
312
+ " qa = RetrievalQAWithSourcesChain.from_chain_type(llm=llm,\n",
313
+ " chain_type=\"stuff\",\n",
314
+ " retriever=retriever,\n",
315
+ " return_source_documents=True)\n",
316
+ "\n",
317
+ "def get_document_segments_from_files(files):\n",
318
+ " \"\"\"Returns a list of document segments taken from all of the files in the\n",
319
+ " given list.\"\"\"\n",
320
+ " # TODO Modify the metadata for each document to specify the location; see\n",
321
+ " # colab.research.google.com/drive/142CCh-Mh1AdodIGD7PyL86slFLOcVVeI\n",
322
+ " all_document_segments = []\n",
323
+ " for file in files:\n",
324
+ " loader = UnstructuredFileLoader(file.name)\n",
325
+ " documents = loader.load()\n",
326
+ " document_segments = text_splitter.split_documents(documents)\n",
327
+ " all_document_segments.extend(document_segments)\n",
328
+ " return all_document_segments\n",
329
+ "\n",
330
+ "def get_document_segments_from_text(text):\n",
331
+ " \"\"\"Returns a list of document segments taken from the text.\"\"\"\n",
332
+ " return [Document(page_content=page, metadata={\"source\":\"text box\"})\n",
333
+ " for page in text_splitter.split_text(text)]\n",
334
+ "\n",
335
+ "def create_vector_store_from_file(file):\n",
336
+ " \"\"\"Updates the global `db` and `qa` variables to contain the given file.\"\"\"\n",
337
+ " # TODO change the code to allow input of multiple files (see previous\n",
338
+ " # implementation of this in some other notebook I made)\n",
339
+ " loader = UnstructuredFileLoader(file.name)\n",
340
+ " documents = loader.load()\n",
341
+ " document_segments = text_splitter.split_documents(documents)\n",
342
+ " create_vector_store_from_document_segments(document_segments)\n",
343
+ "\n",
344
+ "def create_vector_store_from_files(files):\n",
345
+ " \"\"\"Updates the global `db` and `qa` variables to contain the given files.\"\"\"\n",
346
+ " document_segments = get_document_segments_from_files(files)\n",
347
+ " create_vector_store_from_document_segments(document_segments)\n",
348
+ "\n",
349
+ "def create_vector_store_from_text(text):\n",
350
+ " \"\"\"Updates the global `db` and `qa` variables to contain the given string.\"\"\"\n",
351
+ " document_segments = get_document_segments_from_text(text)\n",
352
+ " create_vector_store_from_document_segments(document_segments)\n",
353
+ "\n",
354
+ "# TODO I keep on getting the following error (or something similar):\n",
355
+ "# openai.error.InvalidRequestError: This model's maximum context length is 4097\n",
356
+ "# tokens, however you requested 4333 tokens (4077 in your prompt; 256 for the\n",
357
+ "# completion). Please reduce your prompt; or completion length.\n",
358
+ "\n",
359
+ "def construct_prompt(prompt, learning_objective):\n",
360
+ " # TODO Use prompt templates instead of just adding the strings together\n",
361
+ " return prompt + \" \" + learning_objective\n",
362
+ "\n",
363
+ "def generate_response(prompt, learning_objective=''):\n",
364
+ " \"\"\"Get the LLM response to the given prompt with the given learning objective,\n",
365
+ " referencing the global `qa` variable.\"\"\"\n",
366
+ " global db, qa\n",
367
+ " qa_prompt = construct_prompt(prompt, learning_objective)\n",
368
+ " result = qa({\"question\": qa_prompt})\n",
369
+ " return result[\"answer\"], result[\"source_documents\"]\n",
370
+ "\n",
371
+ "def generate_response_with_document_update(prompt, learning_objective=''):\n",
372
+ " \"\"\"Get the LLM response to the given prompt with the given learning objective,\n",
373
+ " referencing the global `qa` variable, and also return the document updates.\"\"\"\n",
374
+ " answer, sources = generate_response(prompt, learning_objective)\n",
375
+ " update_source_list(sources)\n",
376
+ " return [answer, *get_source_display_updates()]"
377
+ ]
378
+ },
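The `construct_prompt` TODO above asks for prompt templates instead of raw string concatenation. A minimal sketch of that with LangChain's `PromptTemplate` (an illustration using the same two inputs, not the notebook's implementation):

```python
from langchain.prompts import PromptTemplate

# The same two inputs construct_prompt already takes, rendered through a template
qa_prompt_template = PromptTemplate(
    input_variables=["prompt", "learning_objective"],
    template="{prompt} {learning_objective}",
)

def construct_prompt(prompt, learning_objective):
    # Render the template rather than adding the strings together
    return qa_prompt_template.format(prompt=prompt,
                                     learning_objective=learning_objective)
```

The payoff is small for two variables, but the template gives a single place to later add instructions or few-shot examples without touching any call sites.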
379
426
+ {
427
+ "cell_type": "code",
428
+ "execution_count": 21,
429
+ "metadata": {
430
+ "id": "XwIHEgnmQ7_q"
431
+ },
432
+ "outputs": [],
433
+ "source": [
434
+ "### Gradio Called Functions ###\n",
435
+ "\n",
436
+ "def prompt_select(selection, number, length):\n",
437
+ " if selection == \"Random\":\n",
438
+ " prompt = f\"Please design a {number} question quiz based on the context provided and the inputted learning objectives (if applicable). The types of questions should be randomized (including multiple choice, short answer, true/false, short answer, etc.). Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide 1 question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right.\"\n",
439
+ " elif selection == \"Fill in the Blank\":\n",
440
+ " prompt = f\"Create a {number} question fill in the blank quiz refrencing the context provided. The quiz should reflect the learning objectives (if inputted). The 'blank' part of the question should appear as '________'. The answers should reflect what word(s) should go in the blank an accurate statement. An example is the follow: 'The author of the article is ______.' The question should be a statement. Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect,and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right.\"\n",
441
+ " elif selection == \"Short Answer\":\n",
442
+ " prompt = f\"Please design a {number} question quiz about which reflects the learning objectives (if inputted). The questions should be short answer. Expect the correct answers to be {length} sentences long. Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice. Explain why the correct answer is right.\"\n",
443
+ " else:\n",
444
+ " prompt = f\"Please design a {number} question {selection.lower()} quiz based on the context provided and the inputted learning objectives (if applicable). Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide 1 question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right.\"\n",
445
+ " return prompt\n",
446
+ "\n",
447
+ "'''\n",
448
+ "def generate_response_with_document(file_input, text_input,\n",
449
+ " prompt, learning_objective):\n",
450
+ " \"\"\"Determine which type of document input is given and call the corresponding\n",
451
+ " function to generate and return a response, along with the sources.\"\"\"\n",
452
+ " if file_input:\n",
453
+ " return generate_response_from_files(file_input, prompt, learning_objective)\n",
454
+ " elif text_input:\n",
455
+ " return generate_response_from_text(text_input, prompt, learning_objective)\n",
456
+ " else: # No document input\n",
457
+ " # TODO add a UI indicator to the student if they forgot to provide document\n",
458
+ " raise Exception(\"No source text or file given\")\n",
459
+ "'''\n",
460
+ "pass"
461
+ ]
462
+ },
463
+ {
464
+ "cell_type": "code",
465
+ "execution_count": 10,
466
+ "metadata": {
467
+ "id": "mqMZlP4CcfNa"
468
+ },
469
+ "outputs": [],
470
+ "source": [
471
+ "### Chatbot Functions ###\n",
472
+ "\n",
473
+ "def user(user_message, history):\n",
474
+ " \"\"\"Display user message and update chat history to include it.\n",
475
+ " Also disables user text input until bot is finished (call to reenable_chat())\n",
476
+ " See https://gradio.app/creating-a-chatbot/\"\"\"\n",
477
+ " return gr.update(value=\"\", interactive=True), history + [[user_message, None]]\n",
478
+ "\n",
479
+ "def get_chatbot_response(history):\n",
480
+ " \"\"\"Given the history as a list of tuples of strings, output the new history\n",
481
+ " response (referencing the document via vector store `qa`), along with the\n",
482
+ " currently referenced sources.\"\"\"\n",
483
+ "\n",
484
+ " # TODO Disable input when the user has not added a document for context\n",
485
+ " # Right now, if you try to use this before pressing one of the other\n",
486
+ " # generator buttons, it breaks!\n",
487
+ " if not qa: raise Exception(\"No source text or file given\")\n",
488
+ " history = history or [] # Account for issue if history is empty\n",
489
+ " # Turn nested list into string:\n",
490
+ " # [[\"Hello HAL\", \"Hi Dave\"], [\"Open the pod bay doors\", \"No.\"]] ->\n",
491
+ " # \"Hello HAL\\nHi Dave\\nOpen the pod bay doors\\nNo.\"\n",
492
+ " # TODO: This conversion doesn't distinguish between the input of the user\n",
493
+ " # and the output from the model... I could add tags for each section, e.g.\n",
494
+ " # \"USER:Hello HAL\\nBOT:Hi Dave\\nUSER:Open the pod bay doors\\nBOT:No.\"\n",
495
+ " # But that uses up extra tokens... we need to figure out if that's worth it\n",
496
+ " history_flat = [message for exchange in history\n",
497
+ " for message in exchange if message]\n",
498
+ " history_string = '\\n'.join(history_flat)\n",
499
+ " response = generate_response(history_string)\n",
500
+ " return response\n",
501
+ "\n",
502
+ "def bot(history, use_model=True, character_crawl=False):\n",
503
+ " \"\"\"Get model response based on chat history (including user input).\n",
504
+ " If use_model is False, then return a placeholder value.\n",
505
+ " If character_crawl, then show each character in the output one at a time.\n",
506
+ " See https://gradio.app/creating-a-chatbot/\"\"\"\n",
507
+ "\n",
508
+ " answer, sources = get_chatbot_response(history) if use_model \\\n",
509
+ " else (\"This is a placeholder message.\", [])\n",
510
+ "\n",
511
+ " # Add the link to the sources\n",
512
+ " #bot_message = f\"{answer}\\n[Sources](#sources)\"\n",
513
+ " source_indices = update_source_list(sources)\n",
514
+ " source_citations = \" \".join(format_source_citation(source_index)\n",
515
+ " for source_index in source_indices)\n",
516
+ " bot_message = answer + \"\\nSources: \" + source_citations\n",
517
+ "\n",
518
+ " \"\"\"\n",
519
+ " if character_crawl:\n",
520
+ " # Display generated response, one character at a time\n",
521
+ " # TODO doesn't work if the history is empty (no messages)\n",
522
+ " # However, get_chatbot_response() should always output something, so it's fine\n",
523
+ " history[-1][1] = \"\"\n",
524
+ " for character in bot_message:\n",
525
+ " history[-1][1] += character\n",
526
+ " # TODO Remove timeout\n",
527
+ " time.sleep(0.01)\n",
528
+ " yield history\n",
529
+ " else:\n",
530
+ " history[-1][1] = bot_message\n",
531
+ " yield history\n",
532
+ " \"\"\"\n",
533
+ " history[-1][1] = bot_message\n",
534
+ " yield [history, *get_source_display_updates()]\n",
535
+ "\n",
536
+ "def reenable_chat():\n",
537
+ " \"\"\"Called after user() disables chat and bot provides response, meaning\n",
538
+ " the user is now free to add follow-up questions / comments to chatbot.\"\"\"\n",
539
+ " gr.update(interactive=True)"
540
+ ]
541
+ },
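The TODO in `get_chatbot_response` notes that the flattened history no longer distinguishes user turns from bot turns. A sketch of the role-tagged variant it describes, which spends a few extra tokens per turn on attribution:

```python
def history_to_tagged_string(history):
    """Flatten [[user, bot], ...] pairs into 'USER: ...' / 'BOT: ...' lines
    so the model can tell who said what; empty slots (e.g. a pending bot
    reply) are skipped."""
    lines = []
    for user_message, bot_message in history:
        if user_message:
            lines.append(f"USER: {user_message}")
        if bot_message:
            lines.append(f"BOT: {bot_message}")
    return "\n".join(lines)
```

For short histories the token overhead is negligible, which suggests the clearer attribution is worth it.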
542
+ {
543
+ "cell_type": "code",
544
+ "execution_count": 11,
545
+ "metadata": {
546
+ "id": "lOA1Vn9GKPeN"
547
+ },
548
+ "outputs": [],
549
+ "source": [
550
+ "# Downloadable formats\n",
551
+ "\n",
552
+ "def save_chatbot_dialogue(chatbot, filename='chatbot_dialogue.json'):\n",
553
+ " with open(filename, 'w') as file:\n",
554
+ " json.dump(chatbot, file)\n",
555
+ " files.download(filename)\n",
556
+ "\n",
557
+ "def list_to_str(chatbot):\n",
558
+ " total_interaction = \"\"\n",
559
+ " for line in chatbot:\n",
560
+ " if line[0]:\n",
561
+ " user_message = \"User: \" + line[0] + \"\\n\"\n",
562
+ " total_interaction += user_message\n",
563
+ " if line[1]:\n",
564
+ " chat_message = \"Bot: \" + line[1] + \"\\n\"\n",
565
+ " total_interaction += chat_message\n",
566
+ " return total_interaction\n",
567
+ "\n",
568
+ "from textwrap import wrap\n",
569
+ "def save_chatbot_dialogue_pdf(chatbot, filename=\"chatbot_dialogue.pdf\"):\n",
570
+ " chatbot = list_to_str(chatbot)\n",
571
+ " canvas = Canvas(filename)\n",
572
+ " t = canvas.beginText()\n",
573
+ " t.setFont('Helvetica', 12)\n",
574
+ " t.setCharSpace(2)\n",
575
+ " t.setTextOrigin(50, 750)\n",
576
+ " t.textLines(chatbot)\n",
577
+ " canvas.drawText(t)\n",
578
+ " canvas.save()\n",
579
+ " files.download(filename)\n",
580
+ "\n",
581
+ "def save_as_txt(chatbot, filename = 'chat_dialogue.txt'):\n",
582
+ " chatbot = list_to_str(chatbot)\n",
583
+ " with open(filename, 'w') as file:\n",
584
+ " file.write(chatbot)\n",
585
+ " files.download(filename)\n",
586
+ "\n",
587
+ "def save_as_csv(chatbot, filename = 'chat_dialogue.csv'):\n",
588
+ " chatbot = list_to_str(chatbot)\n",
589
+ " with open(filename, 'w') as file:\n",
590
+ " file.write(chatbot)\n",
591
+ " files.download(filename)"
592
+ ]
593
+ },
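`save_chatbot_dialogue` writes the chat history as a nested list of `[user, bot]` pairs; the instructor grading notebook later in this commit globs for exactly these `.json` exports. A minimal sketch of reading one export back (assuming the default filename):

```python
import json

# Round trip: load a student export back as [[user, bot], ...] pairs
with open('chatbot_dialogue.json') as f:
    history = json.load(f)  # e.g. [["Hello", "Hi!"], ["Quiz me", "Q1: ..."]]
```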
594
+ {
595
+ "cell_type": "code",
596
+ "execution_count": 12,
597
+ "metadata": {
598
+ "id": "vV9IY1H4H7UK"
599
+ },
600
+ "outputs": [],
601
+ "source": [
602
+ "# Function to save prompts (premade or custom) and return in the user input box in the chatbot\n",
603
+ "\n",
604
+ "saved_text = \"\"\n",
605
+ "def save_text(text):\n",
606
+ " global saved_text\n",
607
+ " saved_text = text\n",
608
+ "\n",
609
+ "def return_text():\n",
610
+ " # Return the saved text\n",
611
+ " return saved_text"
612
+ ]
613
+ },
614
+ {
615
+ "cell_type": "code",
616
+ "execution_count": 22,
617
+ "metadata": {
618
+ "colab": {
619
+ "base_uri": "https://localhost:8080/",
620
+ "height": 684
621
+ },
622
+ "id": "Z6d6m-sBwnkA",
623
+ "outputId": "37bd4262-7eed-49aa-e064-7f0443ed9a69"
624
+ },
625
+ "outputs": [
626
+ {
627
+ "name": "stderr",
628
+ "output_type": "stream",
629
+ "text": [
630
+ "<ipython-input-22-8f58a957929c>:113: GradioUnusedKwargWarning: You have unused kwarg parameters in Box, please remove them: {'scale': 1}\n",
631
+ " with gr.Box(elem_id=\"sources-container\", scale=1):\n"
632
+ ]
633
+ },
634
+ {
635
+ "name": "stdout",
636
+ "output_type": "stream",
637
+ "text": [
638
+ "Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).\n",
639
+ "\n",
640
+ "Colab notebook detected. To show errors in colab notebook, set debug=True in launch()\n",
641
+ "Running on public URL: https://f880b30167f96bbba1.gradio.live\n",
642
+ "\n",
643
+ "This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)\n"
644
+ ]
645
+ },
646
+ {
647
+ "data": {
648
+ "text/html": [
649
+ "<div><iframe src=\"https://f880b30167f96bbba1.gradio.live\" width=\"100%\" height=\"500\" allow=\"autoplay; camera; microphone; clipboard-read; clipboard-write;\" frameborder=\"0\" allowfullscreen></iframe></div>"
650
+ ],
651
+ "text/plain": [
652
+ "<IPython.core.display.HTML object>"
653
+ ]
654
+ },
655
+ "metadata": {},
656
+ "output_type": "display_data"
657
+ },
658
+ {
659
+ "data": {
660
+ "text/plain": []
661
+ },
662
+ "execution_count": 22,
663
+ "metadata": {},
664
+ "output_type": "execute_result"
665
+ }
666
+ ],
667
+ "source": [
668
+ "# See https://gradio.app/custom-CSS-and-JS/\n",
669
+ "css=\"\"\"\n",
670
+ "#sources-container {\n",
671
+ " overflow: scroll !important; /* Needs to override default formatting */\n",
672
+ " /*max-height: 20em; */ /* Arbitrary value */\n",
673
+ "}\n",
674
+ "#sources-container > div { padding-bottom: 1em !important; /* Arbitrary value */ }\n",
675
+ ".short-height > * > * { min-height: 0 !important; }\n",
676
+ ".translucent { opacity: 0.5; }\n",
677
+ ".textbox_label { padding-bottom: .5em; }\n",
678
+ "\"\"\"\n",
679
+ "#srcs = [] # Reset sources (db and qa are kept the same for ease of testing)\n",
680
+ "\n",
681
+ "with gr.Blocks(css=css, analytics_enabled=False) as demo:\n",
682
+ " gr.Markdown(\"# Student Notebook Thingy\")\n",
683
+ " with gr.Box():\n",
684
+ " gr.HTML(\"\"\"<span>Embed your OpenAI API key below; if you haven't created one already, visit\n",
685
+ " <a href=\"https://platform.openai.com/account/api-keys\">platform.openai.com/account/api-keys</a>\n",
686
+ " to sign up for an account and get your personal API key</span>\"\"\",\n",
687
+ " elem_classes=\"textbox_label\")\n",
688
+ " # TODO Find a better way to fix minor spacing issue below this element\n",
689
+ " input = gr.Textbox(show_label=False, type=\"password\", container=False,\n",
690
+ " placeholder=\"●●●●●●●●●●●●●●●●●\")\n",
691
+ " input.change(fn=embed_key, inputs=input, outputs=None)\n",
692
+ "\n",
693
+ "\n",
694
+ " with gr.Blocks():\n",
695
+ " gr.Markdown(\"## Add Document Source (File / Text / VectorStore)\")\n",
696
+ " # TODO Add entry for path to vector store (should be disabled for now)\n",
697
+ " with gr.Row(equal_height=True):\n",
698
+ " text_input = gr.TextArea(label='Copy and paste your text below',\n",
699
+ " lines=2)\n",
700
+ " # TODO This is called every time the text changes, which is very\n",
701
+ " # wasteful... the solution would be to only call this when necessary,\n",
702
+ " # (i.e. when generating text), but that would require some extra\n",
703
+ " # parameters passed in to the generating functions... maybe\n",
704
+ " # add a check for that in the functions which reference the doc store?\n",
705
+ " text_input.input(create_vector_store_from_text, [text_input], None)\n",
706
+ "\n",
707
+ " file_input = gr.Files(label=\"Load a .txt or .pdf file\",\n",
708
+ " file_types=['.pdf', '.txt'], type=\"file\",\n",
709
+ " elem_classes=\"short-height\")\n",
710
+ " file_input.change(create_vector_store_from_files, [file_input], None)\n",
711
+ "\n",
712
+ " text_input = gr.TextArea(label='Enter vector store URL, if given by instructor (WIP)',\n",
713
+ " lines=2, interactive=\n",
714
+ " False, elem_classes=\"translucent\")\n",
715
+ "\n",
716
+ " with gr.Blocks():\n",
717
+ " #gr.Markdown(\"## Optional: Enter Your Learning Objectives\")\n",
718
+ " learning_objectives = gr.Textbox(label='If provided by your instructor, please input your learning objectives for this session', value='')\n",
719
+ "\n",
720
+ " # Premade question prompts\n",
721
+ " # TODO rename section?`\n",
722
+ " with gr.Blocks():\n",
723
+ " gr.Markdown(\"\"\"\n",
724
+ " ## Generate a Premade Prompt\n",
725
+ " Select your type and number of desired questions. Click \"Generate Prompt\" to get your premade prompt,\n",
726
+ " and then \"Insert Prompt into Chat\" to copy the text into the chat interface below. \\\n",
727
+ " You can also copy the prompt using the icon in the upper right corner and paste directly into the input box when interacting with the model.\n",
728
+ " \"\"\")\n",
729
+ " with gr.Row():\n",
730
+ " with gr.Column():\n",
731
+ " question_type = gr.Dropdown([\"Multiple Choice\", \"True or False\", \"Short Answer\", \"Fill in the Blank\", \"Random\"], label=\"Question Type\")\n",
732
+ " number_of_questions = gr.Textbox(label=\"Enter desired number of questions\")\n",
733
+ " sa_desired_length = gr.Dropdown([\"1-2\", \"3-4\", \"5-6\", \"6 or more\"], label = \"For short answer questions only, choose the desired sentence length for answers. The default value is 1-2 sentences.\")\n",
734
+ " with gr.Column():\n",
735
+ " prompt_button = gr.Button(\"Generate Prompt\")\n",
736
+ " premade_prompt_output = gr.Textbox(label=\"Generated prompt (save or copy)\", show_copy_button=True)\n",
737
+ " prompt_button.click(prompt_select,\n",
738
+ " inputs=[question_type, number_of_questions, sa_desired_length],\n",
739
+ " outputs=premade_prompt_output)\n",
740
+ "\n",
741
+ " insert_premade_prompt_button = gr.Button(\"Input Prompt into Chat\")\n",
742
+ "\n",
743
+ " '''\n",
744
+ " with gr.Blocks():\n",
745
+ " # TODO Consider if we want to have this section\n",
746
+ " # Maybe use this section for prompt refinement? E.g. you enter prompt\n",
747
+ " # and model tells you how to improve it?\n",
748
+ " gr.Markdown(\"\"\"\n",
749
+ " ## Write Your Own Custom Prompt\n",
750
+ " For a comprehensive list of example prompts, see our repository's wiki: https://github.com/vanderbilt-data-science/lo-achievement/wiki/High-Impact-Prompts.\n",
751
+ " After writing your prompt, be sure to click \"Save Prompt\" to save your prompt for chatting with the model below. You can also copy the prompt using the icon in the upper right corner and paste directly into the input box when interacting with the model.\n",
752
+ " \"\"\")\n",
753
+ " custom_prompt_block = gr.TextArea(label=\"Enter your own custom prompt\", show_copy_button=True)\n",
754
+ " #save_prompt_block = gr.Button(\"Save Prompt\")\n",
755
+ " #save_prompt_block.click(save_text, inputs=custom_prompt_block, outputs=None)\n",
756
+ " '''\n",
757
+ "\n",
758
+ " # Chatbot (https://gradio.app/creating-a-chatbot/)\n",
759
+ " '''\n",
760
+ " with gr.Blocks():\n",
761
+ " gr.Markdown(\"\"\"\n",
762
+ " ## Chat with the Model\n",
763
+ " Click \"Display Prompt\" to display the premade or custom prompt that you created earlier. Then, continue chatting with the model.\n",
764
+ " \"\"\")\n",
765
+ " with gr.Row():\n",
766
+ " show_prompt_block = gr.Button(\"Display Prompt\")\n",
767
+ " '''\n",
768
+ " gr.Markdown(\"## Chat with the Model\")\n",
769
+ " with gr.Row(equal_height=True):\n",
770
+ " with gr.Column(scale=2):\n",
771
+ " chatbot = gr.Chatbot()\n",
772
+ " with gr.Row():\n",
773
+ " user_chat_input = gr.Textbox(label=\"User input\", scale=9)\n",
774
+ " user_chat_input.submit(return_text, inputs=None, outputs=user_chat_input)\n",
775
+ " user_chat_submit = gr.Button(\"Ask/answer model\", scale=1)\n",
776
+ " #show_prompt_block.click(return_text, inputs=None, outputs=user_chat_input)\n",
777
+ "\n",
778
+ " # TODO Move the sources so it's displayed to the right of the chat bot,\n",
779
+ " # with the sources taking up about 1/3rd of the horizontal space\n",
780
+ " with gr.Box(elem_id=\"sources-container\", scale=1):\n",
781
+ " # TODO: Display document sources in a nicer format?\n",
782
+ " gr.HTML(value=\"<h3 id='sources'>Sources</h3>\")\n",
783
+ " sources_output = []\n",
784
+ " for i in range(num_sources):\n",
785
+ " source_elem = gr.HTML(visible=False)\n",
786
+ " sources_output.append(source_elem)\n",
787
+ "\n",
788
+ " # Copy text from premade prompt output to user chat input\n",
789
+ " insert_premade_prompt_button.click(lambda text: text,\n",
790
+ " inputs=premade_prompt_output,\n",
791
+ " outputs=user_chat_input)\n",
792
+ "\n",
793
+ " # Display input and output in three-ish parts\n",
794
+ " # (using asynchronous functions):\n",
795
+ " # First show user input, then show model output when complete\n",
796
+ " # Then wait until the bot provides response and return the result\n",
797
+ " # Finally, allow the user to ask a new question by reenabling input\n",
798
+ " async_response = user_chat_submit.click(user,\n",
799
+ " [user_chat_input, chatbot],\n",
800
+ " [user_chat_input, chatbot], queue=False) \\\n",
801
+ " .then(bot, chatbot, [chatbot, *sources_output], queue=True) \\\n",
802
+ " .then(reenable_chat, None, [user_chat_input], queue=False)\n",
803
+ "\n",
804
+ " with gr.Blocks():\n",
805
+ " gr.Markdown(\"\"\"\n",
806
+ " ## Export Your Chat History\n",
807
+ " Export your chat history as a .json, PDF file, .txt, or .csv file\n",
808
+ " \"\"\")\n",
809
+ " with gr.Row():\n",
810
+ " export_dialogue_button = gr.Button(\"JSON\")\n",
811
+ " export_dialogue_button.click(save_chatbot_dialogue, chatbot, None)\n",
812
+ " export_dialogue_button_pdf = gr.Button(\"PDF\")\n",
813
+ " export_dialogue_button_pdf.click(save_chatbot_dialogue_pdf, chatbot, None)\n",
814
+ " export_dialogue_button_txt = gr.Button(\"TXT\")\n",
815
+ " export_dialogue_button_txt.click(save_as_txt, chatbot, None)\n",
816
+ " export_dialogue_button_csv = gr.Button(\"CSV\")\n",
817
+ " export_dialogue_button_csv.click(save_as_csv, chatbot, None)\n",
818
+ "\n",
819
+ "demo.queue()\n",
820
+ "#demo.launch(debug=True)\n",
821
+ "demo.launch()"
822
+ ]
823
+ }
824
+ ],
825
+ "metadata": {
826
+ "colab": {
827
+ "include_colab_link": true,
828
+ "provenance": []
829
+ },
830
+ "kernelspec": {
831
+ "display_name": "Python 3.11.5 64-bit",
832
+ "language": "python",
833
+ "name": "python3"
834
+ },
835
+ "language_info": {
836
+ "codemirror_mode": {
837
+ "name": "ipython",
838
+ "version": 3
839
+ },
840
+ "file_extension": ".py",
841
+ "mimetype": "text/x-python",
842
+ "name": "python",
843
+ "nbconvert_exporter": "python",
844
+ "pygments_lexer": "ipython3",
845
+ "version": "3.11.5"
846
+ },
847
+ "vscode": {
848
+ "interpreter": {
849
+ "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
850
+ }
851
+ }
852
+ },
853
+ "nbformat": 4,
854
+ "nbformat_minor": 0
855
+ }
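One TODO in the interface cell above flags that `create_vector_store_from_text` runs on every keystroke of the text area. A sketch of the deferred rebuild it suggests; the `_last_indexed_text` cache is hypothetical, not part of the notebook:

```python
_last_indexed_text = None  # hypothetical cache of the last text that was indexed

def ensure_vector_store(text):
    """Rebuild the vector store only when the pasted text has actually
    changed, instead of on every keystroke."""
    global _last_indexed_text
    if text and text != _last_indexed_text:
        create_vector_store_from_text(text)  # defined earlier in the notebook
        _last_indexed_text = text
```

The generating functions would then call `ensure_vector_store` with the text box's value passed in as an extra input, which is the plumbing the original comment anticipates.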
instructor_intr_notebook.ipynb ADDED
@@ -0,0 +1,3153 @@
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "collapsed": true,
7
+ "id": "brzvVeAsYiG2"
8
+ },
9
+ "source": [
10
+ "<a href=\"https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/main/instructor_intr_notebook.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "metadata": {
16
+ "id": "WMKrKfx8_3fc"
17
+ },
18
+ "source": [
19
+ "# Instructor Grading and Assessment\n",
20
+ "This notebook executes grading of student submissions of chats with ChatGPT, exported in JSON. Run each cell should be run in order, and follow the prompts displayed when appropriate."
21
+ ]
22
+ },
23
+ {
24
+ "cell_type": "code",
25
+ "execution_count": 35,
26
+ "metadata": {
27
+ "colab": {
28
+ "base_uri": "https://localhost:8080/",
29
+ "height": 16
30
+ },
31
+ "id": "696FqPrTYiG3",
32
+ "outputId": "9679a415-8ab7-4c5f-e715-954d6801b6ec"
33
+ },
34
+ "outputs": [
35
+ {
36
+ "data": {
37
+ "text/html": [
38
+ "\n",
39
+ " <style>\n",
40
+ " pre {\n",
41
+ " white-space: pre-wrap;\n",
42
+ " }\n",
43
+ " </style>\n",
44
+ " "
45
+ ],
46
+ "text/plain": [
47
+ "<IPython.core.display.HTML object>"
48
+ ]
49
+ },
50
+ "metadata": {},
51
+ "output_type": "display_data"
52
+ }
53
+ ],
54
+ "source": [
55
+ "import ipywidgets as widgets\n",
56
+ "from IPython.display import display, HTML, clear_output\n",
57
+ "import io\n",
58
+ "import zipfile\n",
59
+ "import os\n",
60
+ "import json\n",
61
+ "import pandas as pd\n",
62
+ "import glob\n",
63
+ "from getpass import getpass"
64
+ ]
65
+ },
66
+ {
67
+ "cell_type": "code",
68
+ "execution_count": 36,
69
+ "metadata": {
70
+ "colab": {
71
+ "base_uri": "https://localhost:8080/",
72
+ "height": 16
73
+ },
74
+ "id": "fTlnrMwmYiG4",
75
+ "outputId": "e811e000-e9ec-43b6-d136-59d5134adeaf"
76
+ },
77
+ "outputs": [
78
+ {
79
+ "data": {
80
+ "text/html": [
81
+ "\n",
82
+ " <style>\n",
83
+ " pre {\n",
84
+ " white-space: pre-wrap;\n",
85
+ " }\n",
86
+ " </style>\n",
87
+ " "
88
+ ],
89
+ "text/plain": [
90
+ "<IPython.core.display.HTML object>"
91
+ ]
92
+ },
93
+ "metadata": {},
94
+ "output_type": "display_data"
95
+ }
96
+ ],
97
+ "source": [
98
+ "# \"global\" variables modified by mutability\n",
99
+ "grade_settings = {'learning_objectives':None,\n",
100
+ " 'json_file_path':None,\n",
101
+ " 'json_files':None }"
102
+ ]
103
+ },
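The "modified by mutability" comment is load-bearing: later cells mutate `grade_settings` in place rather than rebinding the name, so every cell holding a reference sees the update without any `global` statement. A standalone illustration of the distinction (not from the notebook):

```python
settings = {'value': None}

def update_in_place(s):
    s['value'] = 42      # mutation: visible through every reference

def rebind(s):
    s = {'value': 99}    # rebinding: only changes the local name s

update_in_place(settings)
rebind(settings)
print(settings['value'])  # -> 42; the rebind had no outside effect
```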
104
+ {
105
+ "cell_type": "markdown",
106
+ "metadata": {
107
+ "id": "jb0jnIE14Vuh"
108
+ },
109
+ "source": [
110
+ "The `InstructorGradingConfig` holds the contents of the instantiated object including making graindg settings, extracting files from a zip archive, loading JSON files into DataFrames, and displaying relevant information in the output widget."
111
+ ]
112
+ },
113
+ {
114
+ "cell_type": "code",
115
+ "execution_count": 37,
116
+ "metadata": {
117
+ "colab": {
118
+ "base_uri": "https://localhost:8080/",
119
+ "height": 16
120
+ },
121
+ "id": "mPLdaWiuYiG4",
122
+ "outputId": "7a698bc1-7954-44ac-83c8-71d1dc410749"
123
+ },
124
+ "outputs": [
125
+ {
126
+ "data": {
127
+ "text/html": [
128
+ "\n",
129
+ " <style>\n",
130
+ " pre {\n",
131
+ " white-space: pre-wrap;\n",
132
+ " }\n",
133
+ " </style>\n",
134
+ " "
135
+ ],
136
+ "text/plain": [
137
+ "<IPython.core.display.HTML object>"
138
+ ]
139
+ },
140
+ "metadata": {},
141
+ "output_type": "display_data"
142
+ }
143
+ ],
144
+ "source": [
145
+ "class InstructorGradingConfig:\n",
146
+ " def __init__(self):\n",
147
+ " # layouts to help with styling\n",
148
+ " self.items_layout = widgets.Layout(width='auto')\n",
149
+ "\n",
150
+ " self.box_layout = widgets.Layout(display='flex',\n",
151
+ " flex_flow='column',\n",
152
+ " align_items='stretch',\n",
153
+ " width='50%',\n",
154
+ " border='solid 1px gray',\n",
155
+ " padding='0px 30px 20px 30px')\n",
156
+ "\n",
157
+ " # Create all components\n",
158
+ " self.ui_title = widgets.HTML(value=\"<h2>Instructor Grading Configuration</h2>\")\n",
159
+ "\n",
160
+ " self.run_button = widgets.Button(description='Submit', button_style='success', icon='check')\n",
161
+ " self.status_output = widgets.Output()\n",
162
+ " self.status_output.append_stdout('Waiting...')\n",
163
+ "\n",
164
+ " # Setup click behavior\n",
165
+ " self.run_button.on_click(self._setup_environment)\n",
166
+ "\n",
167
+ " # Reset rest of state\n",
168
+ " self.reset_state()\n",
169
+ "\n",
170
+ " def reset_state(self, close_all=False):\n",
171
+ "\n",
172
+ " if close_all:\n",
173
+ " self.learning_objectives_text.close()\n",
174
+ " self.file_upload.close()\n",
175
+ " self.file_upload_box.close()\n",
176
+ " #self.ui_container.close()\n",
177
+ "\n",
178
+ " self.learning_objectives_text = widgets.Textarea(value='', description='Learning Objectives',\n",
179
+ " placeholder='Learning objectives: 1. Understand and implement classes in object-oriented programming',\n",
180
+ " layout=self.items_layout,\n",
181
+ " style={'description_width': 'initial'})\n",
182
+ " self.file_upload = widgets.FileUpload(\n",
183
+ " accept='.zip', # Accepted file extension e.g. '.txt', '.pdf', 'image/*', 'image/*,.pdf'\n",
184
+ " multiple=False # True to accept multiple files upload else False\n",
185
+ " )\n",
186
+ " self.file_upload_box = widgets.HBox([widgets.Label('Upload User Files:\\t'), self.file_upload])\n",
187
+ "\n",
188
+ "\n",
189
+ " # Create a VBox container to arrange the widgets vertically\n",
190
+ " self.ui_container = widgets.VBox([self.ui_title, self.learning_objectives_text,\n",
191
+ " self.file_upload_box, self.run_button, self.status_output],\n",
192
+ " layout=self.box_layout)\n",
193
+ "\n",
194
+ "\n",
195
+ " def _setup_environment(self, btn):\n",
196
+ " grade_settings['learning_objectives'] = self.learning_objectives_text.value\n",
197
+ " grade_settings['json_file_path'] = self.file_upload.value\n",
198
+ "\n",
199
+ " if self.file_upload.value:\n",
200
+ " try:\n",
201
+ " input_file = list(self.file_upload.value.values())[0]\n",
202
+ " extracted_zip_dir = list(grade_settings['json_file_path'].keys())[0][:-4]\n",
203
+ " except:\n",
204
+ " input_file = self.file_upload.value[0]\n",
205
+ " extracted_zip_dir = self.file_upload.value[0]['name'][:-4]\n",
206
+ "\n",
207
+ " self.status_output.clear_output()\n",
208
+ " self.status_output.append_stdout('Loading zip file...\\n')\n",
209
+ "\n",
210
+ " with zipfile.ZipFile(io.BytesIO(input_file['content']), \"r\") as z:\n",
211
+ " z.extractall()\n",
212
+ " extracted_files = z.namelist()\n",
213
+ "\n",
214
+ " self.status_output.append_stdout('Extracted files and directories: {0}\\n'.format(', '.join(extracted_files)))\n",
215
+ "\n",
216
+ " # load all json files\n",
217
+ " grade_settings['json_files'] = glob.glob(''.join([extracted_zip_dir, '/**/*.json']), recursive=True)\n",
218
+ "\n",
219
+ " #status_output.clear_output()\n",
220
+ " self.status_output.append_stdout('Loading successful!\\nLearning Objectives: {0}\\nExtracted JSON files: {1}'.format(grade_settings['learning_objectives'],\n",
221
+ " ', '.join(grade_settings['json_files'])))\n",
222
+ "\n",
223
+ " else:\n",
224
+ " self.status_output.clear_output()\n",
225
+ " self.status_output.append_stdout('Please upload a zip file.')\n",
226
+ "\n",
227
+ " # Clear values so they're not saved\n",
228
+ " self.learning_objectives_text.value = ''\n",
229
+ " self.reset_state(close_all=True)\n",
230
+ " self.run_ui_container()\n",
231
+ "\n",
232
+ " with self.status_output:\n",
233
+ " print('Extracted files and directories: {0}\\n'.format(', '.join(extracted_files)))\n",
234
+ " print('Loading successful!\\nLearning Objectives: {0}\\nExtracted JSON files: {1}'.format(grade_settings['learning_objectives'],\n",
235
+ " ', '.join(grade_settings['json_files'])))\n",
236
+ " print('Submitted and Reset all values.')\n",
237
+ "\n",
238
+ "\n",
239
+ " def run_ui_container(self):\n",
240
+ " display(self.ui_container, clear=True)"
241
+ ]
242
+ },
243
+ {
244
+ "cell_type": "code",
245
+ "execution_count": null,
246
+ "metadata": {
247
+ "colab": {
248
+ "base_uri": "https://localhost:8080/",
249
+ "height": 16
250
+ },
251
+ "id": "4wCQ4Wk8YiG4",
252
+ "outputId": "5c602e80-a210-4449-fd6c-eb8bf3213407"
253
+ },
254
+ "outputs": [
255
+ {
256
+ "data": {
257
+ "text/html": [
258
+ "\n",
259
+ " <style>\n",
260
+ " pre {\n",
261
+ " white-space: pre-wrap;\n",
262
+ " }\n",
263
+ " </style>\n",
264
+ " "
265
+ ],
266
+ "text/plain": [
267
+ "<IPython.core.display.HTML object>"
268
+ ]
269
+ },
270
+ "metadata": {},
271
+ "output_type": "display_data"
272
+ }
273
+ ],
274
+ "source": [
275
+ "#This code helps in the case that we have problems with metadata being retained.\n",
276
+ "#!jupyter nbconvert --ClearOutputPreprocessor.enabled=True --ClearMetadataPreprocessor.enabled=True --ClearMetadataPreprocessor.preserve_cell_metadata_mask \"colab\" --ClearMetadataPreprocessor.preserve_cell_metadata_mask \"kernelspec\" --ClearMetadataPreprocessor.preserve_cell_metadata_mask \"language_info\" --to=notebook --output=instructor_inst_notebook.ipynb instructor_intr_notebook.ipynb"
277
+ ]
278
+ },
279
+ {
280
+ "cell_type": "markdown",
281
+ "metadata": {
282
+ "id": "gj1K3MjHDlqb"
283
+ },
284
+ "source": [
285
+ "# User Settings and Submission Upload\n",
286
+ "The following two cells will ask you for your OpenAI API credentials and to upload the json file of the student submission."
287
+ ]
288
+ },
289
+ {
290
+ "cell_type": "code",
291
+ "execution_count": 4,
292
+ "metadata": {
293
+ "colab": {
294
+ "base_uri": "https://localhost:8080/",
295
+ "height": 519,
296
+ "referenced_widgets": [
297
+ "a84d31fb8f4e4bafb74035158834b404",
298
+ "b051a90758434644955747bc02d00bab",
299
+ "252b8009f3734ed2908049ebb40c0247",
300
+ "6622f76f91f44527a87a7575bbd388d2",
301
+ "654ab6d155eb457ea5c719a9ac27ad5b",
302
+ "86cb4f568f454ff8832face502fb0745",
303
+ "e30fe87f01bc4580a61713b5b72439a2",
304
+ "d16b25c7e9e948938c9303fbe8ae3dcc",
305
+ "453b12da4b6540cd9e4e57f73a4d670c",
306
+ "b74cf92175374028948d4cf529d4d1e6",
307
+ "f7d75b0a32554a9589c513336fc30095",
308
+ "7f7164e80a464ba9b99f96c10132db25",
309
+ "49f80567705147f0b82d45b7f06dd1ba",
310
+ "5a17f4509d194105b23dd616e45183d5",
311
+ "81c4dda35a7d4e15821bb4bc0973354e",
312
+ "df1c46361f714aceb9c046f98fede40c",
313
+ "60b80d550efa403a825a3cb913c26f53",
314
+ "d0bd0e3f12594ff1a51365b65a3fcc43",
315
+ "dfa8d6c7d70b42468cbda035de89404c",
316
+ "26d13984d45745858d3b890bc7f18a90",
317
+ "53722998fbe64a7c94829b79e8cd69d6",
318
+ "1b7ee0de15484cd5aecd6d8ca3b6ee9d",
319
+ "dde20647d3594d31b66b19659f53a95e",
320
+ "8610fffd2d2a4ec28f8c874c06073ce7",
321
+ "54e3918921f44fb4a9020beab951fcdf",
322
+ "1072a8a142f64dfd96ee528a2e9d1595",
323
+ "67b4083cd4234f52bb7cca27ab9cddb3",
324
+ "d0a1ebdf7fc0473f91c39b29ca580934",
325
+ "abbecdc637694e7cb026e003244e7037",
326
+ "7f814595d31e4b86992b5bd6bc85ced4",
327
+ "76548751bb9c4bcb9d4f39788ea7d4af",
328
+ "dbb88901f5084d49af208b91b52b6073"
329
+ ]
330
+ },
331
+ "id": "oQOeYl9OYiG5",
332
+ "outputId": "bb5b7dc4-ea7b-41ea-e741-2fb2bf66cccc"
333
+ },
334
+ "outputs": [
335
+ {
336
+ "data": {
337
+ "application/vnd.jupyter.widget-view+json": {
338
+ "model_id": "1b7ee0de15484cd5aecd6d8ca3b6ee9d",
339
+ "version_major": 2,
340
+ "version_minor": 0
341
+ },
342
+ "text/plain": [
343
+ "VBox(children=(HTML(value='<h2>Instructor Grading Configuration</h2>'), Textarea(value='', description='Learni…"
344
+ ]
345
+ },
346
+ "metadata": {},
347
+ "output_type": "display_data"
348
+ }
349
+ ],
350
+ "source": [
351
+ "InstructorGradingConfig().run_ui_container()"
352
+ ]
353
+ },
354
+ {
355
+ "cell_type": "markdown",
356
+ "metadata": {
357
+ "id": "W9SqmkpeIgpk"
358
+ },
359
+ "source": [
360
+ "You will need an OpenAI API key in order to access the chat functionality. In the following cell, you'll see a blank box pop up - copy your API key there and press enter."
361
+ ]
362
+ },
363
+ {
364
+ "cell_type": "code",
365
+ "execution_count": 5,
366
+ "metadata": {
367
+ "colab": {
368
+ "base_uri": "https://localhost:8080/",
369
+ "height": 32
370
+ },
371
+ "id": "MK8R5DmEYiG5",
372
+ "outputId": "09e11ee6-5a9f-4b61-ff61-ddf82a68c498"
373
+ },
374
+ "outputs": [
375
+ {
376
+ "data": {
377
+ "text/html": [
378
+ "\n",
379
+ " <style>\n",
380
+ " pre {\n",
381
+ " white-space: pre-wrap;\n",
382
+ " }\n",
383
+ " </style>\n",
384
+ " "
385
+ ],
386
+ "text/plain": [
387
+ "<IPython.core.display.HTML object>"
388
+ ]
389
+ },
390
+ "metadata": {},
391
+ "output_type": "display_data"
392
+ },
393
+ {
394
+ "name": "stdout",
395
+ "output_type": "stream",
396
+ "text": [
397
+ "··········\n"
398
+ ]
399
+ }
400
+ ],
401
+ "source": [
402
+ "# setup open AI api key\n",
403
+ "openai_api_key = getpass()"
404
+ ]
405
+ },
406
+ {
407
+ "cell_type": "markdown",
408
+ "metadata": {
409
+ "collapsed": true,
410
+ "id": "0bp158bj_0s6"
411
+ },
412
+ "source": [
413
+ "# Execute Grading\n",
414
+ "Run this cell set to have the generative AI assist you in grading."
415
+ ]
416
+ },
417
+ {
418
+ "cell_type": "markdown",
419
+ "metadata": {
420
+ "id": "vyJuQ7RUR8tB"
421
+ },
422
+ "source": [
423
+ "## Installation and Loading"
424
+ ]
425
+ },
426
+ {
427
+ "cell_type": "code",
428
+ "execution_count": 6,
429
+ "metadata": {
430
+ "colab": {
431
+ "base_uri": "https://localhost:8080/",
432
+ "height": 16
433
+ },
434
+ "id": "tjKxWLA3YiG5",
435
+ "outputId": "6dc85dff-4baa-44f0-edef-42925e6c271a"
436
+ },
437
+ "outputs": [
438
+ {
439
+ "data": {
440
+ "text/html": [
441
+ "\n",
442
+ " <style>\n",
443
+ " pre {\n",
444
+ " white-space: pre-wrap;\n",
445
+ " }\n",
446
+ " </style>\n",
447
+ " "
448
+ ],
449
+ "text/plain": [
450
+ "<IPython.core.display.HTML object>"
451
+ ]
452
+ },
453
+ "metadata": {},
454
+ "output_type": "display_data"
455
+ }
456
+ ],
457
+ "source": [
458
+ "%%capture\n",
459
+ "# install additional packages if needed\n",
460
+ "! pip install -q langchain openai"
461
+ ]
462
+ },
463
+ {
464
+ "cell_type": "code",
465
+ "execution_count": 7,
466
+ "metadata": {
467
+ "colab": {
468
+ "base_uri": "https://localhost:8080/",
469
+ "height": 16
470
+ },
471
+ "id": "S3oQiNm_YiG5",
472
+ "outputId": "95e43744-fadc-45cc-bb02-05737d12fcb2"
473
+ },
474
+ "outputs": [
475
+ {
476
+ "data": {
477
+ "text/html": [
478
+ "\n",
479
+ " <style>\n",
480
+ " pre {\n",
481
+ " white-space: pre-wrap;\n",
482
+ " }\n",
483
+ " </style>\n",
484
+ " "
485
+ ],
486
+ "text/plain": [
487
+ "<IPython.core.display.HTML object>"
488
+ ]
489
+ },
490
+ "metadata": {},
491
+ "output_type": "display_data"
492
+ }
493
+ ],
494
+ "source": [
495
+ "# import necessary libraries here\n",
496
+ "from langchain.llms import OpenAI\n",
497
+ "from langchain.chat_models import ChatOpenAI\n",
498
+ "from langchain.prompts import PromptTemplate\n",
499
+ "from langchain.document_loaders import TextLoader\n",
500
+ "from langchain.indexes import VectorstoreIndexCreator\n",
501
+ "from langchain.text_splitter import CharacterTextSplitter\n",
502
+ "from langchain.embeddings import OpenAIEmbeddings\n",
503
+ "from langchain.schema import SystemMessage, HumanMessage, AIMessage\n",
504
+ "import openai"
505
+ ]
506
+ },
507
+ {
508
+ "cell_type": "code",
509
+ "execution_count": 8,
510
+ "metadata": {
511
+ "colab": {
512
+ "base_uri": "https://localhost:8080/",
513
+ "height": 16
514
+ },
515
+ "id": "uXfSTQPrYiG5",
516
+ "outputId": "f85a6c4a-009f-4b30-f74a-04b6bfd85af6"
517
+ },
518
+ "outputs": [
519
+ {
520
+ "data": {
521
+ "text/html": [
522
+ "\n",
523
+ " <style>\n",
524
+ " pre {\n",
525
+ " white-space: pre-wrap;\n",
526
+ " }\n",
527
+ " </style>\n",
528
+ " "
529
+ ],
530
+ "text/plain": [
531
+ "<IPython.core.display.HTML object>"
532
+ ]
533
+ },
534
+ "metadata": {},
535
+ "output_type": "display_data"
536
+ }
537
+ ],
538
+ "source": [
539
+ "# Helper because lines are printed too long; helps with wrapping visualization\n",
540
+ "from IPython.display import HTML, display\n",
541
+ "\n",
542
+ "def set_css():\n",
543
+ " display(HTML('''\n",
544
+ " <style>\n",
545
+ " pre {\n",
546
+ " white-space: pre-wrap;\n",
547
+ " }\n",
548
+ " </style>\n",
549
+ " '''))\n",
550
+ "get_ipython().events.register('pre_run_cell', set_css)"
551
+ ]
552
+ },
553
+ {
554
+ "cell_type": "code",
555
+ "execution_count": 9,
556
+ "metadata": {
557
+ "colab": {
558
+ "base_uri": "https://localhost:8080/",
559
+ "height": 16
560
+ },
561
+ "id": "sTQFW9TxYiG5",
562
+ "outputId": "e291e167-e635-4006-965d-29fe1a0db10f"
563
+ },
564
+ "outputs": [
565
+ {
566
+ "data": {
567
+ "text/html": [
568
+ "\n",
569
+ " <style>\n",
570
+ " pre {\n",
571
+ " white-space: pre-wrap;\n",
572
+ " }\n",
573
+ " </style>\n",
574
+ " "
575
+ ],
576
+ "text/plain": [
577
+ "<IPython.core.display.HTML object>"
578
+ ]
579
+ },
580
+ "metadata": {},
581
+ "output_type": "display_data"
582
+ },
583
+ {
584
+ "data": {
585
+ "text/html": [
586
+ "\n",
587
+ " <style>\n",
588
+ " pre {\n",
589
+ " white-space: pre-wrap;\n",
590
+ " }\n",
591
+ " </style>\n",
592
+ " "
593
+ ],
594
+ "text/plain": [
595
+ "<IPython.core.display.HTML object>"
596
+ ]
597
+ },
598
+ "metadata": {},
599
+ "output_type": "display_data"
600
+ }
601
+ ],
602
+ "source": [
603
+ "# Set pandas display options\n",
604
+ "pd.set_option('display.max_columns', None)\n",
605
+ "pd.set_option('display.max_colwidth', 0)"
606
+ ]
607
+ },
608
+ {
609
+ "cell_type": "markdown",
610
+ "metadata": {
611
+ "id": "DOACT_LSSM58"
612
+ },
613
+ "source": [
614
+ "Setting of API key in environment and other settings"
615
+ ]
616
+ },
617
+ {
618
+ "cell_type": "code",
619
+ "execution_count": 10,
620
+ "metadata": {
621
+ "colab": {
622
+ "base_uri": "https://localhost:8080/",
623
+ "height": 16
624
+ },
625
+ "id": "OV05xRtDYiG5",
626
+ "outputId": "0d6339d9-bc32-49e9-955f-99947b510456"
627
+ },
628
+ "outputs": [
629
+ {
630
+ "data": {
631
+ "text/html": [
632
+ "\n",
633
+ " <style>\n",
634
+ " pre {\n",
635
+ " white-space: pre-wrap;\n",
636
+ " }\n",
637
+ " </style>\n",
638
+ " "
639
+ ],
640
+ "text/plain": [
641
+ "<IPython.core.display.HTML object>"
642
+ ]
643
+ },
644
+ "metadata": {},
645
+ "output_type": "display_data"
646
+ },
647
+ {
648
+ "data": {
649
+ "text/html": [
650
+ "\n",
651
+ " <style>\n",
652
+ " pre {\n",
653
+ " white-space: pre-wrap;\n",
654
+ " }\n",
655
+ " </style>\n",
656
+ " "
657
+ ],
658
+ "text/plain": [
659
+ "<IPython.core.display.HTML object>"
660
+ ]
661
+ },
662
+ "metadata": {},
663
+ "output_type": "display_data"
664
+ }
665
+ ],
666
+ "source": [
667
+ "#extract info from dictionary\n",
668
+ "json_file_path = grade_settings['json_file_path']\n",
669
+ "learning_objectives = grade_settings['learning_objectives']\n",
670
+ "\n",
671
+ "#set API key\n",
672
+ "os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n",
673
+ "openai.api_key = openai_api_key"
674
+ ]
675
+ },
676
+ {
677
+ "cell_type": "markdown",
678
+ "metadata": {
679
+ "id": "YreIs-I-tuxx"
680
+ },
681
+ "source": [
682
+ "Initiate the OpenAI model using Langchain."
683
+ ]
684
+ },
685
+ {
686
+ "cell_type": "code",
687
+ "execution_count": 11,
688
+ "metadata": {
689
+ "colab": {
690
+ "base_uri": "https://localhost:8080/",
691
+ "height": 16
692
+ },
693
+ "id": "ZRn9wbJBYiG5",
694
+ "outputId": "c09c7a7c-1ca0-4860-b39d-91778f183307"
695
+ },
696
+ "outputs": [
697
+ {
698
+ "data": {
699
+ "text/html": [
700
+ "\n",
701
+ " <style>\n",
702
+ " pre {\n",
703
+ " white-space: pre-wrap;\n",
704
+ " }\n",
705
+ " </style>\n",
706
+ " "
707
+ ],
708
+ "text/plain": [
709
+ "<IPython.core.display.HTML object>"
710
+ ]
711
+ },
712
+ "metadata": {},
713
+ "output_type": "display_data"
714
+ },
715
+ {
716
+ "data": {
717
+ "text/html": [
718
+ "\n",
719
+ " <style>\n",
720
+ " pre {\n",
721
+ " white-space: pre-wrap;\n",
722
+ " }\n",
723
+ " </style>\n",
724
+ " "
725
+ ],
726
+ "text/plain": [
727
+ "<IPython.core.display.HTML object>"
728
+ ]
729
+ },
730
+ "metadata": {},
731
+ "output_type": "display_data"
732
+ }
733
+ ],
734
+ "source": [
735
+ "llm = ChatOpenAI(model='gpt-3.5-turbo-16k')\n",
736
+ "messages = [\n",
737
+ " SystemMessage(content=\"You are a helpful assistant.\"),\n",
738
+ " HumanMessage(content=\"\")\n",
739
+ "]"
740
+ ]
741
+ },
742
+ {
743
+ "cell_type": "markdown",
744
+ "metadata": {
745
+ "id": "pIKYtr0UTJNc"
746
+ },
747
+ "source": [
748
+ "## Functions to help with loading json"
749
+ ]
750
+ },
751
+ {
752
+ "cell_type": "markdown",
753
+ "metadata": {
754
+ "id": "t7O3XPC29Osw"
755
+ },
756
+ "source": [
757
+ "`file_upload_json_to_df` helps when you use the file uploader as the json is directly read in this case. `clean_keys` helps when there are errors on the keys when reading."
758
+ ]
759
+ },
760
+ {
761
+ "cell_type": "code",
762
+ "execution_count": 12,
763
+ "metadata": {
764
+ "colab": {
765
+ "base_uri": "https://localhost:8080/",
766
+ "height": 16
767
+ },
768
+ "id": "qGxPHexrYiG5",
769
+ "outputId": "80657bb1-97e8-423a-a2ff-99afe8d22718"
770
+ },
771
+ "outputs": [
772
+ {
773
+ "data": {
774
+ "text/html": [
775
+ "\n",
776
+ " <style>\n",
777
+ " pre {\n",
778
+ " white-space: pre-wrap;\n",
779
+ " }\n",
780
+ " </style>\n",
781
+ " "
782
+ ],
783
+ "text/plain": [
784
+ "<IPython.core.display.HTML object>"
785
+ ]
786
+ },
787
+ "metadata": {},
788
+ "output_type": "display_data"
789
+ },
790
+ {
791
+ "data": {
792
+ "text/html": [
793
+ "\n",
794
+ " <style>\n",
795
+ " pre {\n",
796
+ " white-space: pre-wrap;\n",
797
+ " }\n",
798
+ " </style>\n",
799
+ " "
800
+ ],
801
+ "text/plain": [
802
+ "<IPython.core.display.HTML object>"
803
+ ]
804
+ },
805
+ "metadata": {},
806
+ "output_type": "display_data"
807
+ }
808
+ ],
809
+ "source": [
810
+ "# Strip beginning and ending newlines\n",
811
+ "def clean_keys(loaded_json):\n",
812
+ " out_json = [{key.strip():value for key, value in json_dict.items()} for json_dict in loaded_json ]\n",
813
+ " return out_json\n",
814
+ "\n",
815
+ "# Convert difficult datatypes to newlines\n",
816
+ "def file_upload_json_to_df(upload_json):\n",
817
+ "\n",
818
+ " #get middle key of json to extract content\n",
819
+ " fname = list(upload_json.keys())[0]\n",
820
+ "\n",
821
+ " #load the json; strict allows us to get around encoding issues\n",
822
+ " loaded_json = json.loads(upload_json[fname]['content'], strict=False)\n",
823
+ "\n",
824
+ " #clean the keys if needed\n",
825
+ " loaded_json = clean_keys(loaded_json)\n",
826
+ "\n",
827
+ " return pd.DataFrame(loaded_json)"
828
+ ]
829
+ },
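+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal usage sketch for `file_upload_json_to_df`, assuming a dictionary shaped like the value of an `ipywidgets` `FileUpload` widget; the file name and message below are hypothetical:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch only: emulate the {filename: {'content': ...}} structure produced by\n",
+ "# a FileUpload widget; the file name and message are hypothetical.\n",
+ "sample_upload = {\n",
+ " 'student.json': {\n",
+ " 'content': '[{\"timestamp\": \"t0\", \"author\": \"user\", \"message\": \"My answer\"}]'\n",
+ " }\n",
+ "}\n",
+ "file_upload_json_to_df(sample_upload)"
+ ]
+ },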
830
+ {
831
+ "cell_type": "markdown",
832
+ "metadata": {
833
+ "id": "N2yuYFQJYiG6"
834
+ },
835
+ "source": [
836
+ "`create_user_dataframe` filters based on role to create a dataframe for only user responses"
837
+ ]
838
+ },
839
+ {
840
+ "cell_type": "code",
841
+ "execution_count": 13,
842
+ "metadata": {
843
+ "colab": {
844
+ "base_uri": "https://localhost:8080/",
845
+ "height": 17
846
+ },
847
+ "id": "58hygjTXYiG6",
848
+ "outputId": "8f3683fb-f3da-45f3-8338-772e7583d4cc"
849
+ },
850
+ "outputs": [
851
+ {
852
+ "data": {
853
+ "text/html": [
854
+ "\n",
855
+ " <style>\n",
856
+ " pre {\n",
857
+ " white-space: pre-wrap;\n",
858
+ " }\n",
859
+ " </style>\n",
860
+ " "
861
+ ],
862
+ "text/plain": [
863
+ "<IPython.core.display.HTML object>"
864
+ ]
865
+ },
866
+ "metadata": {},
867
+ "output_type": "display_data"
868
+ }
869
+ ],
870
+ "source": [
871
+ "def create_user_dataframe(df):\n",
872
+ " df_user = df.query(\"`author` == 'user'\")\n",
873
+ "\n",
874
+ " return df_user"
875
+ ]
876
+ },
877
+ {
878
+ "cell_type": "markdown",
879
+ "metadata": {
880
+ "id": "MOwaLI97Igpm"
881
+ },
882
+ "source": [
883
+ "`load_json_as_df` helps when you use the file uploader as the json is directly read in this case. It accepts the path to the JSON to load the dataframe based on the json."
884
+ ]
885
+ },
886
+ {
887
+ "cell_type": "code",
888
+ "execution_count": 131,
889
+ "metadata": {
890
+ "colab": {
891
+ "base_uri": "https://localhost:8080/",
892
+ "height": 16
893
+ },
894
+ "id": "w0xN9CJeYiG6",
895
+ "outputId": "9422452f-9a97-4f22-9e49-ea0812c298fd"
896
+ },
897
+ "outputs": [
898
+ {
899
+ "data": {
900
+ "text/html": [
901
+ "\n",
902
+ " <style>\n",
903
+ " pre {\n",
904
+ " white-space: pre-wrap;\n",
905
+ " }\n",
906
+ " </style>\n",
907
+ " "
908
+ ],
909
+ "text/plain": [
910
+ "<IPython.core.display.HTML object>"
911
+ ]
912
+ },
913
+ "metadata": {},
914
+ "output_type": "display_data"
915
+ },
916
+ {
917
+ "data": {
918
+ "text/html": [
919
+ "\n",
920
+ " <style>\n",
921
+ " pre {\n",
922
+ " white-space: pre-wrap;\n",
923
+ " }\n",
924
+ " </style>\n",
925
+ " "
926
+ ],
927
+ "text/plain": [
928
+ "<IPython.core.display.HTML object>"
929
+ ]
930
+ },
931
+ "metadata": {},
932
+ "output_type": "display_data"
933
+ }
934
+ ],
935
+ "source": [
936
+ "def load_json_as_df(fpath):\n",
937
+ " # check if file is .json\n",
938
+ " if not fpath.endswith('.json'):\n",
939
+ " return None\n",
940
+ "\n",
941
+ " keys = [\"timestamp\", \"author\", \"message\"]\n",
942
+ "\n",
943
+ " df_out = None\n",
944
+ " out_error = None\n",
945
+ "\n",
946
+ " try:\n",
947
+ " # Read JSON file\n",
948
+ " with open(fpath, \"r\") as f:\n",
949
+ " json_data = f.read()\n",
950
+ "\n",
951
+ " # Load JSON data\n",
952
+ " data = json.loads(json_data, strict=False)\n",
953
+ "\n",
954
+ " # Quick check to see if we can fix common errors in json\n",
955
+ " # 1. JSON responses wrapped in enclosing dictionary\n",
956
+ " if isinstance(data, dict):\n",
957
+ " if len(data.keys()) == 1:\n",
958
+ " data = data[list(data.keys())[0]]\n",
959
+ " else:\n",
960
+ " data = [data] # convert to list otherwise\n",
961
+ "\n",
962
+ " # We only operate on lists of dictionaries\n",
963
+ " if isinstance(data, list):\n",
964
+ " data = clean_keys(data) # clean keys to make sure there are no unnecessary newlines\n",
965
+ "\n",
966
+ " if all(all(k in d for k in keys) for d in data):\n",
967
+ " # Filter only the student messages based on the \"author\" key\n",
968
+ " data = [d for d in data if d[\"author\"].lower() == \"user\"]\n",
969
+ "\n",
970
+ " df_out = pd.json_normalize(data)\n",
971
+ " if len(df_out) <= 1:\n",
972
+ " out_error = [fpath, \"Warning: JSON keys correct, but something wrong with the overall structure of the JSON when converting to the dataframe. The dataframe only has one row. Skipping.\"]\n",
973
+ " df_out = None\n",
974
+ " else:\n",
975
+ " out_error = [fpath, \"Error: JSON Keys are incorrect. Found keys: \" + str(list(data[0].keys()))]\n",
976
+ " else:\n",
977
+ " out_error = [fpath, \"Error: Something is wrong with the structure of the JSON.\"]\n",
978
+ "\n",
979
+ " except Exception as e:\n",
980
+ " print(f\"Error processing file {fpath}: {str(e)}\")\n",
981
+ " out_error = [fpath, \"Fatal System Error: \" + str(e)]\n",
982
+ "\n",
983
+ " if df_out is not None:\n",
984
+ " df_out['filename'] = fpath\n",
985
+ "\n",
986
+ " return df_out, out_error"
987
+ ]
988
+ },
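+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A quick sketch of calling `load_json_as_df` on a single file; the path below is illustrative, not a file guaranteed to exist:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch only: the path is illustrative.\n",
+ "df_single, err_single = load_json_as_df('instructorTest/spencer-smith_jesse.json')\n",
+ "if err_single is not None:\n",
+ " print(' '.join(err_single))\n",
+ "else:\n",
+ " display(df_single.head())"
+ ]
+ },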
1054
+ {
1055
+ "cell_type": "markdown",
1056
+ "metadata": {
1057
+ "id": "KA5moX-1Igpn"
1058
+ },
1059
+ "source": [
1060
+ "The `process_file` and `process_files` functions provide the implementation of prompt templates for instructor grading. It uses the input components to assemble a prompt and then sends this prompt to the llm for evaluation alongside the read dataframes."
1061
+ ]
1062
+ },
1063
+ {
1064
+ "cell_type": "code",
1065
+ "execution_count": 15,
1066
+ "metadata": {
1067
+ "colab": {
1068
+ "base_uri": "https://localhost:8080/",
1069
+ "height": 16
1070
+ },
1071
+ "id": "nFz3UVL3YiG6",
1072
+ "outputId": "01029e5f-b313-4941-d572-6dca5903d4ac"
1073
+ },
1074
+ "outputs": [
1075
+ {
1076
+ "data": {
1077
+ "text/html": [
1078
+ "\n",
1079
+ " <style>\n",
1080
+ " pre {\n",
1081
+ " white-space: pre-wrap;\n",
1082
+ " }\n",
1083
+ " </style>\n",
1084
+ " "
1085
+ ],
1086
+ "text/plain": [
1087
+ "<IPython.core.display.HTML object>"
1088
+ ]
1089
+ },
1090
+ "metadata": {},
1091
+ "output_type": "display_data"
1092
+ },
1093
+ {
1094
+ "data": {
1095
+ "text/html": [
1096
+ "\n",
1097
+ " <style>\n",
1098
+ " pre {\n",
1099
+ " white-space: pre-wrap;\n",
1100
+ " }\n",
1101
+ " </style>\n",
1102
+ " "
1103
+ ],
1104
+ "text/plain": [
1105
+ "<IPython.core.display.HTML object>"
1106
+ ]
1107
+ },
1108
+ "metadata": {},
1109
+ "output_type": "display_data"
1110
+ }
1111
+ ],
1112
+ "source": [
1113
+ "def process_file(df, desc, instr, print_results):\n",
1114
+ " messages_as_string = '\\n'.join(df['message'].astype(str))\n",
1115
+ " context = messages_as_string\n",
1116
+ "\n",
1117
+ " # Assemble prompt\n",
1118
+ " prompt = desc if desc is not None else \"\"\n",
1119
+ " prompt = (prompt + instr + \"\\n\") if instr is not None else prompt\n",
1120
+ " prompt = prompt + \"Here is the chat log: \\n\\n\" + context + \"\\n\"\n",
1121
+ "\n",
1122
+ " # Get results and optionally print\n",
1123
+ " messages[1] = HumanMessage(content=prompt)\n",
1124
+ " result = llm(messages)\n",
1125
+ "\n",
1126
+ " # Check if 'filename' exists in df\n",
1127
+ " if 'filename' in df:\n",
1128
+ " if print_results:\n",
1129
+ " print(f\"\\n\\nResult for file {df['filename'][0]}: \\n{result.content}\")\n",
1130
+ " else:\n",
1131
+ " if print_results:\n",
1132
+ " print(f\"\\n\\nResult for file: Unknown Filename \\n{result.content}\")\n",
1133
+ "\n",
1134
+ " return result\n",
1135
+ "\n",
1136
+ "def process_files(json_dfs, output_desc=None, grad_instructions=None, use_defaults = False, print_results=True):\n",
1137
+ " if use_defaults:\n",
1138
+ " output_desc = (\"Given the following chat log, create a table with the question number, the question content, answer, \"\n",
1139
+ " \"whether or not the student answered correctly on the first try, and the number of attempts it took to get the right answer. \")\n",
1140
+ " grad_instructions = (\"Then, calculate the quiz grade from the total number of assessment questions. \"\n",
1141
+ " \"Importantly, a point should only be granted if an answer was correct on the very first attempt. \"\n",
1142
+ " \"If an answer was not correct on the first attempt, even if it was correct in subsequent attempts, no point should be awarded for that question. \")\n",
1143
+ "\n",
1144
+ " results = [process_file(df, output_desc, grad_instructions, print_results) for df in json_dfs]\n",
1145
+ "\n",
1146
+ " return results"
1147
+ ]
1148
+ },
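+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A minimal sketch of the prompt that `process_file` assembles, mirroring its concatenation logic on a tiny hypothetical chat log without calling the LLM (`demo_df` and the instruction strings are made up for illustration):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch only: demo_df and the instruction strings are hypothetical.\n",
+ "demo_df = pd.DataFrame({'message': ['Answer to Q1', 'Answer to Q2']})\n",
+ "demo_desc = 'Summarize each response. '\n",
+ "demo_instr = 'Award one point per correct answer. '\n",
+ "demo_prompt = demo_desc + demo_instr + '\\n' + 'Here is the chat log: \\n\\n' + '\\n'.join(demo_df['message'].astype(str)) + '\\n'\n",
+ "print(demo_prompt)"
+ ]
+ },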
1149
+ {
1150
+ "cell_type": "code",
1151
+ "execution_count": 16,
1152
+ "metadata": {
1153
+ "colab": {
1154
+ "base_uri": "https://localhost:8080/",
1155
+ "height": 17
1156
+ },
1157
+ "id": "EhryP8utrR9D",
1158
+ "outputId": "51f4a60d-d6b7-4885-85c4-410c741ed651"
1159
+ },
1160
+ "outputs": [
1161
+ {
1162
+ "data": {
1163
+ "text/html": [
1164
+ "\n",
1165
+ " <style>\n",
1166
+ " pre {\n",
1167
+ " white-space: pre-wrap;\n",
1168
+ " }\n",
1169
+ " </style>\n",
1170
+ " "
1171
+ ],
1172
+ "text/plain": [
1173
+ "<IPython.core.display.HTML object>"
1174
+ ]
1175
+ },
1176
+ "metadata": {},
1177
+ "output_type": "display_data"
1178
+ }
1179
+ ],
1180
+ "source": [
1181
+ "def output_log_file(df_list, results_list, log_file='evaluation_log.txt'):\n",
1182
+ " \"\"\"\n",
1183
+ " Create a single log file containing evaluation results for all students.\n",
1184
+ "\n",
1185
+ " Parameters:\n",
1186
+ " df_list (list of pandas.DataFrame): List of DataFrames.\n",
1187
+ " results_list (list of ai_model_response): List of evaluation results.\n",
1188
+ " log_file (str): File name where the evaluation log will be saved. Default is 'evaluation_log.txt'.\n",
1189
+ "\n",
1190
+ " Returns:\n",
1191
+ " None\n",
1192
+ " \"\"\"\n",
1193
+ " with open(log_file, 'w') as log:\n",
1194
+ " for df, result in zip(df_list, results_list):\n",
1195
+ " log.write(f\"File: {df['filename'][0]}\\n\")\n",
1196
+ " log.write(result.content)\n",
1197
+ " log.write(\"\\n\\n\")"
1198
+ ]
1199
+ },
1200
+ {
1201
+ "cell_type": "markdown",
1202
+ "metadata": {
1203
+ "id": "lXQ45cJ1AztR"
1204
+ },
1205
+ "source": [
1206
+ "`pretty_print` makes dataframes look better when printed by substituting non-HTML with HTML for rendering."
1207
+ ]
1208
+ },
1209
+ {
1210
+ "cell_type": "code",
1211
+ "execution_count": 134,
1212
+ "metadata": {
1213
+ "colab": {
1214
+ "base_uri": "https://localhost:8080/",
1215
+ "height": 16
1216
+ },
1217
+ "id": "0te_RLOOYiG6",
1218
+ "outputId": "1dc53c98-5f68-4902-b377-7bba451395f0"
1219
+ },
1220
+ "outputs": [
1221
+ {
1222
+ "data": {
1223
+ "text/html": [
1224
+ "\n",
1225
+ " <style>\n",
1226
+ " pre {\n",
1227
+ " white-space: pre-wrap;\n",
1228
+ " }\n",
1229
+ " </style>\n",
1230
+ " "
1231
+ ],
1232
+ "text/plain": [
1233
+ "<IPython.core.display.HTML object>"
1234
+ ]
1235
+ },
1236
+ "metadata": {},
1237
+ "output_type": "display_data"
1238
+ },
1239
+ {
1240
+ "data": {
1241
+ "text/html": [
1242
+ "\n",
1243
+ " <style>\n",
1244
+ " pre {\n",
1245
+ " white-space: pre-wrap;\n",
1246
+ " }\n",
1247
+ " </style>\n",
1248
+ " "
1249
+ ],
1250
+ "text/plain": [
1251
+ "<IPython.core.display.HTML object>"
1252
+ ]
1253
+ },
1254
+ "metadata": {},
1255
+ "output_type": "display_data"
1256
+ }
1257
+ ],
1258
+ "source": [
1259
+ "def pretty_print(df):\n",
1260
+ " return display( HTML( df.to_html().replace(\"\\\\n\",\"<br>\") ) )"
1261
+ ]
1262
+ },
1263
+ {
1264
+ "cell_type": "markdown",
1265
+ "metadata": {
1266
+ "id": "I3rKk7lJYiG6"
1267
+ },
1268
+ "source": [
1269
+ "`save_as_csv` saves the dataframe as a CSV"
1270
+ ]
1271
+ },
1272
+ {
1273
+ "cell_type": "code",
1274
+ "execution_count": 135,
1275
+ "metadata": {
1276
+ "colab": {
1277
+ "base_uri": "https://localhost:8080/",
1278
+ "height": 16
1279
+ },
1280
+ "id": "DnrH2ldeYiG6",
1281
+ "outputId": "f1f6153b-49db-4145-c188-373685ffdcf4"
1282
+ },
1283
+ "outputs": [
1284
+ {
1285
+ "data": {
1286
+ "text/html": [
1287
+ "\n",
1288
+ " <style>\n",
1289
+ " pre {\n",
1290
+ " white-space: pre-wrap;\n",
1291
+ " }\n",
1292
+ " </style>\n",
1293
+ " "
1294
+ ],
1295
+ "text/plain": [
1296
+ "<IPython.core.display.HTML object>"
1297
+ ]
1298
+ },
1299
+ "metadata": {},
1300
+ "output_type": "display_data"
1301
+ },
1302
+ {
1303
+ "data": {
1304
+ "text/html": [
1305
+ "\n",
1306
+ " <style>\n",
1307
+ " pre {\n",
1308
+ " white-space: pre-wrap;\n",
1309
+ " }\n",
1310
+ " </style>\n",
1311
+ " "
1312
+ ],
1313
+ "text/plain": [
1314
+ "<IPython.core.display.HTML object>"
1315
+ ]
1316
+ },
1317
+ "metadata": {},
1318
+ "output_type": "display_data"
1319
+ }
1320
+ ],
1321
+ "source": [
1322
+ "def save_as_csv(df, file_name):\n",
1323
+ " df.to_csv(file_name, index=False)"
1324
+ ]
1325
+ },
1326
+ {
1327
+ "cell_type": "code",
1328
+ "execution_count": 136,
1329
+ "metadata": {
1330
+ "colab": {
1331
+ "base_uri": "https://localhost:8080/",
1332
+ "height": 16
1333
+ },
1334
+ "id": "Vgo_y8R8bzTE",
1335
+ "outputId": "f5effec2-8620-4c1d-be15-672b8cb3de21"
1336
+ },
1337
+ "outputs": [
1338
+ {
1339
+ "data": {
1340
+ "text/html": [
1341
+ "\n",
1342
+ " <style>\n",
1343
+ " pre {\n",
1344
+ " white-space: pre-wrap;\n",
1345
+ " }\n",
1346
+ " </style>\n",
1347
+ " "
1348
+ ],
1349
+ "text/plain": [
1350
+ "<IPython.core.display.HTML object>"
1351
+ ]
1352
+ },
1353
+ "metadata": {},
1354
+ "output_type": "display_data"
1355
+ },
1356
+ {
1357
+ "data": {
1358
+ "text/html": [
1359
+ "\n",
1360
+ " <style>\n",
1361
+ " pre {\n",
1362
+ " white-space: pre-wrap;\n",
1363
+ " }\n",
1364
+ " </style>\n",
1365
+ " "
1366
+ ],
1367
+ "text/plain": [
1368
+ "<IPython.core.display.HTML object>"
1369
+ ]
1370
+ },
1371
+ "metadata": {},
1372
+ "output_type": "display_data"
1373
+ }
1374
+ ],
1375
+ "source": [
1376
+ "def show_json_loading_errors(err_list):\n",
1377
+ " if err_list:\n",
1378
+ " print(\"The following files have the following errors upon loading and will NOT be processed:\", '\\n'.join(err_list))\n",
1379
+ " else:\n",
1380
+ " print(\"No errors found in uploaded zip JSON files.\")\n"
1381
+ ]
1382
+ },
1383
+ {
1384
+ "cell_type": "markdown",
1385
+ "metadata": {
1386
+ "id": "85h5oTysJkHs"
1387
+ },
1388
+ "source": [
1389
+ "## Final data preparation steps"
1390
+ ]
1391
+ },
1392
+ {
1393
+ "cell_type": "code",
1394
+ "execution_count": 137,
1395
+ "metadata": {
1396
+ "colab": {
1397
+ "base_uri": "https://localhost:8080/",
1398
+ "height": 16
1399
+ },
1400
+ "id": "Upah5_ygZRZx",
1401
+ "outputId": "639a0d70-6a93-4462-f65a-2e24f30643c0"
1402
+ },
1403
+ "outputs": [
1404
+ {
1405
+ "data": {
1406
+ "text/html": [
1407
+ "\n",
1408
+ " <style>\n",
1409
+ " pre {\n",
1410
+ " white-space: pre-wrap;\n",
1411
+ " }\n",
1412
+ " </style>\n",
1413
+ " "
1414
+ ],
1415
+ "text/plain": [
1416
+ "<IPython.core.display.HTML object>"
1417
+ ]
1418
+ },
1419
+ "metadata": {},
1420
+ "output_type": "display_data"
1421
+ },
1422
+ {
1423
+ "data": {
1424
+ "text/html": [
1425
+ "\n",
1426
+ " <style>\n",
1427
+ " pre {\n",
1428
+ " white-space: pre-wrap;\n",
1429
+ " }\n",
1430
+ " </style>\n",
1431
+ " "
1432
+ ],
1433
+ "text/plain": [
1434
+ "<IPython.core.display.HTML object>"
1435
+ ]
1436
+ },
1437
+ "metadata": {},
1438
+ "output_type": "display_data"
1439
+ }
1440
+ ],
1441
+ "source": [
1442
+ "#additional processing setup\n",
1443
+ "json_files = grade_settings['json_files']\n",
1444
+ "load_responses = [load_json_as_df(jf) for jf in json_files]\n",
1445
+ "\n",
1446
+ "#unzip to two separate lists\n",
1447
+ "all_json_dfs, errors_list = zip(*load_responses)\n",
1448
+ "\n",
1449
+ "# Remove failed JSONs\n",
1450
+ "all_json_dfs = [df for df in all_json_dfs if df is not None]\n",
1451
+ "\n",
1452
+ "# Update errors list to be individual strings\n",
1453
+ "errors_list = [' '.join(err) for err in errors_list if err is not None]"
1454
+ ]
1455
+ },
1456
+ {
1457
+ "cell_type": "markdown",
1458
+ "metadata": {
1459
+ "id": "P_H4uIfmAsr0"
1460
+ },
1461
+ "source": [
1462
+ "# AI-Assisted Evaluation\n",
1463
+ "Introduction and Instructions\n",
1464
+ "--------------------------------------------------\n",
1465
+ "The following example illustrates how you can specify important components of the prompts for sending to the llm. The `process_files` function will iterate over all of the submissions in your zip file, create dataframes of results (via instruction by setting `output_setup`), and also perform evaluation based on your instructions (via instruction by setting `grading_instructions`).\n",
1466
+ "\n",
1467
+ "Example functionality is demonstrated below."
1468
+ ]
1469
+ },
1470
+ {
1471
+ "cell_type": "code",
1472
+ "execution_count": 138,
1473
+ "metadata": {
1474
+ "colab": {
1475
+ "base_uri": "https://localhost:8080/",
1476
+ "height": 35
1477
+ },
1478
+ "id": "9zIPjG5lco3Z",
1479
+ "outputId": "cc9531c0-3939-4c9f-dc6c-9d2cfa0ea7d2"
1480
+ },
1481
+ "outputs": [
1482
+ {
1483
+ "data": {
1484
+ "text/html": [
1485
+ "\n",
1486
+ " <style>\n",
1487
+ " pre {\n",
1488
+ " white-space: pre-wrap;\n",
1489
+ " }\n",
1490
+ " </style>\n",
1491
+ " "
1492
+ ],
1493
+ "text/plain": [
1494
+ "<IPython.core.display.HTML object>"
1495
+ ]
1496
+ },
1497
+ "metadata": {},
1498
+ "output_type": "display_data"
1499
+ },
1500
+ {
1501
+ "data": {
1502
+ "text/html": [
1503
+ "\n",
1504
+ " <style>\n",
1505
+ " pre {\n",
1506
+ " white-space: pre-wrap;\n",
1507
+ " }\n",
1508
+ " </style>\n",
1509
+ " "
1510
+ ],
1511
+ "text/plain": [
1512
+ "<IPython.core.display.HTML object>"
1513
+ ]
1514
+ },
1515
+ "metadata": {},
1516
+ "output_type": "display_data"
1517
+ },
1518
+ {
1519
+ "name": "stdout",
1520
+ "output_type": "stream",
1521
+ "text": [
1522
+ "No errors found in uploaded zip JSON files.\n"
1523
+ ]
1524
+ }
1525
+ ],
1526
+ "source": [
1527
+ "# Print list of files with the incorrect format\n",
1528
+ "show_json_loading_errors(errors_list)"
1529
+ ]
1530
+ },
1531
+ {
1532
+ "cell_type": "code",
1533
+ "execution_count": 139,
1534
+ "metadata": {
1535
+ "colab": {
1536
+ "base_uri": "https://localhost:8080/",
1537
+ "height": 1000
1538
+ },
1539
+ "id": "utPzYUoKYiG9",
1540
+ "outputId": "eb3e1769-eb7a-4c93-96bc-6c998df55ef1"
1541
+ },
1542
+ "outputs": [
1543
+ {
1544
+ "data": {
1545
+ "text/html": [
1546
+ "\n",
1547
+ " <style>\n",
1548
+ " pre {\n",
1549
+ " white-space: pre-wrap;\n",
1550
+ " }\n",
1551
+ " </style>\n",
1552
+ " "
1553
+ ],
1554
+ "text/plain": [
1555
+ "<IPython.core.display.HTML object>"
1556
+ ]
1557
+ },
1558
+ "metadata": {},
1559
+ "output_type": "display_data"
1560
+ },
1561
+ {
1562
+ "data": {
1563
+ "text/html": [
1564
+ "\n",
1565
+ " <style>\n",
1566
+ " pre {\n",
1567
+ " white-space: pre-wrap;\n",
1568
+ " }\n",
1569
+ " </style>\n",
1570
+ " "
1571
+ ],
1572
+ "text/plain": [
1573
+ "<IPython.core.display.HTML object>"
1574
+ ]
1575
+ },
1576
+ "metadata": {},
1577
+ "output_type": "display_data"
1578
+ },
1579
+ {
1580
+ "name": "stdout",
1581
+ "output_type": "stream",
1582
+ "text": [
1583
+ "\n",
1584
+ "\n",
1585
+ "Result for file instructorTest/spencer-smith_jesse.json: \n",
1586
+ "Summary and feedback for student responses:\n",
1587
+ "\n",
1588
+ "Student 1:\n",
1589
+ "The student provided an excellent response to Question 1. They accurately explained the purpose of capitalizing expenses when incorporating them into the estimate of corporate earnings. They highlighted the importance of accurately reflecting the timing of costs and their related benefits, and how capitalizing expenses can impact a company's financial statements. The student also mentioned the matching principle of accounting and its role in ensuring the comparability and fairness of financial statements. Overall, the response is comprehensive and well-written. Well done!\n",
1590
+ "\n",
1591
+ "Student 2:\n",
1592
+ "The student gave a great response to Question 2. They correctly stated that expenses should be capitalized when they provide value beyond the current accounting period. The student also provided examples of capital expenses, such as the purchase price of a delivery truck or the cost of a building renovation. These examples demonstrate a clear understanding of the topic. The response is well-explained and shows a good grasp of the concept. Great job!\n",
1593
+ "\n",
1594
+ "Numeric summary:\n",
1595
+ "Both students provided correct answers to their respective questions, earning them a point each. Therefore, the numeric summary is as follows:\n",
1596
+ "Student 1: 1 point\n",
1597
+ "Student 2: 1 point\n",
1598
+ "\n",
1599
+ "\n",
1600
+ "Result for file instructorTest/bell_charreau.json: \n",
1601
+ "Summary:\n",
1602
+ "- The first student's response inaccurately states that capitalizing expenses is done to make the money 'look good' on the earnings report. The assistant provides a detailed explanation of the correct purpose of capitalizing expenses.\n",
1603
+ "- The second student's response partially identifies that capitalized expenses provide benefits for a longer period, but the assistant provides a more comprehensive explanation of what types of expenses should be capitalized and why they are treated differently from regular expenses.\n",
1604
+ "\n",
1605
+ "Feedback for the first student:\n",
1606
+ "The student accurately identified the purpose of capitalizing expenses, but their explanation was not entirely correct. They incorrectly stated that it is done to make the money 'look good' on the earnings report. The assistant provided a clear and detailed explanation of the correct purpose of capitalizing expenses and how it aligns with accounting principles.\n",
1607
+ "\n",
1608
+ "Feedback for the second student:\n",
1609
+ "The student partially identified the types of expenses that should be capitalized and why they are treated differently from regular expenses. However, their explanation was not comprehensive. The assistant provided a more detailed explanation of what types of expenses should be capitalized and why, as well as the concept of depreciation or amortization for spreading out the costs over time.\n",
1610
+ "\n",
1611
+ "Numeric Summary:\n",
1612
+ "The first student's response was partially correct, so they receive 0.5 points.\n",
1613
+ "The second student's response was also partially correct, so they receive 0.5 points.\n",
1614
+ "The total point count is 1.\n"
1615
+ ]
1616
+ }
1617
+ ],
1618
+ "source": [
1619
+ "# Example\n",
1620
+ "output_setup = (\"For each student response given in the following chat log, please generate a summary and detailed feedback for each students' responses,\"\n",
1621
+ " \", including what the student did well, and what was done poorly. \"\n",
1622
+ " \"Additionally, please filter feedback alphabetically by the name of the student from the filename.\")\n",
1623
+ "grading_instructions = (\"Then, calculate a numeric summary, summing up the point totals, \"\n",
1624
+ " \"in which a point is awarded for answering correctly. \")\n",
1625
+ "\n",
1626
+ "# Assuming `file_paths` is a list of file paths.\n",
1627
+ "processed_submissions = process_files(all_json_dfs, output_setup, grading_instructions, use_defaults=False, print_results=True)\n",
1628
+ "\n",
1629
+ "output_log_file(all_json_dfs, processed_submissions)"
1630
+ ]
1631
+ },
1632
+ {
1633
+ "cell_type": "markdown",
1634
+ "metadata": {
1635
+ "id": "Pc1myGweIgpo"
1636
+ },
1637
+ "source": [
1638
+ "## Instructor-Specified Evaluation\n",
1639
+ "Now, you can use the following code to create your settings. Change `output_setup` and `grading_instructions` as desired, making sure to keep the syntax (beginning and ending parentheses,and quotes at the beginning and end of each line) correct. `output_setup` has been copied from the previous cell, but you should fill in `grading_instructions`.\n",
1640
+ "\n",
1641
+ "### File Processing Options\n",
1642
+ "The `process_files` function has a number of settings.\n",
1643
+ "* The first setting must always be `all_json_dfs`, which contains the tabular representation of the json output.\n",
1644
+ "* The other settings should be set by name, and are:\n",
1645
+ " * **`output_desc`**: Shown as `output_setup` here, this contains the isntructions about how you want to the tabular representation to be set up. Note that you can also leave this off of the function list (just erase it and the following comma).\n",
1646
+ " * **`grad_instructions`**: Shown as `grading_instructions` here, use this variable to set grading instructions. Note that you can also leave this off of the function list (erase it and the following comma)\n",
1647
+ " * **`use_defaults`**: Some default grading and instruction prompts have already been created. If you set `use_defaults=TRUE`, both the grading instructions and the output table description will use the default prompts provided by the program, regardless of whether you have set values for `output_desc` or `grad_instructions`.\n",
1648
+ " * **`print_results`**: By default, the results will be printed for all students. However, if you don't want to see this output, you can set `print_results=False`.\n",
1649
+ "\n",
1650
+ "Again, make sure to observe the syntax. The defaults used in the program are shown in the above example."
1651
+ ]
1652
+ },
1653
+ {
1654
+ "cell_type": "code",
1655
+ "execution_count": 34,
1656
+ "metadata": {
1657
+ "colab": {
1658
+ "base_uri": "https://localhost:8080/",
1659
+ "height": 16
1660
+ },
1661
+ "id": "GiebKVlbYiG9",
1662
+ "outputId": "d4769d23-393c-4986-9418-5cef6944e6ab"
1663
+ },
1664
+ "outputs": [
1665
+ {
1666
+ "data": {
1667
+ "text/html": [
1668
+ "\n",
1669
+ " <style>\n",
1670
+ " pre {\n",
1671
+ " white-space: pre-wrap;\n",
1672
+ " }\n",
1673
+ " </style>\n",
1674
+ " "
1675
+ ],
1676
+ "text/plain": [
1677
+ "<IPython.core.display.HTML object>"
1678
+ ]
1679
+ },
1680
+ "metadata": {},
1681
+ "output_type": "display_data"
1682
+ }
1683
+ ],
1684
+ "source": [
1685
+ "output_setup = (\"For each student response given in the following chat log, please generate a summary and detailed feedback for each students' responses,\"\n",
1686
+ " \", including what the student did well, and what was done poorly. \")\n",
1687
+ "\n",
1688
+ "# add your own grading instructions\n",
1689
+ "grading_instructions = (\"INSERT ANY CUSTOM GRADING INSTRUCTIONS HERE\")\n",
1690
+ "\n",
1691
+ "# Assuming `file_paths` is a list of file paths.\n",
1692
+ "processed_submissions = process_files(all_json_dfs, output_setup, grading_instructions, use_defaults=False, print_results=True)\n",
1693
+ "\n",
1694
+ "output_log_file(all_json_dfs, processed_submissions)"
1695
+ ]
1696
+ },
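+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A defaults-only sketch, assuming you want the program's built-in prompts rather than custom ones (note the Python spelling `use_defaults=True`):"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Sketch only: rely on the built-in default table description and grading prompts.\n",
+ "processed_submissions = process_files(all_json_dfs, use_defaults=True, print_results=True)\n",
+ "output_log_file(all_json_dfs, processed_submissions)"
+ ]
+ },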
1697
+ {
1698
+ "cell_type": "markdown",
1699
+ "metadata": {
1700
+ "id": "snLA6OZ83CrS"
1701
+ },
1702
+ "source": [
1703
+ "## Grading based on Blooms Taxonomy\n",
1704
+ "Another mechanism of evaluation is through Bloom's Taxonomy, where student responses will be evaluated based on where they fall on Bloom's Taxonomy. The higher the score with Bloom's Taxonomy, the more depth is illustrated by the question."
1705
+ ]
1706
+ },
1707
+ {
1708
+ "cell_type": "code",
1709
+ "execution_count": 140,
1710
+ "metadata": {
1711
+ "colab": {
1712
+ "base_uri": "https://localhost:8080/",
1713
+ "height": 625
1714
+ },
1715
+ "id": "HEPXCJdrYiG-",
1716
+ "outputId": "add813e2-6b7c-4772-dac5-5f756e893b8f"
1717
+ },
1718
+ "outputs": [
1719
+ {
1720
+ "data": {
1721
+ "text/html": [
1722
+ "\n",
1723
+ " <style>\n",
1724
+ " pre {\n",
1725
+ " white-space: pre-wrap;\n",
1726
+ " }\n",
1727
+ " </style>\n",
1728
+ " "
1729
+ ],
1730
+ "text/plain": [
1731
+ "<IPython.core.display.HTML object>"
1732
+ ]
1733
+ },
1734
+ "metadata": {},
1735
+ "output_type": "display_data"
1736
+ },
1737
+ {
1738
+ "data": {
1739
+ "text/html": [
1740
+ "\n",
1741
+ " <style>\n",
1742
+ " pre {\n",
1743
+ " white-space: pre-wrap;\n",
1744
+ " }\n",
1745
+ " </style>\n",
1746
+ " "
1747
+ ],
1748
+ "text/plain": [
1749
+ "<IPython.core.display.HTML object>"
1750
+ ]
1751
+ },
1752
+ "metadata": {},
1753
+ "output_type": "display_data"
1754
+ },
1755
+ {
1756
+ "data": {
1757
+ "text/html": [
1758
+ "\n",
1759
+ " <style>\n",
1760
+ " pre {\n",
1761
+ " white-space: pre-wrap;\n",
1762
+ " }\n",
1763
+ " </style>\n",
1764
+ " "
1765
+ ],
1766
+ "text/plain": [
1767
+ "<IPython.core.display.HTML object>"
1768
+ ]
1769
+ },
1770
+ "metadata": {},
1771
+ "output_type": "display_data"
1772
+ },
1773
+ {
1774
+ "data": {
1775
+ "text/html": [
1776
+ "\n",
1777
+ " <style>\n",
1778
+ " pre {\n",
1779
+ " white-space: pre-wrap;\n",
1780
+ " }\n",
1781
+ " </style>\n",
1782
+ " "
1783
+ ],
1784
+ "text/plain": [
1785
+ "<IPython.core.display.HTML object>"
1786
+ ]
1787
+ },
1788
+ "metadata": {},
1789
+ "output_type": "display_data"
1790
+ },
1791
+ {
1792
+ "data": {
1793
+ "text/html": [
1794
+ "\n",
1795
+ " <style>\n",
1796
+ " pre {\n",
1797
+ " white-space: pre-wrap;\n",
1798
+ " }\n",
1799
+ " </style>\n",
1800
+ " "
1801
+ ],
1802
+ "text/plain": [
1803
+ "<IPython.core.display.HTML object>"
1804
+ ]
1805
+ },
1806
+ "metadata": {},
1807
+ "output_type": "display_data"
1808
+ },
1809
+ {
1810
+ "name": "stdout",
1811
+ "output_type": "stream",
1812
+ "text": [
1813
+ "\n",
1814
+ "\n",
1815
+ "Result for file 0 instructorTest/bell_charreau.json \n",
1816
+ "0 instructorTest/spencer-smith_jesse.json\n",
1817
+ "Name: filename, dtype: object: \n",
1818
+ "Student 1:\n",
1819
+ "Summary: The student incorrectly states that capitalizing expenses is done to make the money look good on the earnings report.\n",
1820
+ "Feedback: The student's response is not accurate. They misunderstood the purpose of capitalizing expenses. The main purpose is to spread the cost of certain long-term assets over their useful life, not to make the money 'look good' on the earnings report. \n",
1821
+ "Overall Level of Engagement and Knowledge: 1 (Remember)\n",
1822
+ "\n",
1823
+ "Student 2:\n",
1824
+ "Summary: The student partially understands the purpose of capitalizing expenses, but their answer could be more comprehensive.\n",
1825
+ "Feedback: The student correctly notes that capitalized expenses provide benefits for a longer period and are different from regular expenses. However, their answer could be more comprehensive and provide a more thorough explanation of why certain expenses are capitalized and how they are treated differently.\n",
1826
+ "Overall Level of Engagement and Knowledge: 3 (Apply)\n",
1827
+ "\n",
1828
+ "Student 3:\n",
1829
+ "Summary: The student provides a comprehensive and accurate explanation of the purpose of capitalizing expenses.\n",
1830
+ "Feedback: The student's response is excellent. They provide a comprehensive explanation of why capitalizing expenses is crucial for incorporating costs into the estimate of corporate earnings. They correctly highlight the timing of costs and their related benefits, the impact on financial statements, and the alignment with the matching principle of accounting.\n",
1831
+ "Overall Level of Engagement and Knowledge: 6 (Create)\n",
1832
+ "\n",
1833
+ "Student 4:\n",
1834
+ "Summary: The student accurately states that expenses should be capitalized when they provide value beyond the current accounting period and provides relevant examples.\n",
1835
+ "Feedback: The student's response is great. They correctly identify the types of expenses that should be capitalized and provide relevant examples. They demonstrate a clear understanding of the concept and provide a well-explained answer.\n",
1836
+ "Overall Level of Engagement and Knowledge: 5 (Evaluate)\n"
1837
+ ]
1838
+ }
1839
+ ],
1840
+ "source": [
1841
+ "output_setup = (\"For each student response given in the following chat log, please generate a summary and detailed feedback for each students' responses,\"\n",
1842
+ " \", including what the student did well, and what was done poorly. \")\n",
1843
+ "grading_instructions = \"\"\"\\nEvaluate the each student's overall level or engagement and knowledge, based on bloom's taxonomy using their responses.\n",
1844
+ "Bloom's taxonomy is rated on a 1-6 point system, with 1 being remember (recall facts and basic concepts), 2 being understand (explain ideas or concepts),\n",
1845
+ "3 being apply (use information in new situations), 4 being analyze (draw connections among ideas), 5 being evaluate (justify a stand or decision),\n",
1846
+ "and 6 being create (produce new or original work). Assign the interaction a score from 1-6, where 1 = remember, 2 = understand, 3 = apply, 4 = analyze,\n",
1847
+ "5 = evaluate, and 6 = create.\"\"\"\n",
1848
+ "\n",
1849
+ "# Assuming `file_paths` is a list of file paths.\n",
1850
+ "processed_submissions = process_files(all_json_dfs, output_setup, grading_instructions, use_defaults=False, print_results=True)\n",
1851
+ "\n",
1852
+ "output_log_file(all_json_dfs, processed_submissions)"
1853
+ ]
1854
+ },
1855
+ {
1856
+ "cell_type": "markdown",
1857
+ "metadata": {
1858
+ "id": "FI5-vnUvXM03"
1859
+ },
1860
+ "source": [
1861
+ "# Returning Results\n"
1862
+ ]
1863
+ },
1864
+ {
1865
+ "cell_type": "markdown",
1866
+ "metadata": {
1867
+ "id": "LgoGt82CYiG-"
1868
+ },
1869
+ "source": [
1870
+ "**Extract Student Responses ONLY from CHAT JSON**\n",
1871
+ "\n",
1872
+ "Below are relevant user components of dataframes, including the conversion from the original json, the interaction labeled dataframe, and the output dataframe. Check to make sure they make sense."
1873
+ ]
1874
+ },
1875
+ {
1876
+ "cell_type": "code",
1877
+ "execution_count": null,
1878
+ "metadata": {
1879
+ "colab": {
1880
+ "base_uri": "https://localhost:8080/",
1881
+ "height": 16
1882
+ },
1883
+ "id": "HVq9i_mXYiG-",
1884
+ "outputId": "fb5251e9-a327-4dfa-e294-03f5c58f6d35"
1885
+ },
1886
+ "outputs": [
1887
+ {
1888
+ "data": {
1889
+ "text/html": [
1890
+ "\n",
1891
+ " <style>\n",
1892
+ " pre {\n",
1893
+ " white-space: pre-wrap;\n",
1894
+ " }\n",
1895
+ " </style>\n",
1896
+ " "
1897
+ ],
1898
+ "text/plain": [
1899
+ "<IPython.core.display.HTML object>"
1900
+ ]
1901
+ },
1902
+ "metadata": {},
1903
+ "output_type": "display_data"
1904
+ },
1905
+ {
1906
+ "data": {
1907
+ "text/html": [
1908
+ "\n",
1909
+ " <style>\n",
1910
+ " pre {\n",
1911
+ " white-space: pre-wrap;\n",
1912
+ " }\n",
1913
+ " </style>\n",
1914
+ " "
1915
+ ],
1916
+ "text/plain": [
1917
+ "<IPython.core.display.HTML object>"
1918
+ ]
1919
+ },
1920
+ "metadata": {},
1921
+ "output_type": "display_data"
1922
+ },
1923
+ {
1924
+ "data": {
1925
+ "text/html": [
1926
+ "\n",
1927
+ " <style>\n",
1928
+ " pre {\n",
1929
+ " white-space: pre-wrap;\n",
1930
+ " }\n",
1931
+ " </style>\n",
1932
+ " "
1933
+ ],
1934
+ "text/plain": [
1935
+ "<IPython.core.display.HTML object>"
1936
+ ]
1937
+ },
1938
+ "metadata": {},
1939
+ "output_type": "display_data"
1940
+ },
1941
+ {
1942
+ "data": {
1943
+ "text/html": [
1944
+ "\n",
1945
+ " <style>\n",
1946
+ " pre {\n",
1947
+ " white-space: pre-wrap;\n",
1948
+ " }\n",
1949
+ " </style>\n",
1950
+ " "
1951
+ ],
1952
+ "text/plain": [
1953
+ "<IPython.core.display.HTML object>"
1954
+ ]
1955
+ },
1956
+ "metadata": {},
1957
+ "output_type": "display_data"
1958
+ },
1959
+ {
1960
+ "data": {
1961
+ "text/html": [
1962
+ "\n",
1963
+ " <style>\n",
1964
+ " pre {\n",
1965
+ " white-space: pre-wrap;\n",
1966
+ " }\n",
1967
+ " </style>\n",
1968
+ " "
1969
+ ],
1970
+ "text/plain": [
1971
+ "<IPython.core.display.HTML object>"
1972
+ ]
1973
+ },
1974
+ "metadata": {},
1975
+ "output_type": "display_data"
1976
+ }
1977
+ ],
1978
+ "source": [
1979
+ "def write_responses_to_csv(json_dfs):\n",
1980
+ " # Concatenate all dataframes in json_dfs into one large dataframe\n",
1981
+ " df = pd.concat(json_dfs)\n",
1982
+ "\n",
1983
+ " # Write the dataframe to a CSV\n",
1984
+ " df.to_csv('all_student_responses.csv', index=False)\n",
1985
+ "\n",
1986
+ "write_responses_to_csv(all_json_dfs)"
1987
+ ]
1988
+ },
1989
+ {
1990
+ "cell_type": "markdown",
1991
+ "metadata": {
1992
+ "id": "1WIGxKmDYiG-"
1993
+ },
1994
+ "source": [
1995
+ "**Saving/Downloading AI-Assisted Student Evaluation from Chat JSON**\n",
1996
+ "\n",
1997
+ "Execute the following cell to have all of your students' data returned in a single CSV file."
1998
+ ]
1999
+ },
2000
+ {
2001
+ "cell_type": "code",
2002
+ "execution_count": null,
2003
+ "metadata": {
2004
+ "colab": {
2005
+ "base_uri": "https://localhost:8080/",
2006
+ "height": 16
2007
+ },
2008
+ "id": "QnWNEeqjYiG-",
2009
+ "outputId": "e98c0c39-8449-45d4-f93a-d9394f6781bf"
2010
+ },
2011
+ "outputs": [
2012
+ {
2013
+ "data": {
2014
+ "text/html": [
2015
+ "\n",
2016
+ " <style>\n",
2017
+ " pre {\n",
2018
+ " white-space: pre-wrap;\n",
2019
+ " }\n",
2020
+ " </style>\n",
2021
+ " "
2022
+ ],
2023
+ "text/plain": [
2024
+ "<IPython.core.display.HTML object>"
2025
+ ]
2026
+ },
2027
+ "metadata": {},
2028
+ "output_type": "display_data"
2029
+ },
2030
+ {
2031
+ "data": {
2032
+ "text/html": [
2033
+ "\n",
2034
+ " <style>\n",
2035
+ " pre {\n",
2036
+ " white-space: pre-wrap;\n",
2037
+ " }\n",
2038
+ " </style>\n",
2039
+ " "
2040
+ ],
2041
+ "text/plain": [
2042
+ "<IPython.core.display.HTML object>"
2043
+ ]
2044
+ },
2045
+ "metadata": {},
2046
+ "output_type": "display_data"
2047
+ },
2048
+ {
2049
+ "data": {
2050
+ "text/html": [
2051
+ "\n",
2052
+ " <style>\n",
2053
+ " pre {\n",
2054
+ " white-space: pre-wrap;\n",
2055
+ " }\n",
2056
+ " </style>\n",
2057
+ " "
2058
+ ],
2059
+ "text/plain": [
2060
+ "<IPython.core.display.HTML object>"
2061
+ ]
2062
+ },
2063
+ "metadata": {},
2064
+ "output_type": "display_data"
2065
+ },
2066
+ {
2067
+ "data": {
2068
+ "text/html": [
2069
+ "\n",
2070
+ " <style>\n",
2071
+ " pre {\n",
2072
+ " white-space: pre-wrap;\n",
2073
+ " }\n",
2074
+ " </style>\n",
2075
+ " "
2076
+ ],
2077
+ "text/plain": [
2078
+ "<IPython.core.display.HTML object>"
2079
+ ]
2080
+ },
2081
+ "metadata": {},
2082
+ "output_type": "display_data"
2083
+ },
2084
+ {
2085
+ "data": {
2086
+ "text/html": [
2087
+ "\n",
2088
+ " <style>\n",
2089
+ " pre {\n",
2090
+ " white-space: pre-wrap;\n",
2091
+ " }\n",
2092
+ " </style>\n",
2093
+ " "
2094
+ ],
2095
+ "text/plain": [
2096
+ "<IPython.core.display.HTML object>"
2097
+ ]
2098
+ },
2099
+ "metadata": {},
2100
+ "output_type": "display_data"
2101
+ }
2102
+ ],
2103
+ "source": [
2104
+ "# Start with an empty dataframe\n",
2105
+ "all_results_df = pd.DataFrame()\n",
2106
+ "\n",
2107
+ "for result in processed_submissions:\n",
2108
+ "\n",
2109
+ " # Append the data from the current file to the master dataframe\n",
2110
+ " all_results_df = pd.concat([all_results_df, df])\n",
2111
+ "\n",
2112
+ "# Now all_results_df contains data from all the files\n",
2113
+ "\n",
2114
+ "# Write all results to a single CSV\n",
2115
+ "all_results_df.to_csv('all_results.csv', index=False)"
2116
+ ]
2117
+ }
2118
+ ],
2119
+ "metadata": {
2120
+ "colab": {
2121
+ "include_colab_link": true,
2122
+ "provenance": []
2123
+ },
2124
+ "kernelspec": {
2125
+ "display_name": "Python 3",
2126
+ "name": "python3"
2127
+ },
2128
+ "language_info": {
2129
+ "name": "python",
2130
+ "version": "3.10.6"
2131
+ },
2132
+ "widgets": {
2133
+ "application/vnd.jupyter.widget-state+json": {
2134
+ "1072a8a142f64dfd96ee528a2e9d1595": {
2135
+ "model_module": "@jupyter-widgets/controls",
2136
+ "model_module_version": "1.5.0",
2137
+ "model_name": "LabelModel",
2138
+ "state": {
2139
+ "_dom_classes": [],
2140
+ "_model_module": "@jupyter-widgets/controls",
2141
+ "_model_module_version": "1.5.0",
2142
+ "_model_name": "LabelModel",
2143
+ "_view_count": null,
2144
+ "_view_module": "@jupyter-widgets/controls",
2145
+ "_view_module_version": "1.5.0",
2146
+ "_view_name": "LabelView",
2147
+ "description": "",
2148
+ "description_tooltip": null,
2149
+ "layout": "IPY_MODEL_abbecdc637694e7cb026e003244e7037",
2150
+ "placeholder": "​",
2151
+ "style": "IPY_MODEL_7f814595d31e4b86992b5bd6bc85ced4",
2152
+ "value": "Upload User Files:\t"
2153
+ }
2154
+ },
2155
+ "1b7ee0de15484cd5aecd6d8ca3b6ee9d": {
2156
+ "model_module": "@jupyter-widgets/controls",
2157
+ "model_module_version": "1.5.0",
2158
+ "model_name": "VBoxModel",
2159
+ "state": {
2160
+ "_dom_classes": [],
2161
+ "_model_module": "@jupyter-widgets/controls",
2162
+ "_model_module_version": "1.5.0",
2163
+ "_model_name": "VBoxModel",
2164
+ "_view_count": null,
2165
+ "_view_module": "@jupyter-widgets/controls",
2166
+ "_view_module_version": "1.5.0",
2167
+ "_view_name": "VBoxView",
2168
+ "box_style": "",
2169
+ "children": [
2170
+ "IPY_MODEL_b051a90758434644955747bc02d00bab",
2171
+ "IPY_MODEL_dde20647d3594d31b66b19659f53a95e",
2172
+ "IPY_MODEL_8610fffd2d2a4ec28f8c874c06073ce7",
2173
+ "IPY_MODEL_654ab6d155eb457ea5c719a9ac27ad5b",
2174
+ "IPY_MODEL_86cb4f568f454ff8832face502fb0745"
2175
+ ],
2176
+ "layout": "IPY_MODEL_e30fe87f01bc4580a61713b5b72439a2"
2177
+ }
2178
+ },
2179
+ "252b8009f3734ed2908049ebb40c0247": {
2180
+ "model_module": "@jupyter-widgets/controls",
2181
+ "model_module_version": "1.5.0",
2182
+ "model_name": "TextareaModel",
2183
+ "state": {
2184
+ "_dom_classes": [],
2185
+ "_model_module": "@jupyter-widgets/controls",
2186
+ "_model_module_version": "1.5.0",
2187
+ "_model_name": "TextareaModel",
2188
+ "_view_count": null,
2189
+ "_view_module": "@jupyter-widgets/controls",
2190
+ "_view_module_version": "1.5.0",
2191
+ "_view_name": "TextareaView",
2192
+ "continuous_update": true,
2193
+ "description": "Learning Objectives",
2194
+ "description_tooltip": null,
2195
+ "disabled": false,
2196
+ "layout": "IPY_MODEL_b74cf92175374028948d4cf529d4d1e6",
2197
+ "placeholder": "Learning objectives: 1. Understand and implement classes in object-oriented programming",
2198
+ "rows": null,
2199
+ "style": "IPY_MODEL_f7d75b0a32554a9589c513336fc30095",
2200
+ "value": ""
2201
+ }
2202
+ },
2203
+ "26d13984d45745858d3b890bc7f18a90": {
2204
+ "model_module": "@jupyter-widgets/controls",
2205
+ "model_module_version": "1.5.0",
2206
+ "model_name": "ButtonStyleModel",
2207
+ "state": {
2208
+ "_model_module": "@jupyter-widgets/controls",
2209
+ "_model_module_version": "1.5.0",
2210
+ "_model_name": "ButtonStyleModel",
2211
+ "_view_count": null,
2212
+ "_view_module": "@jupyter-widgets/base",
2213
+ "_view_module_version": "1.2.0",
2214
+ "_view_name": "StyleView",
2215
+ "button_color": null,
2216
+ "font_weight": ""
2217
+ }
2218
+ },
2219
+ "453b12da4b6540cd9e4e57f73a4d670c": {
2220
+ "model_module": "@jupyter-widgets/controls",
2221
+ "model_module_version": "1.5.0",
2222
+ "model_name": "DescriptionStyleModel",
2223
+ "state": {
2224
+ "_model_module": "@jupyter-widgets/controls",
2225
+ "_model_module_version": "1.5.0",
2226
+ "_model_name": "DescriptionStyleModel",
2227
+ "_view_count": null,
2228
+ "_view_module": "@jupyter-widgets/base",
2229
+ "_view_module_version": "1.2.0",
2230
+ "_view_name": "StyleView",
2231
+ "description_width": ""
2232
+ }
2233
+ },
2234
+ "49f80567705147f0b82d45b7f06dd1ba": {
2235
+ "model_module": "@jupyter-widgets/controls",
2236
+ "model_module_version": "1.5.0",
2237
+ "model_name": "FileUploadModel",
2238
+ "state": {
2239
+ "_counter": 1,
2240
+ "_dom_classes": [],
2241
+ "_model_module": "@jupyter-widgets/controls",
2242
+ "_model_module_version": "1.5.0",
2243
+ "_model_name": "FileUploadModel",
2244
+ "_view_count": null,
2245
+ "_view_module": "@jupyter-widgets/controls",
2246
+ "_view_module_version": "1.5.0",
2247
+ "_view_name": "FileUploadView",
2248
+ "accept": ".zip",
2249
+ "button_style": "",
2250
+ "data": [
2251
+ null
2252
+ ],
2253
+ "description": "Upload",
2254
+ "description_tooltip": null,
2255
+ "disabled": false,
2256
+ "error": "",
2257
+ "icon": "upload",
2258
+ "layout": "IPY_MODEL_dfa8d6c7d70b42468cbda035de89404c",
2259
+ "metadata": [
2260
+ {
2261
+ "lastModified": 1689919477171,
2262
+ "name": "instructorTest.zip",
2263
+ "size": 4958,
2264
+ "type": "application/zip"
2265
+ }
2266
+ ],
2267
+ "multiple": false,
2268
+ "style": "IPY_MODEL_26d13984d45745858d3b890bc7f18a90"
2269
+ }
2270
+ },
2271
+ "53722998fbe64a7c94829b79e8cd69d6": {
2272
+ "model_module": "@jupyter-widgets/base",
2273
+ "model_module_version": "1.2.0",
2274
+ "model_name": "LayoutModel",
2275
+ "state": {
2276
+ "_model_module": "@jupyter-widgets/base",
2277
+ "_model_module_version": "1.2.0",
2278
+ "_model_name": "LayoutModel",
2279
+ "_view_count": null,
2280
+ "_view_module": "@jupyter-widgets/base",
2281
+ "_view_module_version": "1.2.0",
2282
+ "_view_name": "LayoutView",
2283
+ "align_content": null,
2284
+ "align_items": null,
2285
+ "align_self": null,
2286
+ "border": null,
2287
+ "bottom": null,
2288
+ "display": null,
2289
+ "flex": null,
2290
+ "flex_flow": null,
2291
+ "grid_area": null,
2292
+ "grid_auto_columns": null,
2293
+ "grid_auto_flow": null,
2294
+ "grid_auto_rows": null,
2295
+ "grid_column": null,
2296
+ "grid_gap": null,
2297
+ "grid_row": null,
2298
+ "grid_template_areas": null,
2299
+ "grid_template_columns": null,
2300
+ "grid_template_rows": null,
2301
+ "height": null,
2302
+ "justify_content": null,
2303
+ "justify_items": null,
2304
+ "left": null,
2305
+ "margin": null,
2306
+ "max_height": null,
2307
+ "max_width": null,
2308
+ "min_height": null,
2309
+ "min_width": null,
2310
+ "object_fit": null,
2311
+ "object_position": null,
2312
+ "order": null,
2313
+ "overflow": null,
2314
+ "overflow_x": null,
2315
+ "overflow_y": null,
2316
+ "padding": null,
2317
+ "right": null,
2318
+ "top": null,
2319
+ "visibility": null,
2320
+ "width": null
2321
+ }
2322
+ },
2323
+ "54e3918921f44fb4a9020beab951fcdf": {
2324
+ "model_module": "@jupyter-widgets/controls",
2325
+ "model_module_version": "1.5.0",
2326
+ "model_name": "DescriptionStyleModel",
2327
+ "state": {
2328
+ "_model_module": "@jupyter-widgets/controls",
2329
+ "_model_module_version": "1.5.0",
2330
+ "_model_name": "DescriptionStyleModel",
2331
+ "_view_count": null,
2332
+ "_view_module": "@jupyter-widgets/base",
2333
+ "_view_module_version": "1.2.0",
2334
+ "_view_name": "StyleView",
2335
+ "description_width": "initial"
2336
+ }
2337
+ },
2338
+ "5a17f4509d194105b23dd616e45183d5": {
2339
+ "model_module": "@jupyter-widgets/base",
2340
+ "model_module_version": "1.2.0",
2341
+ "model_name": "LayoutModel",
2342
+ "state": {
2343
+ "_model_module": "@jupyter-widgets/base",
2344
+ "_model_module_version": "1.2.0",
2345
+ "_model_name": "LayoutModel",
2346
+ "_view_count": null,
2347
+ "_view_module": "@jupyter-widgets/base",
2348
+ "_view_module_version": "1.2.0",
2349
+ "_view_name": "LayoutView",
2350
+ "align_content": null,
2351
+ "align_items": null,
2352
+ "align_self": null,
2353
+ "border": null,
2354
+ "bottom": null,
2355
+ "display": null,
2356
+ "flex": null,
2357
+ "flex_flow": null,
2358
+ "grid_area": null,
2359
+ "grid_auto_columns": null,
2360
+ "grid_auto_flow": null,
2361
+ "grid_auto_rows": null,
2362
+ "grid_column": null,
2363
+ "grid_gap": null,
2364
+ "grid_row": null,
2365
+ "grid_template_areas": null,
2366
+ "grid_template_columns": null,
2367
+ "grid_template_rows": null,
2368
+ "height": null,
2369
+ "justify_content": null,
2370
+ "justify_items": null,
2371
+ "left": null,
2372
+ "margin": null,
2373
+ "max_height": null,
2374
+ "max_width": null,
2375
+ "min_height": null,
2376
+ "min_width": null,
2377
+ "object_fit": null,
2378
+ "object_position": null,
2379
+ "order": null,
2380
+ "overflow": null,
2381
+ "overflow_x": null,
2382
+ "overflow_y": null,
2383
+ "padding": null,
2384
+ "right": null,
2385
+ "top": null,
2386
+ "visibility": null,
2387
+ "width": null
2388
+ }
2389
+ },
2390
+ "60b80d550efa403a825a3cb913c26f53": {
2391
+ "model_module": "@jupyter-widgets/base",
2392
+ "model_module_version": "1.2.0",
2393
+ "model_name": "LayoutModel",
2394
+ "state": {
2395
+ "_model_module": "@jupyter-widgets/base",
2396
+ "_model_module_version": "1.2.0",
2397
+ "_model_name": "LayoutModel",
2398
+ "_view_count": null,
2399
+ "_view_module": "@jupyter-widgets/base",
2400
+ "_view_module_version": "1.2.0",
2401
+ "_view_name": "LayoutView",
2402
+ "align_content": null,
2403
+ "align_items": null,
2404
+ "align_self": null,
2405
+ "border": null,
2406
+ "bottom": null,
2407
+ "display": null,
2408
+ "flex": null,
2409
+ "flex_flow": null,
2410
+ "grid_area": null,
2411
+ "grid_auto_columns": null,
2412
+ "grid_auto_flow": null,
2413
+ "grid_auto_rows": null,
2414
+ "grid_column": null,
2415
+ "grid_gap": null,
2416
+ "grid_row": null,
2417
+ "grid_template_areas": null,
2418
+ "grid_template_columns": null,
2419
+ "grid_template_rows": null,
2420
+ "height": null,
2421
+ "justify_content": null,
2422
+ "justify_items": null,
2423
+ "left": null,
2424
+ "margin": null,
2425
+ "max_height": null,
2426
+ "max_width": null,
2427
+ "min_height": null,
2428
+ "min_width": null,
2429
+ "object_fit": null,
2430
+ "object_position": null,
2431
+ "order": null,
2432
+ "overflow": null,
2433
+ "overflow_x": null,
2434
+ "overflow_y": null,
2435
+ "padding": null,
2436
+ "right": null,
2437
+ "top": null,
2438
+ "visibility": null,
2439
+ "width": null
2440
+ }
2441
+ },
2442
+ "654ab6d155eb457ea5c719a9ac27ad5b": {
2443
+ "model_module": "@jupyter-widgets/controls",
2444
+ "model_module_version": "1.5.0",
2445
+ "model_name": "ButtonModel",
2446
+ "state": {
2447
+ "_dom_classes": [],
2448
+ "_model_module": "@jupyter-widgets/controls",
2449
+ "_model_module_version": "1.5.0",
2450
+ "_model_name": "ButtonModel",
2451
+ "_view_count": null,
2452
+ "_view_module": "@jupyter-widgets/controls",
2453
+ "_view_module_version": "1.5.0",
2454
+ "_view_name": "ButtonView",
2455
+ "button_style": "success",
2456
+ "description": "Submit",
2457
+ "disabled": false,
2458
+ "icon": "check",
2459
+ "layout": "IPY_MODEL_81c4dda35a7d4e15821bb4bc0973354e",
2460
+ "style": "IPY_MODEL_df1c46361f714aceb9c046f98fede40c",
2461
+ "tooltip": ""
2462
+ }
2463
+ },
2464
+ "6622f76f91f44527a87a7575bbd388d2": {
2465
+ "model_module": "@jupyter-widgets/controls",
2466
+ "model_module_version": "1.5.0",
2467
+ "model_name": "HBoxModel",
2468
+ "state": {
2469
+ "_dom_classes": [],
2470
+ "_model_module": "@jupyter-widgets/controls",
2471
+ "_model_module_version": "1.5.0",
2472
+ "_model_name": "HBoxModel",
2473
+ "_view_count": null,
2474
+ "_view_module": "@jupyter-widgets/controls",
2475
+ "_view_module_version": "1.5.0",
2476
+ "_view_name": "HBoxView",
2477
+ "box_style": "",
2478
+ "children": [
2479
+ "IPY_MODEL_7f7164e80a464ba9b99f96c10132db25",
2480
+ "IPY_MODEL_49f80567705147f0b82d45b7f06dd1ba"
2481
+ ],
2482
+ "layout": "IPY_MODEL_5a17f4509d194105b23dd616e45183d5"
2483
+ }
2484
+ },
2485
+ "67b4083cd4234f52bb7cca27ab9cddb3": {
2486
+ "model_module": "@jupyter-widgets/controls",
2487
+ "model_module_version": "1.5.0",
2488
+ "model_name": "FileUploadModel",
2489
+ "state": {
2490
+ "_counter": 0,
2491
+ "_dom_classes": [],
2492
+ "_model_module": "@jupyter-widgets/controls",
2493
+ "_model_module_version": "1.5.0",
2494
+ "_model_name": "FileUploadModel",
2495
+ "_view_count": null,
2496
+ "_view_module": "@jupyter-widgets/controls",
2497
+ "_view_module_version": "1.5.0",
2498
+ "_view_name": "FileUploadView",
2499
+ "accept": ".zip",
2500
+ "button_style": "",
2501
+ "data": [],
2502
+ "description": "Upload",
2503
+ "description_tooltip": null,
2504
+ "disabled": false,
2505
+ "error": "",
2506
+ "icon": "upload",
2507
+ "layout": "IPY_MODEL_76548751bb9c4bcb9d4f39788ea7d4af",
2508
+ "metadata": [],
2509
+ "multiple": false,
2510
+ "style": "IPY_MODEL_dbb88901f5084d49af208b91b52b6073"
2511
+ }
2512
+ },
2513
+ "76548751bb9c4bcb9d4f39788ea7d4af": {
2514
+ "model_module": "@jupyter-widgets/base",
2515
+ "model_module_version": "1.2.0",
2516
+ "model_name": "LayoutModel",
2517
+ "state": {
2518
+ "_model_module": "@jupyter-widgets/base",
2519
+ "_model_module_version": "1.2.0",
2520
+ "_model_name": "LayoutModel",
2521
+ "_view_count": null,
2522
+ "_view_module": "@jupyter-widgets/base",
2523
+ "_view_module_version": "1.2.0",
2524
+ "_view_name": "LayoutView",
2525
+ "align_content": null,
2526
+ "align_items": null,
2527
+ "align_self": null,
2528
+ "border": null,
2529
+ "bottom": null,
2530
+ "display": null,
2531
+ "flex": null,
2532
+ "flex_flow": null,
2533
+ "grid_area": null,
2534
+ "grid_auto_columns": null,
2535
+ "grid_auto_flow": null,
2536
+ "grid_auto_rows": null,
2537
+ "grid_column": null,
2538
+ "grid_gap": null,
2539
+ "grid_row": null,
2540
+ "grid_template_areas": null,
2541
+ "grid_template_columns": null,
2542
+ "grid_template_rows": null,
2543
+ "height": null,
2544
+ "justify_content": null,
2545
+ "justify_items": null,
2546
+ "left": null,
2547
+ "margin": null,
2548
+ "max_height": null,
2549
+ "max_width": null,
2550
+ "min_height": null,
2551
+ "min_width": null,
2552
+ "object_fit": null,
2553
+ "object_position": null,
2554
+ "order": null,
2555
+ "overflow": null,
2556
+ "overflow_x": null,
2557
+ "overflow_y": null,
2558
+ "padding": null,
2559
+ "right": null,
2560
+ "top": null,
2561
+ "visibility": null,
2562
+ "width": null
2563
+ }
2564
+ },
2565
+ "7f7164e80a464ba9b99f96c10132db25": {
2566
+ "model_module": "@jupyter-widgets/controls",
2567
+ "model_module_version": "1.5.0",
2568
+ "model_name": "LabelModel",
2569
+ "state": {
2570
+ "_dom_classes": [],
2571
+ "_model_module": "@jupyter-widgets/controls",
2572
+ "_model_module_version": "1.5.0",
2573
+ "_model_name": "LabelModel",
2574
+ "_view_count": null,
2575
+ "_view_module": "@jupyter-widgets/controls",
2576
+ "_view_module_version": "1.5.0",
2577
+ "_view_name": "LabelView",
2578
+ "description": "",
2579
+ "description_tooltip": null,
2580
+ "layout": "IPY_MODEL_60b80d550efa403a825a3cb913c26f53",
2581
+ "placeholder": "​",
2582
+ "style": "IPY_MODEL_d0bd0e3f12594ff1a51365b65a3fcc43",
2583
+ "value": "Upload User Files:\t"
2584
+ }
2585
+ },
2586
+ "7f814595d31e4b86992b5bd6bc85ced4": {
2587
+ "model_module": "@jupyter-widgets/controls",
2588
+ "model_module_version": "1.5.0",
2589
+ "model_name": "DescriptionStyleModel",
2590
+ "state": {
2591
+ "_model_module": "@jupyter-widgets/controls",
2592
+ "_model_module_version": "1.5.0",
2593
+ "_model_name": "DescriptionStyleModel",
2594
+ "_view_count": null,
2595
+ "_view_module": "@jupyter-widgets/base",
2596
+ "_view_module_version": "1.2.0",
2597
+ "_view_name": "StyleView",
2598
+ "description_width": ""
2599
+ }
2600
+ },
2601
+ "81c4dda35a7d4e15821bb4bc0973354e": {
2602
+ "model_module": "@jupyter-widgets/base",
2603
+ "model_module_version": "1.2.0",
2604
+ "model_name": "LayoutModel",
2605
+ "state": {
2606
+ "_model_module": "@jupyter-widgets/base",
2607
+ "_model_module_version": "1.2.0",
2608
+ "_model_name": "LayoutModel",
2609
+ "_view_count": null,
2610
+ "_view_module": "@jupyter-widgets/base",
2611
+ "_view_module_version": "1.2.0",
2612
+ "_view_name": "LayoutView",
2613
+ "align_content": null,
2614
+ "align_items": null,
2615
+ "align_self": null,
2616
+ "border": null,
2617
+ "bottom": null,
2618
+ "display": null,
2619
+ "flex": null,
2620
+ "flex_flow": null,
2621
+ "grid_area": null,
2622
+ "grid_auto_columns": null,
2623
+ "grid_auto_flow": null,
2624
+ "grid_auto_rows": null,
2625
+ "grid_column": null,
2626
+ "grid_gap": null,
2627
+ "grid_row": null,
2628
+ "grid_template_areas": null,
2629
+ "grid_template_columns": null,
2630
+ "grid_template_rows": null,
2631
+ "height": null,
2632
+ "justify_content": null,
2633
+ "justify_items": null,
2634
+ "left": null,
2635
+ "margin": null,
2636
+ "max_height": null,
2637
+ "max_width": null,
2638
+ "min_height": null,
2639
+ "min_width": null,
2640
+ "object_fit": null,
2641
+ "object_position": null,
2642
+ "order": null,
2643
+ "overflow": null,
2644
+ "overflow_x": null,
2645
+ "overflow_y": null,
2646
+ "padding": null,
2647
+ "right": null,
2648
+ "top": null,
2649
+ "visibility": null,
2650
+ "width": null
2651
+ }
2652
+ },
2653
+ "8610fffd2d2a4ec28f8c874c06073ce7": {
2654
+ "model_module": "@jupyter-widgets/controls",
2655
+ "model_module_version": "1.5.0",
2656
+ "model_name": "HBoxModel",
2657
+ "state": {
2658
+ "_dom_classes": [],
2659
+ "_model_module": "@jupyter-widgets/controls",
2660
+ "_model_module_version": "1.5.0",
2661
+ "_model_name": "HBoxModel",
2662
+ "_view_count": null,
2663
+ "_view_module": "@jupyter-widgets/controls",
2664
+ "_view_module_version": "1.5.0",
2665
+ "_view_name": "HBoxView",
2666
+ "box_style": "",
2667
+ "children": [
2668
+ "IPY_MODEL_1072a8a142f64dfd96ee528a2e9d1595",
2669
+ "IPY_MODEL_67b4083cd4234f52bb7cca27ab9cddb3"
2670
+ ],
2671
+ "layout": "IPY_MODEL_d0a1ebdf7fc0473f91c39b29ca580934"
2672
+ }
2673
+ },
2674
+ "86cb4f568f454ff8832face502fb0745": {
2675
+ "model_module": "@jupyter-widgets/output",
2676
+ "model_module_version": "1.0.0",
2677
+ "model_name": "OutputModel",
2678
+ "state": {
2679
+ "_dom_classes": [],
2680
+ "_model_module": "@jupyter-widgets/output",
2681
+ "_model_module_version": "1.0.0",
2682
+ "_model_name": "OutputModel",
2683
+ "_view_count": null,
2684
+ "_view_module": "@jupyter-widgets/output",
2685
+ "_view_module_version": "1.0.0",
2686
+ "_view_name": "OutputView",
2687
+ "layout": "IPY_MODEL_53722998fbe64a7c94829b79e8cd69d6",
2688
+ "msg_id": "",
2689
+ "outputs": [
2690
+ {
2691
+ "name": "stdout",
2692
+ "output_type": "stream",
2693
+ "text": [
2694
+ "Extracted files and directories: instructorTest/, __MACOSX/._instructorTest, instructorTest/bell_charreau.json, __MACOSX/instructorTest/._bell_charreau.json, instructorTest/spencer-smith_jesse.json, __MACOSX/instructorTest/._spencer-smith_jesse.json\n",
2695
+ "\n",
2696
+ "Loading successful!\n",
2697
+ "Learning Objectives: \n",
2698
+ "Extracted JSON files: instructorTest/spencer-smith_jesse.json, instructorTest/bell_charreau.json\n",
2699
+ "Submitted and Reset all values.\n"
2700
+ ]
2701
+ }
2702
+ ]
2703
+ }
2704
+ },
2705
+ "a84d31fb8f4e4bafb74035158834b404": {
2706
+ "model_module": "@jupyter-widgets/controls",
2707
+ "model_module_version": "1.5.0",
2708
+ "model_name": "VBoxModel",
2709
+ "state": {
2710
+ "_dom_classes": [],
2711
+ "_model_module": "@jupyter-widgets/controls",
2712
+ "_model_module_version": "1.5.0",
2713
+ "_model_name": "VBoxModel",
2714
+ "_view_count": null,
2715
+ "_view_module": "@jupyter-widgets/controls",
2716
+ "_view_module_version": "1.5.0",
2717
+ "_view_name": "VBoxView",
2718
+ "box_style": "",
2719
+ "children": [
2720
+ "IPY_MODEL_b051a90758434644955747bc02d00bab",
2721
+ "IPY_MODEL_252b8009f3734ed2908049ebb40c0247",
2722
+ "IPY_MODEL_6622f76f91f44527a87a7575bbd388d2",
2723
+ "IPY_MODEL_654ab6d155eb457ea5c719a9ac27ad5b",
2724
+ "IPY_MODEL_86cb4f568f454ff8832face502fb0745"
2725
+ ],
2726
+ "layout": "IPY_MODEL_e30fe87f01bc4580a61713b5b72439a2"
2727
+ }
2728
+ },
2729
+ "abbecdc637694e7cb026e003244e7037": {
2730
+ "model_module": "@jupyter-widgets/base",
2731
+ "model_module_version": "1.2.0",
2732
+ "model_name": "LayoutModel",
2733
+ "state": {
2734
+ "_model_module": "@jupyter-widgets/base",
2735
+ "_model_module_version": "1.2.0",
2736
+ "_model_name": "LayoutModel",
2737
+ "_view_count": null,
2738
+ "_view_module": "@jupyter-widgets/base",
2739
+ "_view_module_version": "1.2.0",
2740
+ "_view_name": "LayoutView",
2741
+ "align_content": null,
2742
+ "align_items": null,
2743
+ "align_self": null,
2744
+ "border": null,
2745
+ "bottom": null,
2746
+ "display": null,
2747
+ "flex": null,
2748
+ "flex_flow": null,
2749
+ "grid_area": null,
2750
+ "grid_auto_columns": null,
2751
+ "grid_auto_flow": null,
2752
+ "grid_auto_rows": null,
2753
+ "grid_column": null,
2754
+ "grid_gap": null,
2755
+ "grid_row": null,
2756
+ "grid_template_areas": null,
2757
+ "grid_template_columns": null,
2758
+ "grid_template_rows": null,
2759
+ "height": null,
2760
+ "justify_content": null,
2761
+ "justify_items": null,
2762
+ "left": null,
2763
+ "margin": null,
2764
+ "max_height": null,
2765
+ "max_width": null,
2766
+ "min_height": null,
2767
+ "min_width": null,
2768
+ "object_fit": null,
2769
+ "object_position": null,
2770
+ "order": null,
2771
+ "overflow": null,
2772
+ "overflow_x": null,
2773
+ "overflow_y": null,
2774
+ "padding": null,
2775
+ "right": null,
2776
+ "top": null,
2777
+ "visibility": null,
2778
+ "width": null
2779
+ }
2780
+ },
2781
+ "b051a90758434644955747bc02d00bab": {
2782
+ "model_module": "@jupyter-widgets/controls",
2783
+ "model_module_version": "1.5.0",
2784
+ "model_name": "HTMLModel",
2785
+ "state": {
2786
+ "_dom_classes": [],
2787
+ "_model_module": "@jupyter-widgets/controls",
2788
+ "_model_module_version": "1.5.0",
2789
+ "_model_name": "HTMLModel",
2790
+ "_view_count": null,
2791
+ "_view_module": "@jupyter-widgets/controls",
2792
+ "_view_module_version": "1.5.0",
2793
+ "_view_name": "HTMLView",
2794
+ "description": "",
2795
+ "description_tooltip": null,
2796
+ "layout": "IPY_MODEL_d16b25c7e9e948938c9303fbe8ae3dcc",
2797
+ "placeholder": "​",
2798
+ "style": "IPY_MODEL_453b12da4b6540cd9e4e57f73a4d670c",
2799
+ "value": "<h2>Instructor Grading Configuration</h2>"
2800
+ }
2801
+ },
2802
+ "b74cf92175374028948d4cf529d4d1e6": {
2803
+ "model_module": "@jupyter-widgets/base",
2804
+ "model_module_version": "1.2.0",
2805
+ "model_name": "LayoutModel",
2806
+ "state": {
2807
+ "_model_module": "@jupyter-widgets/base",
2808
+ "_model_module_version": "1.2.0",
2809
+ "_model_name": "LayoutModel",
2810
+ "_view_count": null,
2811
+ "_view_module": "@jupyter-widgets/base",
2812
+ "_view_module_version": "1.2.0",
2813
+ "_view_name": "LayoutView",
2814
+ "align_content": null,
2815
+ "align_items": null,
2816
+ "align_self": null,
2817
+ "border": null,
2818
+ "bottom": null,
2819
+ "display": null,
2820
+ "flex": null,
2821
+ "flex_flow": null,
2822
+ "grid_area": null,
2823
+ "grid_auto_columns": null,
2824
+ "grid_auto_flow": null,
2825
+ "grid_auto_rows": null,
2826
+ "grid_column": null,
2827
+ "grid_gap": null,
2828
+ "grid_row": null,
2829
+ "grid_template_areas": null,
2830
+ "grid_template_columns": null,
2831
+ "grid_template_rows": null,
2832
+ "height": null,
2833
+ "justify_content": null,
2834
+ "justify_items": null,
2835
+ "left": null,
2836
+ "margin": null,
2837
+ "max_height": null,
2838
+ "max_width": null,
2839
+ "min_height": null,
2840
+ "min_width": null,
2841
+ "object_fit": null,
2842
+ "object_position": null,
2843
+ "order": null,
2844
+ "overflow": null,
2845
+ "overflow_x": null,
2846
+ "overflow_y": null,
2847
+ "padding": null,
2848
+ "right": null,
2849
+ "top": null,
2850
+ "visibility": null,
2851
+ "width": "auto"
2852
+ }
2853
+ },
2854
+ "d0a1ebdf7fc0473f91c39b29ca580934": {
2855
+ "model_module": "@jupyter-widgets/base",
2856
+ "model_module_version": "1.2.0",
2857
+ "model_name": "LayoutModel",
2858
+ "state": {
2859
+ "_model_module": "@jupyter-widgets/base",
2860
+ "_model_module_version": "1.2.0",
2861
+ "_model_name": "LayoutModel",
2862
+ "_view_count": null,
2863
+ "_view_module": "@jupyter-widgets/base",
2864
+ "_view_module_version": "1.2.0",
2865
+ "_view_name": "LayoutView",
2866
+ "align_content": null,
2867
+ "align_items": null,
2868
+ "align_self": null,
2869
+ "border": null,
2870
+ "bottom": null,
2871
+ "display": null,
2872
+ "flex": null,
2873
+ "flex_flow": null,
2874
+ "grid_area": null,
2875
+ "grid_auto_columns": null,
2876
+ "grid_auto_flow": null,
2877
+ "grid_auto_rows": null,
2878
+ "grid_column": null,
2879
+ "grid_gap": null,
2880
+ "grid_row": null,
2881
+ "grid_template_areas": null,
2882
+ "grid_template_columns": null,
2883
+ "grid_template_rows": null,
2884
+ "height": null,
2885
+ "justify_content": null,
2886
+ "justify_items": null,
2887
+ "left": null,
2888
+ "margin": null,
2889
+ "max_height": null,
2890
+ "max_width": null,
2891
+ "min_height": null,
2892
+ "min_width": null,
2893
+ "object_fit": null,
2894
+ "object_position": null,
2895
+ "order": null,
2896
+ "overflow": null,
2897
+ "overflow_x": null,
2898
+ "overflow_y": null,
2899
+ "padding": null,
2900
+ "right": null,
2901
+ "top": null,
2902
+ "visibility": null,
2903
+ "width": null
2904
+ }
2905
+ },
2906
+ "d0bd0e3f12594ff1a51365b65a3fcc43": {
2907
+ "model_module": "@jupyter-widgets/controls",
2908
+ "model_module_version": "1.5.0",
2909
+ "model_name": "DescriptionStyleModel",
2910
+ "state": {
2911
+ "_model_module": "@jupyter-widgets/controls",
2912
+ "_model_module_version": "1.5.0",
2913
+ "_model_name": "DescriptionStyleModel",
2914
+ "_view_count": null,
2915
+ "_view_module": "@jupyter-widgets/base",
2916
+ "_view_module_version": "1.2.0",
2917
+ "_view_name": "StyleView",
2918
+ "description_width": ""
2919
+ }
2920
+ },
2921
+ "d16b25c7e9e948938c9303fbe8ae3dcc": {
2922
+ "model_module": "@jupyter-widgets/base",
2923
+ "model_module_version": "1.2.0",
2924
+ "model_name": "LayoutModel",
2925
+ "state": {
2926
+ "_model_module": "@jupyter-widgets/base",
2927
+ "_model_module_version": "1.2.0",
2928
+ "_model_name": "LayoutModel",
2929
+ "_view_count": null,
2930
+ "_view_module": "@jupyter-widgets/base",
2931
+ "_view_module_version": "1.2.0",
2932
+ "_view_name": "LayoutView",
2933
+ "align_content": null,
2934
+ "align_items": null,
2935
+ "align_self": null,
2936
+ "border": null,
2937
+ "bottom": null,
2938
+ "display": null,
2939
+ "flex": null,
2940
+ "flex_flow": null,
2941
+ "grid_area": null,
2942
+ "grid_auto_columns": null,
2943
+ "grid_auto_flow": null,
2944
+ "grid_auto_rows": null,
2945
+ "grid_column": null,
2946
+ "grid_gap": null,
2947
+ "grid_row": null,
2948
+ "grid_template_areas": null,
2949
+ "grid_template_columns": null,
2950
+ "grid_template_rows": null,
2951
+ "height": null,
2952
+ "justify_content": null,
2953
+ "justify_items": null,
2954
+ "left": null,
2955
+ "margin": null,
2956
+ "max_height": null,
2957
+ "max_width": null,
2958
+ "min_height": null,
2959
+ "min_width": null,
2960
+ "object_fit": null,
2961
+ "object_position": null,
2962
+ "order": null,
2963
+ "overflow": null,
2964
+ "overflow_x": null,
2965
+ "overflow_y": null,
2966
+ "padding": null,
2967
+ "right": null,
2968
+ "top": null,
2969
+ "visibility": null,
2970
+ "width": null
2971
+ }
2972
+ },
2973
+ "dbb88901f5084d49af208b91b52b6073": {
2974
+ "model_module": "@jupyter-widgets/controls",
2975
+ "model_module_version": "1.5.0",
2976
+ "model_name": "ButtonStyleModel",
2977
+ "state": {
2978
+ "_model_module": "@jupyter-widgets/controls",
2979
+ "_model_module_version": "1.5.0",
2980
+ "_model_name": "ButtonStyleModel",
2981
+ "_view_count": null,
2982
+ "_view_module": "@jupyter-widgets/base",
2983
+ "_view_module_version": "1.2.0",
2984
+ "_view_name": "StyleView",
2985
+ "button_color": null,
2986
+ "font_weight": ""
2987
+ }
2988
+ },
2989
+ "dde20647d3594d31b66b19659f53a95e": {
2990
+ "model_module": "@jupyter-widgets/controls",
2991
+ "model_module_version": "1.5.0",
2992
+ "model_name": "TextareaModel",
2993
+ "state": {
2994
+ "_dom_classes": [],
2995
+ "_model_module": "@jupyter-widgets/controls",
2996
+ "_model_module_version": "1.5.0",
2997
+ "_model_name": "TextareaModel",
2998
+ "_view_count": null,
2999
+ "_view_module": "@jupyter-widgets/controls",
3000
+ "_view_module_version": "1.5.0",
3001
+ "_view_name": "TextareaView",
3002
+ "continuous_update": true,
3003
+ "description": "Learning Objectives",
3004
+ "description_tooltip": null,
3005
+ "disabled": false,
3006
+ "layout": "IPY_MODEL_b74cf92175374028948d4cf529d4d1e6",
3007
+ "placeholder": "Learning objectives: 1. Understand and implement classes in object-oriented programming",
3008
+ "rows": null,
3009
+ "style": "IPY_MODEL_54e3918921f44fb4a9020beab951fcdf",
3010
+ "value": ""
3011
+ }
3012
+ },
3013
+ "df1c46361f714aceb9c046f98fede40c": {
3014
+ "model_module": "@jupyter-widgets/controls",
3015
+ "model_module_version": "1.5.0",
3016
+ "model_name": "ButtonStyleModel",
3017
+ "state": {
3018
+ "_model_module": "@jupyter-widgets/controls",
3019
+ "_model_module_version": "1.5.0",
3020
+ "_model_name": "ButtonStyleModel",
3021
+ "_view_count": null,
3022
+ "_view_module": "@jupyter-widgets/base",
3023
+ "_view_module_version": "1.2.0",
3024
+ "_view_name": "StyleView",
3025
+ "button_color": null,
3026
+ "font_weight": ""
3027
+ }
3028
+ },
3029
+ "dfa8d6c7d70b42468cbda035de89404c": {
3030
+ "model_module": "@jupyter-widgets/base",
3031
+ "model_module_version": "1.2.0",
3032
+ "model_name": "LayoutModel",
3033
+ "state": {
3034
+ "_model_module": "@jupyter-widgets/base",
3035
+ "_model_module_version": "1.2.0",
3036
+ "_model_name": "LayoutModel",
3037
+ "_view_count": null,
3038
+ "_view_module": "@jupyter-widgets/base",
3039
+ "_view_module_version": "1.2.0",
3040
+ "_view_name": "LayoutView",
3041
+ "align_content": null,
3042
+ "align_items": null,
3043
+ "align_self": null,
3044
+ "border": null,
3045
+ "bottom": null,
3046
+ "display": null,
3047
+ "flex": null,
3048
+ "flex_flow": null,
3049
+ "grid_area": null,
3050
+ "grid_auto_columns": null,
3051
+ "grid_auto_flow": null,
3052
+ "grid_auto_rows": null,
3053
+ "grid_column": null,
3054
+ "grid_gap": null,
3055
+ "grid_row": null,
3056
+ "grid_template_areas": null,
3057
+ "grid_template_columns": null,
3058
+ "grid_template_rows": null,
3059
+ "height": null,
3060
+ "justify_content": null,
3061
+ "justify_items": null,
3062
+ "left": null,
3063
+ "margin": null,
3064
+ "max_height": null,
3065
+ "max_width": null,
3066
+ "min_height": null,
3067
+ "min_width": null,
3068
+ "object_fit": null,
3069
+ "object_position": null,
3070
+ "order": null,
3071
+ "overflow": null,
3072
+ "overflow_x": null,
3073
+ "overflow_y": null,
3074
+ "padding": null,
3075
+ "right": null,
3076
+ "top": null,
3077
+ "visibility": null,
3078
+ "width": null
3079
+ }
3080
+ },
3081
+ "e30fe87f01bc4580a61713b5b72439a2": {
3082
+ "model_module": "@jupyter-widgets/base",
3083
+ "model_module_version": "1.2.0",
3084
+ "model_name": "LayoutModel",
3085
+ "state": {
3086
+ "_model_module": "@jupyter-widgets/base",
3087
+ "_model_module_version": "1.2.0",
3088
+ "_model_name": "LayoutModel",
3089
+ "_view_count": null,
3090
+ "_view_module": "@jupyter-widgets/base",
3091
+ "_view_module_version": "1.2.0",
3092
+ "_view_name": "LayoutView",
3093
+ "align_content": null,
3094
+ "align_items": "stretch",
3095
+ "align_self": null,
3096
+ "border": "solid 1px gray",
3097
+ "bottom": null,
3098
+ "display": "flex",
3099
+ "flex": null,
3100
+ "flex_flow": "column",
3101
+ "grid_area": null,
3102
+ "grid_auto_columns": null,
3103
+ "grid_auto_flow": null,
3104
+ "grid_auto_rows": null,
3105
+ "grid_column": null,
3106
+ "grid_gap": null,
3107
+ "grid_row": null,
3108
+ "grid_template_areas": null,
3109
+ "grid_template_columns": null,
3110
+ "grid_template_rows": null,
3111
+ "height": null,
3112
+ "justify_content": null,
3113
+ "justify_items": null,
3114
+ "left": null,
3115
+ "margin": null,
3116
+ "max_height": null,
3117
+ "max_width": null,
3118
+ "min_height": null,
3119
+ "min_width": null,
3120
+ "object_fit": null,
3121
+ "object_position": null,
3122
+ "order": null,
3123
+ "overflow": null,
3124
+ "overflow_x": null,
3125
+ "overflow_y": null,
3126
+ "padding": "0px 30px 20px 30px",
3127
+ "right": null,
3128
+ "top": null,
3129
+ "visibility": null,
3130
+ "width": "50%"
3131
+ }
3132
+ },
3133
+ "f7d75b0a32554a9589c513336fc30095": {
3134
+ "model_module": "@jupyter-widgets/controls",
3135
+ "model_module_version": "1.5.0",
3136
+ "model_name": "DescriptionStyleModel",
3137
+ "state": {
3138
+ "_model_module": "@jupyter-widgets/controls",
3139
+ "_model_module_version": "1.5.0",
3140
+ "_model_name": "DescriptionStyleModel",
3141
+ "_view_count": null,
3142
+ "_view_module": "@jupyter-widgets/base",
3143
+ "_view_module_version": "1.2.0",
3144
+ "_view_name": "StyleView",
3145
+ "description_width": "initial"
3146
+ }
3147
+ }
3148
+ }
3149
+ }
3150
+ },
3151
+ "nbformat": 4,
3152
+ "nbformat_minor": 0
3153
+ }
instructor_intr_notebook_example_training.ipynb ADDED
@@ -0,0 +1,1277 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "id": "view-in-github",
7
+ "colab_type": "text"
8
+ },
9
+ "source": [
10
+ "<a href=\"https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/adding_grading_levels_to_instructor_nb/instructor_intr_notebook_example_training.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "metadata": {
16
+ "id": "WMKrKfx8_3fc"
17
+ },
18
+ "source": [
19
+ "# Instructor Grading and Assessment\n",
20
+ "This notebook executes grading of student submissions based on the examples provided in the [Wiki](https://github.com/vanderbilt-data-science/lo-achievement/wiki/Examples-of-great,-good,-and-poor-answers-to-questions) from Dr. Jesse Blocher. In this iteration, we use the Unstructured File Loader, which cannot proccess .json files (the preferred format). We are working on finding a file loader that allows .json. In this version of the notebook, the model has only been trained on Question 2 from the notebook.\n",
21
+ "\n",
22
+ "To train the model, we used 2 out of the three student example from each grade brack and inputted into a .pdf with clearly defined levels. Then, we used the excluded answers to test the accuracy of the model's grading."
23
+ ]
24
+ },
25
+ {
26
+ "cell_type": "markdown",
27
+ "source": [
28
+ "# Load and Install Neccessary Libraries"
29
+ ],
30
+ "metadata": {
31
+ "id": "2UQgQSoMx4My"
32
+ }
33
+ },
34
+ {
35
+ "cell_type": "code",
36
+ "source": [
37
+ "! pip install -q langchain=='0.0.229' openai gradio numpy chromadb tiktoken unstructured pdf2image pydantic==\"1.10.8\" jq"
38
+ ],
39
+ "metadata": {
40
+ "id": "UJi1Oy0CyPHD"
41
+ },
42
+ "execution_count": null,
43
+ "outputs": []
44
+ },
45
+ {
46
+ "cell_type": "code",
47
+ "source": [
48
+ "# import necessary libraries here\n",
49
+ "from getpass import getpass\n",
50
+ "from langchain.llms import OpenAI as openai\n",
51
+ "from langchain.chat_models import ChatOpenAI\n",
52
+ "from langchain.prompts import PromptTemplate\n",
53
+ "from langchain.document_loaders import TextLoader\n",
54
+ "from langchain.indexes import VectorstoreIndexCreator\n",
55
+ "from langchain.text_splitter import CharacterTextSplitter\n",
56
+ "from langchain.embeddings import OpenAIEmbeddings\n",
57
+ "from langchain.schema import SystemMessage, HumanMessage, AIMessage\n",
58
+ "import numpy as np\n",
59
+ "import os\n",
60
+ "from langchain.vectorstores import Chroma\n",
61
+ "from langchain.document_loaders.unstructured import UnstructuredFileLoader\n",
62
+ "from langchain.document_loaders import UnstructuredFileLoader\n",
63
+ "from langchain.chains import VectorDBQA\n",
64
+ "from langchain.document_loaders import JSONLoader\n",
65
+ "import json\n",
66
+ "from pathlib import Path\n",
67
+ "from pprint import pprint\n",
68
+ "\n",
69
+ "\n",
70
+ "from langchain.prompts.few_shot import FewShotPromptTemplate\n",
71
+ "from langchain.prompts.prompt import PromptTemplate"
72
+ ],
73
+ "metadata": {
74
+ "id": "YHytCUoExrYe"
75
+ },
76
+ "execution_count": 2,
77
+ "outputs": []
78
+ },
79
+ {
80
+ "cell_type": "markdown",
81
+ "source": [
82
+ "# Set up model and pass OpenAI Key\n",
83
+ "Here we are setting up the model and using a system message to pass a persona prompt with the grading advice"
84
+ ],
85
+ "metadata": {
86
+ "id": "4elyN72szz-_"
87
+ }
88
+ },
89
+ {
90
+ "cell_type": "code",
91
+ "source": [
92
+ "# setup open AI api key\n",
93
+ "openai_api_key = getpass()"
94
+ ],
95
+ "metadata": {
96
+ "id": "jVPEFX3ixJnM"
97
+ },
98
+ "execution_count": null,
99
+ "outputs": []
100
+ },
101
+ {
102
+ "cell_type": "code",
103
+ "source": [
104
+ "os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n",
105
+ "openai.api_key = openai_api_key"
106
+ ],
107
+ "metadata": {
108
+ "id": "obplpeB78h_M"
109
+ },
110
+ "execution_count": null,
111
+ "outputs": []
112
+ },
113
+ {
114
+ "cell_type": "code",
115
+ "source": [
116
+ "# Initiate model (model type and specify persona)\n",
117
+ "llm = ChatOpenAI(model='gpt-3.5-turbo-16k')\n",
118
+ "messages = [\n",
119
+ " SystemMessage(content=\"You are a helpful grading assistant. In grading the following questions, keep in mind the advice from the professor: one aspect of it was being specific. The poor answers have a lot of platitudes, the better answers give specific examples. Secondly, they should discuss automation and/or prediction specifically. Those are the things that ML does, it is not 'technology' broadly.\"),\n",
120
+ " HumanMessage(content=\"\")\n",
121
+ "]"
122
+ ],
123
+ "metadata": {
124
+ "id": "e_ZavOnS8iuE"
125
+ },
126
+ "execution_count": null,
127
+ "outputs": []
128
+ },
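+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The cell above leaves the `HumanMessage` empty. As a minimal usage sketch (not part of the original workflow), the cell below fills it with a hypothetical student answer and sends the messages to the chat model; the sample answer and prompt wording are assumptions for illustration."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Minimal usage sketch (assumption): grade one hypothetical answer directly.\n",
+ "sample_answer = 'Machine learning lets businesses automate prediction at scale.'\n",
+ "\n",
+ "# Replace the empty HumanMessage with a grading request for this answer\n",
+ "messages[1] = HumanMessage(content=f'Assign a letter grade (A, B, or C) to this answer and briefly justify it: {sample_answer}')\n",
+ "\n",
+ "# ChatOpenAI returns an AIMessage when called on a list of messages\n",
+ "response = llm(messages)\n",
+ "print(response.content)"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },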
129
+ {
130
+ "cell_type": "markdown",
131
+ "source": [
132
+ "# Original route: Vector Stores from .json files\n",
133
+ "This section uses .json files and vector stores with system/human messaging and a persona prompt tailored to Dr. Blocher's grading philosophy for the Wiki examples (only includes section 2 at this time)."
134
+ ],
135
+ "metadata": {
136
+ "id": "KHS3E6PydN-2"
137
+ }
138
+ },
139
+ {
140
+ "cell_type": "markdown",
141
+ "source": [
142
+ "## Grading based on A, B, and C-level answers from previous students to Question 2 from the [Wiki](https://github.com/vanderbilt-data-science/lo-achievement/wiki/Examples-of-great,-good,-and-poor-answers-to-questions):\n",
143
+ "\n",
144
+ "**Question 2:** Why is machine learning so important for businesses? Answer this question generally (i.e. such that it applies to many or at least most businesses)."
145
+ ],
146
+ "metadata": {
147
+ "id": "IYCBSD_8l7uu"
148
+ }
149
+ },
150
+ {
151
+ "cell_type": "markdown",
152
+ "source": [
153
+ "### Creating .json file from case examples (Question 2)\n",
154
+ "The purpose of this cell is to create a json file based on the previously submitted, graded work of students based on the case file provided by Dr. Blocher in the Wiki. So, here you could create your own file, or for quick demo purposes you can use the zip file in the next section heading."
155
+ ],
156
+ "metadata": {
157
+ "id": "TYlGEusr64kA"
158
+ }
159
+ },
160
+ {
161
+ "cell_type": "code",
162
+ "source": [
163
+ "q2 = 'Question 2: Why is machine learning so important for businesses? Answer this question generally (i.e. such that it applies to many or at least most businesses).'"
164
+ ],
165
+ "metadata": {
166
+ "id": "DAIaKTRYxOlh"
167
+ },
168
+ "execution_count": null,
169
+ "outputs": []
170
+ },
171
+ {
172
+ "cell_type": "code",
173
+ "source": [
174
+ "# A-level answers\n",
175
+ "\n",
176
+ "q2_A_answer_1 = 'Machine learning is extremely important tool for businesses. It can be used in a variety of ways, but most importantly, it can be used to identify patterns within their data that might not otherwise be identified by human beings. For example, it can understand customer behaviors, optimize logistics, and expand efficiencies throughout the business. Machine learning does not get tired, meaning it can work as long as you want it to. It can sift through massive amounts of data, that no human being can look through in an efficient manner. Machine learning can be used as a tool to identify anomalies when something needs to be checked to save or gain money. The predictions that companies gain from machine learning are cheap, accurate, and automate. These machine learning algorithms can be brought to larger scales to encompass the whole business and its operations. It is important to note, Machine learning is just predictions. Predictions to understand important patterns that could make or break a company since they understand the patterns of their business more. It is an amazing tool, but should be used wisely and carefully because if not, it can expensive, useless, and straight up wrong.'\n",
177
+ "q2_A_answer_2 = 'Machine learning is important for most of the sectors in business. Overall, it gives the company of an overview about what would be the trend for their business industry, and analyze the customer behavior to help business segment their customers groups. Today, many companies have a vast amount of information generated by behavior, computer, events, people, and devices. This massive amount of data is difficult for human to handle, and even if human manages it, it is not profitable as human labor is expensive. Thanks to machine learning, companies can utilize their in-house or even third-party data to make something useful for their business. In medical analysis, for example, with human, it takes a very long time to find patterns in thousands of MRI scans. On the other hand, machines can detect patterns in seconds by entering data as long as the information is correctly labeled or trained properly. Another example would be segmenting customer group. In marketing department, the business could use unsupervised machine learning to cluster their customer segments to generate personalized contents that are relevant for each of individuals.'\n",
178
+ "\n",
179
+ "# List creation\n",
180
+ "\n",
181
+ "q2_A_answers_list = [q2_A_answer_1, q2_A_answer_2]\n"
182
+ ],
183
+ "metadata": {
184
+ "id": "yQT6aExSr1dP"
185
+ },
186
+ "execution_count": 9,
187
+ "outputs": []
188
+ },
189
+ {
190
+ "cell_type": "code",
191
+ "source": [
192
+ "# B-level answers\n",
193
+ "\n",
194
+ "q2_B_answer_1 = 'Companies use ML models to improve different aspects of their business, like manufacturing, hiring, deployment, advertising, etc. The main goal is to improve productive and increase profitability of the company. The ML models are fed with company and externally available data to help the company optimize its departments and in turn become more financially successful/ productive. For example, using purchasing history, the company can predict who to advertise products to, to increase sales.'\n",
195
+ "q2_B_answer_2 = 'Machine learning allows business to have automated decision, scale, predictive analysis and performance. Machine learning also helps a business have a data strategy. This is how a firm uses data, data infrastructure, governance, etc. to accomplish its strategic goals and maintain/grow their competitive advantage within their industry.'\n",
196
+ "q2_B_answer_3 = 'The short answer is ML can help make decisions for businesses. To be clarified, ML does not make decisions for businesses. I mean it can, but people have not trusted ML enough yet and ML has not been that good to let it directly make business decisions. Business people only use ML to help themselves get intuitions of how decisions should be made and make predictions of results they might get based on their decisions. For example, if a business tries to launch a new product, it will use ML to test whether it will work or not on a small scale before it is introduced to a large scale. People called this step piloting. In this step, people collect data that is generated by using the pilot product and analyze their next move. They could tweak some features based on the feedback. If they think the data in their interest shows the product performs well, they might predict that this product will be successful when it is introduced on a large scale. Then, they will launch it.'\n",
197
+ "# List creation\n",
198
+ "\n",
199
+ "q2_B_answers_list = [q2_B_answer_1, q2_B_answer_2, q2_B_answer_3]"
200
+ ],
201
+ "metadata": {
202
+ "id": "KB1CmeRwtRvf"
203
+ },
204
+ "execution_count": 10,
205
+ "outputs": []
206
+ },
207
+ {
208
+ "cell_type": "code",
209
+ "source": [
210
+ "# C-level answers\n",
211
+ "\n",
212
+ "q2_C_answer_1 = 'Machine learning powers many of the services we use today, such as the recommendation systems of Spotify and Netflix; search engines such as Google and Bing; social media such as TikTok and Instagram; voices such as Siri and Alexa, the list can go on. All these examples show that machine learning is already starting to play a pivotal role in today\"s data-rich world. Machines can help us sift through useful information that can lead to big breakthroughs, and we have seen the widespread use of this technology in various industries such as finance, healthcare, insurance, manufacturing, transformational change, etc.'\n",
213
+ "q2_C_answer_2 = 'As technology advanced, there are tons of new data generated and stored. All industries experienced this surge in data, including business. There is a huge amount of business data stored and waited in the database of each firm and they need solutions to utilize these data. Machine learning is a very promising approach for firms to puts these data in and output a meaning pattern or result that could help the firms with their existing work. This could turn into a working product or provide insights that could enhance the efficiency of the company’s workflow. With machine learning, a firm could either enter a new market with the new product or save time and effort with the meaningful insights. Achieving these possibilities with the data they already owned is a low effort but high reward action. This is the reason machine learning is valued by many businesses recently.'\n",
214
+ "\n",
215
+ "# List creation\n",
216
+ "\n",
217
+ "q2_C_answers_list = [q2_C_answer_1, q2_C_answer_2]"
218
+ ],
219
+ "metadata": {
220
+ "id": "3diAz43othjc"
221
+ },
222
+ "execution_count": 11,
223
+ "outputs": []
224
+ },
225
+ {
226
+ "cell_type": "code",
227
+ "source": [
228
+ "q2_Q_and_A = [\"Question:\", q2, \"A-level Answers\", q2_A_answers_list, \"B-level Answers\", q2_B_answers_list, \"C-level Answers\", q2_C_answers_list]"
229
+ ],
230
+ "metadata": {
231
+ "id": "oAnrMSU6u9do"
232
+ },
233
+ "execution_count": null,
234
+ "outputs": []
235
+ },
236
+ {
237
+ "cell_type": "code",
238
+ "source": [
239
+ "import json\n",
240
+ "from google.colab import files\n",
241
+ "\n",
242
+ "def save_example_answers(examples, filename='wiki_ABC_Q2examples.json'):\n",
243
+ " with open(filename, 'w') as file:\n",
244
+ " json.dump(examples, file)\n",
245
+ " files.download(filename)\n",
246
+ "\n",
247
+ "save_example_answers(q2_Q_and_A)"
248
+ ],
249
+ "metadata": {
250
+ "colab": {
251
+ "base_uri": "https://localhost:8080/",
252
+ "height": 17
253
+ },
254
+ "id": "B16iYMEnri9s",
255
+ "outputId": "3e565b3d-804c-4b5e-acc8-efeb955c6c14"
256
+ },
257
+ "execution_count": null,
258
+ "outputs": [
259
+ {
260
+ "output_type": "display_data",
261
+ "data": {
262
+ "text/plain": [
263
+ "<IPython.core.display.Javascript object>"
264
+ ],
265
+ "application/javascript": [
266
+ "\n",
267
+ " async function download(id, filename, size) {\n",
268
+ " if (!google.colab.kernel.accessAllowed) {\n",
269
+ " return;\n",
270
+ " }\n",
271
+ " const div = document.createElement('div');\n",
272
+ " const label = document.createElement('label');\n",
273
+ " label.textContent = `Downloading \"${filename}\": `;\n",
274
+ " div.appendChild(label);\n",
275
+ " const progress = document.createElement('progress');\n",
276
+ " progress.max = size;\n",
277
+ " div.appendChild(progress);\n",
278
+ " document.body.appendChild(div);\n",
279
+ "\n",
280
+ " const buffers = [];\n",
281
+ " let downloaded = 0;\n",
282
+ "\n",
283
+ " const channel = await google.colab.kernel.comms.open(id);\n",
284
+ " // Send a message to notify the kernel that we're ready.\n",
285
+ " channel.send({})\n",
286
+ "\n",
287
+ " for await (const message of channel.messages) {\n",
288
+ " // Send a message to notify the kernel that we're ready.\n",
289
+ " channel.send({})\n",
290
+ " if (message.buffers) {\n",
291
+ " for (const buffer of message.buffers) {\n",
292
+ " buffers.push(buffer);\n",
293
+ " downloaded += buffer.byteLength;\n",
294
+ " progress.value = downloaded;\n",
295
+ " }\n",
296
+ " }\n",
297
+ " }\n",
298
+ " const blob = new Blob(buffers, {type: 'application/binary'});\n",
299
+ " const a = document.createElement('a');\n",
300
+ " a.href = window.URL.createObjectURL(blob);\n",
301
+ " a.download = filename;\n",
302
+ " div.appendChild(a);\n",
303
+ " a.click();\n",
304
+ " div.remove();\n",
305
+ " }\n",
306
+ " "
307
+ ]
308
+ },
309
+ "metadata": {}
310
+ },
311
+ {
312
+ "output_type": "display_data",
313
+ "data": {
314
+ "text/plain": [
315
+ "<IPython.core.display.Javascript object>"
316
+ ],
317
+ "application/javascript": [
318
+ "download(\"download_8ec6f24c-9653-4335-8c1b-766926434399\", \"wiki_ABC_Q2examples.json\", 5931)"
319
+ ]
320
+ },
321
+ "metadata": {}
322
+ }
323
+ ]
324
+ },
325
+ {
326
+ "cell_type": "markdown",
327
+ "source": [
328
+ "### Creating a Vector Store\n",
329
+ "Here we create a vector store based on the .json file (which you can find at this [link](https://drive.google.com/file/d/1nk6JhbqoUHFie-Ewb436pdV-onlObZUt/view?usp=sharing), but will need to unzip)."
330
+ ],
331
+ "metadata": {
332
+ "id": "exXR-A2oxWeg"
333
+ }
334
+ },
335
+ {
336
+ "cell_type": "code",
337
+ "source": [
338
+ "# Upload .json or zip\n",
339
+ "from google.colab import files\n",
340
+ "uploaded = files.upload()"
341
+ ],
342
+ "metadata": {
343
+ "id": "Wpt6qsmEw8WP"
344
+ },
345
+ "execution_count": null,
346
+ "outputs": []
347
+ },
348
+ {
349
+ "cell_type": "code",
350
+ "source": [
351
+ "# Unzip file if neccessary\n",
352
+ "!unzip '/content/wiki_ABC_Q2examples (2).json.zip'"
353
+ ],
354
+ "metadata": {
355
+ "id": "7T2LpkiZh9LT"
356
+ },
357
+ "execution_count": null,
358
+ "outputs": []
359
+ },
360
+ {
361
+ "cell_type": "code",
362
+ "source": [
363
+ "# Create file path\n",
364
+ "data = '/content/wiki_ABC_Q2examples (2).json'"
365
+ ],
366
+ "metadata": {
367
+ "id": "CVY0CVvhxyCu"
368
+ },
369
+ "execution_count": null,
370
+ "outputs": []
371
+ },
372
+ {
373
+ "cell_type": "code",
374
+ "source": [
375
+ "# Load the .json\n",
376
+ "data = json.loads(Path(file_path).read_text())\n",
377
+ "data = str(data)"
378
+ ],
379
+ "metadata": {
380
+ "id": "dHjK0nN6yYPH"
381
+ },
382
+ "execution_count": null,
383
+ "outputs": []
384
+ },
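+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As an aside, `JSONLoader` (imported earlier, with `jq` already installed) may offer a more direct route than `json.loads` plus `str()`. The sketch below is an assumption, not the original workflow; the `jq_schema` and `text_content` settings would need to match the actual file structure."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Minimal sketch (assumption): load the example file with JSONLoader.\n",
+ "# jq_schema '.[]' emits one document per top-level array element;\n",
+ "# text_content=False permits non-string elements (they are cast to str).\n",
+ "json_loader = JSONLoader(file_path=file_path, jq_schema='.[]', text_content=False)\n",
+ "json_docs = json_loader.load()\n",
+ "print(f'Loaded {len(json_docs)} documents')"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },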
385
+ {
386
+ "cell_type": "code",
387
+ "source": [
388
+ "# Create Vector Store\n",
389
+ "\n",
390
+ "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
391
+ "texts = text_splitter.split_text(data)\n",
392
+ "\n",
393
+ "embeddings = OpenAIEmbeddings()\n",
394
+ "\n",
395
+ "db = Chroma.from_texts(texts, embeddings)\n",
396
+ "\n",
397
+ "qa = VectorDBQA.from_chain_type(llm=llm, chain_type=\"stuff\", vectorstore=db, k=1)"
398
+ ],
399
+ "metadata": {
400
+ "id": "paibxyeuxnu1"
401
+ },
402
+ "execution_count": null,
403
+ "outputs": []
404
+ },
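+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "A quick retrieval sanity check can confirm the vector store indexed the graded examples before any grading is attempted. This cell is a minimal sketch added for illustration; the probe query string is an assumption."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Sanity check (illustrative): retrieve the chunk most similar to a probe query\n",
+ "hits = db.similarity_search('A-level Answers', k=1)\n",
+ "print(hits[0].page_content[:500])"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },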
405
+ {
406
+ "cell_type": "code",
407
+ "source": [
408
+ "# Creating examples from student answers to test the Wiki\n",
409
+ "\n",
410
+ "example1 = 'The most powerful aspect of machine learning is its ability to automate processes. If the goal is well defined and repeatable, an algorithm can be trained to perform that task far faster than any human and often with more reliability. Because of this, businesses can implement machine learning algorithms to solve problems human labor was previously required for, whether that be mental or physical labor. Although implementing machine learning often has high initial costs, because computers do not require payment outside of maintaining their operation, in the long run companies can save money by either making their workers’ tasks more efficient or by entirely automating tasks. This increases the profit margins for those firms.'\n",
411
+ "# B level\n",
412
+ "example2 ='ML systems can help improve access to data while managing compliance. It helps businesspeople deal with data in a more efficient way, since there is numerous data generated per second in such industry. If they use some of the simple classification models with high accuracy, it will save them a lot of time on data cleaning and validation, which are kind of repetitive and time-consuming. For example, NLP provides an easy way for personnel to query business information, understand business processes, and discover new relationships between business data, ideas based on intuition and insight often emerge. Using models to do prediction helps people making wiser decision. Since models can handle way more data than human, newly collected data can feed in the model and get some predictive result as a reference to the decision makers. This is significant because in this industry, time is precious, and traders must decide quickly and precisely. A little negligence will lead to a big mistake, lose a lot of money, and even affect the company\"s reputation. Models can see patterns that are not easy for human to spot, which is also valuable for modify the way people doing analysis and interpret.'\n",
413
+ "# C level\n",
414
+ "example3 = 'The machine learning model (or one a broader view, artificial intelligence) is about prediction. According to the lecture, there are tree main innovations in it. Prediction is cheap, more accurate and automated. As a result, armed with machine learning, businesses could automatically gain much more accurate and powerful capabilities in forecasting, leading to a big savings both in time and money.'\n",
415
+ "\n",
416
+ "# Randomized list of answers\n",
417
+ "training_answers = [example2, example1, example3]"
418
+ ],
419
+ "metadata": {
420
+ "id": "-fyY-ftV-ZbG"
421
+ },
422
+ "execution_count": null,
423
+ "outputs": []
424
+ },
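+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "With the held-out answers defined, the retrieval chain built above can be asked to grade each one against the stored A/B/C examples. The cell below is a minimal sketch of that step; the prompt wording is an assumption, not the professor's rubric."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Minimal grading sketch (assumption): run each held-out answer through the chain\n",
+ "for i, answer in enumerate(training_answers, start=1):\n",
+ "    query = ('Using the A-level, B-level, and C-level example answers as a rubric, '\n",
+ "             'assign a letter grade to the following answer to Question 2 and '\n",
+ "             'briefly justify it: ' + answer)\n",
+ "    print(f'--- Held-out answer {i} ---')\n",
+ "    print(qa.run(query))"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },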
425
+ {
426
+ "cell_type": "markdown",
427
+ "source": [
428
+ "## Grading based on A, B, and C-level answers from previous students to Question 4 from the [Wiki](https://github.com/vanderbilt-data-science/lo-achievement/wiki/Examples-of-great,-good,-and-poor-answers-to-questions):\n",
429
+ "\n",
430
+ "**Question 4:** Describe an example where machine learning is not likely to work well. This should be an example where intelligent, rational people could think of using machine learning or even are already implementing it. Describe specifically why you think ML is unlikely to work well. This should not be an example where someone is implementing it badly (i.e. skipping testing). Rather, it should be an example of how ML simply will not work well, no matter how many smart people work carefully on the problem. You may use examples we’ve discussed in class; it does not need to be a novel application."
431
+ ],
432
+ "metadata": {
433
+ "id": "sSnxqnGDmPzC"
434
+ }
435
+ },
436
+ {
437
+ "cell_type": "markdown",
438
+ "source": [
439
+ "### Creating .json file from case examples (Question 4)\n",
440
+ "The purpose of this cell is to create a json file based on the previously submitted, graded work of students based on the case file provided by Dr. Blocher in the Wiki."
441
+ ],
442
+ "metadata": {
443
+ "id": "RMkTwaEFmhWd"
444
+ }
445
+ },
446
+ {
447
+ "cell_type": "code",
448
+ "source": [
449
+ "# Context: question\n",
450
+ "\n",
451
+ "q4 = 'Question 4: Describe an example where machine learning is not likely to work well. This should be an example where intelligent, rational people could think of using machine learning or even are already implementing it. Describe specifically why you think ML is unlikely to work well. This should not be an example where someone is implementing it badly (i.e. skipping testing). Rather, it should be an example of how ML simply will not work well, no matter how many smart people work carefully on the problem. You may use examples we’ve discussed in class; it does not need to be a novel application.'"
452
+ ],
453
+ "metadata": {
454
+ "id": "_UVEgvzKmltw"
455
+ },
456
+ "execution_count": null,
457
+ "outputs": []
458
+ },
459
+ {
460
+ "cell_type": "code",
461
+ "source": [
462
+ "# A-level answers\n",
463
+ "\n",
464
+ "q4_A_answer_1 = 'This is one of the cases discussed in class: Machine learning on the management of at-risk children. Traditionally, at-risk children are mostly determined by social workers and they would report to their agencies if some children are at-risk. The at-risk children will subsequentially be taken into custody before their circumstance is evaluated and a decision about their future is reached. If a machine-learning algorithm were to replace the aforementioned social worker: the biggest concern would be when the algorithm outputs a false positive or false negative prediction. A false positive means that the children are said to be at risk while actually not and removed from their households. A false negative would be when some children are not removed from the dangerous household because the algorithm returned negative results. The costs of both incorrect decisions are too big for anyone to afford. Admittedly, data scientists might be able to develop a great algorithm with very low false positive and false negative rates, and that social workers can easily make the same amount of mistakes or more, but it is still too much for our humane society to take by putting such a life-changing or even life-or-death decision in a machine’s metaphorical hands.'\n",
465
+ "q4_A_answer_2 = 'I believe machine learning will not work well in the court system when making sentences and decisions. I have heard in the news that machine learning was helping judges sentence defenders. Currently, I am unaware if this is still happening but this will not work because our court system has a racial basis. This racial bias stems back hundreds of years and a machine-learning model would exploit the discrimination already so present in our system. The dataset used to train the model has bias that would unfairly sentence a person because of their demographic. This overall is bad because there is no dataset you can use to train a model that already does not include some sort of inherited bias.'\n",
466
+ "\n",
467
+ "# List creation\n",
468
+ "\n",
469
+ "q4_A_answers_list = [q4_A_answer_1, q4_A_answer_2]\n"
470
+ ],
471
+ "metadata": {
472
+ "id": "Irw0U9PBmupJ"
473
+ },
474
+ "execution_count": null,
475
+ "outputs": []
476
+ },
477
+ {
478
+ "cell_type": "code",
479
+ "source": [
480
+ "# B-level answers\n",
481
+ "\n",
482
+ "q4_B_answer_1 = 'I believe that machine learning won’t work well in the advent of self-driving cars. As discussed in class, machines are bad at “analogies, small data or rare events, and identifying causality” (Lecture 4: Data Strategy.) While most driving occurrences are habitual and can be predicted, especially day-to-day events like driving to work or the grocery store, so much driving is small or rare events that can’t necessarily be predicted. For example, you can’t predict that the person behind you is texting while driving and is going to rear end you, or that a snowstorm can make you start to hydroplane on the road. So much of the process of learning is understanding the exceptions, such as driving in the rain, at night, or almost avoiding an accident. For self-driving cars to be successful, so much data is required to reduce error, which I don’t think is possible to acquire. While most algorithms can have leeway in their accuracy, self-driving cars cannot have any errors — a difficult feat.'\n",
483
+ "q4_B_answer_2 = 'As we discussed in class, in a hospital setting, ML cannot do the task of diagnosing medical conditions well. While ML have been used in some diagnostic tasks, such as detecting some certain types of cancer, it generally to be considered difficult and will not be used in a large scale because medical conditions can have many different conditions can have many different symptoms and can be difficult to diagnose even for experienced doctors. The training of a machine learning model for diagnosing medical conditions requires a large dataset of medical records, along with labels indicating each diagnosis. But there are several barriers. First, medical records are often private and sensitive, and sharing them may raise legal and ethical concerns. Second, medical conditions are complicated, it’s hard to ensure the labels are correct, some of which may takes years or decades to testify. Even if they can recognize some pattern for some disease, symptoms vary from person to person. At a first glance, even after running some tests, it’s hard for doctors to tell the case. It will be harder to use ML in diagnose in general.'\n",
484
+ "\n",
485
+ "q4_B_answers_list = [q4_B_answer_1, q4_B_answer_2]"
486
+ ],
487
+ "metadata": {
488
+ "id": "haKh_FsenmT3"
489
+ },
490
+ "execution_count": null,
491
+ "outputs": []
492
+ },
493
+ {
494
+ "cell_type": "code",
495
+ "source": [
496
+ "# C-level answers\n",
497
+ "\n",
498
+ "q4_C_answer_1 = 'One example would be unstable models. Some of the models frequently exhibit extreme instability and degrade over time. In such circumstances, the company can call for model monitoring and high-frequency model revision. Businesses may begin to revert to intuition-based approach when model creation lead times increase.'\n",
499
+ "q4_C_answer_2 = 'The example I want to describe is about GPS. We have heard that some drivers who blindly followed their GPS into disaster. The news reported that Three tourists from Bellevue, Washington, got lost after midnight in 2011 when they were unable to find their way back to the hotel. They took what they believed to be a road that would take them to the freeway after requesting the GPS to reroute them. Instead, their SUV submerged into a large body of water. These people trust algorithms and data more than their own logic and judgement. People who drive into the water just because they believe too much in GPS. Algorithms will never think like a human being and cannot make moral judgement like a human being. Machine learning is unable to provide any guidance on the accepted norms. Machine minds & human minds could never be the same. Everyone’s answer is all different, We cannot collect everyone’s information around the world, that is so hard. As long as there is one person’s information not counted, then it will have difference from what you predict. The world always require human’s judgement and ethics, that is the point can not be changed.'\n",
500
+ "q4_C_answer_3 = 'The organization and the abundance of your data is paramount in the applying machine learning. When your data are either incomplete, like not enough data, or unstructured, like messy data with a lot of missingness and inconsistencies, you are not likely to obtain a good result even from implementing the best model in the world. There are some techniques that you might be able to amplify your data or transfer other data into your own, such as transfer learning or domain adaptation. However, it is still strongly recommended that you should have plenty of data to perform machine learning on. Your data should also be able to well represent the question you are trying to answer. Low coverage of the true target population is likely to result in skewness of the result. Machines learning is not likely to work well in this scenario.'\n",
501
+ "\n",
502
+ "# List creation\n",
503
+ "\n",
504
+ "q4_C_answers_list = [q4_C_answer_1, q4_C_answer_2, q4_C_answer_3]"
505
+ ],
506
+ "metadata": {
507
+ "id": "YNHULek7oCE_"
508
+ },
509
+ "execution_count": null,
510
+ "outputs": []
511
+ },
512
+ {
513
+ "cell_type": "code",
514
+ "source": [
515
+ "q4_Q_and_A = [\"Question:\", q4, \"A-level Answers\", q4_A_answers_list, \"B-level Answers\", q4_B_answers_list, \"C-level Answers\", q4_C_answers_list]"
516
+ ],
517
+ "metadata": {
518
+ "id": "9bXSVmPnojIh"
519
+ },
520
+ "execution_count": null,
521
+ "outputs": []
522
+ },
523
+ {
524
+ "cell_type": "code",
525
+ "source": [
526
+ "import json\n",
527
+ "from google.colab import files\n",
528
+ "\n",
529
+ "def save_example_answers(examples, filename='wiki_ABC_Q4examples.json'):\n",
530
+ " with open(filename, 'w') as file:\n",
531
+ " json.dump(examples, file)\n",
532
+ " files.download(filename)\n",
533
+ "\n",
534
+ "save_example_answers(q4_Q_and_A)"
535
+ ],
536
+ "metadata": {
537
+ "colab": {
538
+ "base_uri": "https://localhost:8080/",
539
+ "height": 17
540
+ },
541
+ "outputId": "3e565b3d-804c-4b5e-acc8-efeb955c6c14",
542
+ "id": "Qqwk0mvroqWz"
543
+ },
544
+ "execution_count": null,
545
+ "outputs": [
546
+ {
547
+ "output_type": "display_data",
548
+ "data": {
549
+ "text/plain": [
550
+ "<IPython.core.display.Javascript object>"
551
+ ],
552
+ "application/javascript": [
553
+ "\n",
554
+ " async function download(id, filename, size) {\n",
555
+ " if (!google.colab.kernel.accessAllowed) {\n",
556
+ " return;\n",
557
+ " }\n",
558
+ " const div = document.createElement('div');\n",
559
+ " const label = document.createElement('label');\n",
560
+ " label.textContent = `Downloading \"${filename}\": `;\n",
561
+ " div.appendChild(label);\n",
562
+ " const progress = document.createElement('progress');\n",
563
+ " progress.max = size;\n",
564
+ " div.appendChild(progress);\n",
565
+ " document.body.appendChild(div);\n",
566
+ "\n",
567
+ " const buffers = [];\n",
568
+ " let downloaded = 0;\n",
569
+ "\n",
570
+ " const channel = await google.colab.kernel.comms.open(id);\n",
571
+ " // Send a message to notify the kernel that we're ready.\n",
572
+ " channel.send({})\n",
573
+ "\n",
574
+ " for await (const message of channel.messages) {\n",
575
+ " // Send a message to notify the kernel that we're ready.\n",
576
+ " channel.send({})\n",
577
+ " if (message.buffers) {\n",
578
+ " for (const buffer of message.buffers) {\n",
579
+ " buffers.push(buffer);\n",
580
+ " downloaded += buffer.byteLength;\n",
581
+ " progress.value = downloaded;\n",
582
+ " }\n",
583
+ " }\n",
584
+ " }\n",
585
+ " const blob = new Blob(buffers, {type: 'application/binary'});\n",
586
+ " const a = document.createElement('a');\n",
587
+ " a.href = window.URL.createObjectURL(blob);\n",
588
+ " a.download = filename;\n",
589
+ " div.appendChild(a);\n",
590
+ " a.click();\n",
591
+ " div.remove();\n",
592
+ " }\n",
593
+ " "
594
+ ]
595
+ },
596
+ "metadata": {}
597
+ },
598
+ {
599
+ "output_type": "display_data",
600
+ "data": {
601
+ "text/plain": [
602
+ "<IPython.core.display.Javascript object>"
603
+ ],
604
+ "application/javascript": [
605
+ "download(\"download_8ec6f24c-9653-4335-8c1b-766926434399\", \"wiki_ABC_Q2examples.json\", 5931)"
606
+ ]
607
+ },
608
+ "metadata": {}
609
+ }
610
+ ]
611
+ },
612
+ {
613
+ "cell_type": "markdown",
614
+ "source": [
615
+ "## Grading based on A, B, and C-level answers from previous students to Question 5 from the [Wiki](https://github.com/vanderbilt-data-science/lo-achievement/wiki/Examples-of-great,-good,-and-poor-answers-to-questions):\n",
616
+ "\n",
617
+ "**Question 5:** Describe a new example where you think machine learning could work. This cannot be examples we’ve given in class like customer churn, fraud detection, image or object recognition, disease diagnosis, stock prediction, self-driving cars, etc. t does not need to be completely novel, e.g., you can look for news stories and find something new. I want something that is different from the commonly used examples. You may use an example that you or another student brought up in class if it meets these criteria. Be sure to explain why you think it is an example where ML could work. Note: You do not need to be proven right that ML works. I am mostly interested in your logic and arguments of why you think it is a good idea, drawing on what we’ve discussed in class."
618
+ ],
619
+ "metadata": {
620
+ "id": "RfofiUreov7C"
621
+ }
622
+ },
623
+ {
624
+ "cell_type": "markdown",
625
+ "source": [
626
+ "### Creating .json file from case examples (Question 5)\n",
627
+ "The purpose of this cell is to create a json file based on the previously submitted, graded work of students based on the case file provided by Dr. Blocher in the Wiki."
628
+ ],
629
+ "metadata": {
630
+ "id": "P2h8XtXQo6eO"
631
+ }
632
+ },
633
+ {
634
+ "cell_type": "code",
635
+ "source": [
636
+ "# Context: question\n",
637
+ "\n",
638
+ "q5 = 'Question 5: Describe a new example where you think machine learning could work. This cannot be examples we’ve given in class like customer churn, fraud detection, image or object recognition, disease diagnosis, stock prediction, self-driving cars, etc. t does not need to be completely novel, e.g., you can look for news stories and find something new. I want something that is different from the commonly used examples. You may use an example that you or another student brought up in class if it meets these criteria. Be sure to explain why you think it is an example where ML could work. Note: You do not need to be proven right that ML works. I am mostly interested in your logic and arguments of why you think it is a good idea, drawing on what we’ve discussed in class.'"
639
+ ],
640
+ "metadata": {
641
+ "id": "evpoXMUSpCi4"
642
+ },
643
+ "execution_count": null,
644
+ "outputs": []
645
+ },
646
+ {
647
+ "cell_type": "code",
648
+ "source": [
649
+ "# A-level answers\n",
650
+ "\n",
651
+ "q5_A_answer_1 = 'As I recall, our group discussed that using ML to do intrusion detection for computers. Nowadays, people’s information is digitalized and stored in databases. Since databases are also servers (i.e., computers), there must be some vulnerability. Software is more common to do detection for now, but it can just recognize the intrusion pattern that already known or preset by the programmer. However, this can be accomplished by employing algorithms that can analyze large amounts of data and detect anomalies that may indicate an attempted intrusion. We can feed ML model with network traffic, labeled or not labeled, system logs to let it learn and identify unusual patterns or behavior. With well-learned models, we can use them to find out signs of unauthorized access or even malware. One of the greatest things is that ML model may be able to find some hidden pattern that human has not and predict how and when the intrusion will take place, or even figure out how the malware will function to attach. This trait is one of the most important traits that distinguish ML model from regular software.'\n",
652
+ "q5_A_answer_2 = 'Machine learning can be used in NLP (natural language processing). It is very useful to deal with massive unstructured text data. For example, we want to build a machine learning model to reflect the personality of Einstein according to the letters he wrote in the past decades to his family and friends. We can use machine learning models for NLP to transfer the unstructured text data to usable data first. Then we will use these data to train the model. After training the model by the letters Einstein wrote in the past decades, the model will have the same or similar personality as Einstein. And now we input the test letters into the model again, we expect the model will give us the sentiment scores of each letter (for example: humor-0.732, joyful-0.193, fear-0.015, sad-0.002). After we got the scores for each letter, we use statistical methods to analyze the scores and we will finally get an aggregate personality of Einstein.'\n",
653
+ "q5_A_answer_3 = 'The sports world is a place where I generally think machine learning can be well-utilized, but not in all cases. Something I’m thinking about is fantasy football, in which football fans can draft a team and decide who to “start” on their team for a given week and who to “sit.” NFL pundits and commentators can absolutely give ideas on who would be optimal, and do a great job at it, but when it comes to actual predictions for how many points a particular player might score in a given week, based on opponent average stats, expected workload, and other factors, while also measuring that against every other potential player, having an algorithm to back this up is extremely helpful. Sports betting can also benefit from these types of algorithms, and being able to accurately predict something as volatile as sports could have major business (and fandom) implications.'\n",
654
+ "\n",
655
+ "# List creation\n",
656
+ "\n",
657
+ "q5_A_answers_list = [q5_A_answer_1, q5_A_answer_2, q5_A_answer_3]\n"
658
+ ],
659
+ "metadata": {
660
+ "id": "B9abYhEupKA_"
661
+ },
662
+ "execution_count": null,
663
+ "outputs": []
664
+ },
665
+ {
666
+ "cell_type": "code",
667
+ "source": [
668
+ "# B-level answers\n",
669
+ "\n",
670
+ "q5_B_answer_1 = 'Machine learning can work well when it comes to finance organization, planning, and tracking. Since machine learning works well with statistical inference, it’s useful in decision making. For things such as planning and organizing, one example that machine learning can work well in is identifying price values for toiletries, groceries, and necessity items across various store within a specific location and output the best location for the user. We know that certain retailers mark up prices for the items on their shelfs. Some items are more expensive in one location than others and often time, certain prices are not posted online. In addition to that, traveling to different stores and compare prices are very inefficient because it can waste gas and time. To solve the problem, the algorithm will examine the user’s location and the item(s) that the user wants and output the best stores for the items. How this item work is that they take in data from all the closest stores, rank each item based on value, distance, and availability. On top of that, if users compile a lists of necessity items and output them in the model, the model can use data and compare stores that have more than one item on the list plus the other features (value, distance, and availability), and recommend the place, or places if item are not available, that could cut down time and expense for the users.'\n",
671
+ "q5_B_answer_2 = 'Natural gas consumption has rebounded in the last year, with economic recovery and increased extreme weather leading to increased demand for natural gas, resulting in a tight market and soaring prices. Gas supply agencies want to use ML to predict the deliverability of natural gas storage in depleted reservoirs. First, natural gas is most commonly stored in underground formations: 1) depleted reservoirs, 2) aquifers, and 3) salt caverns. A depleted reservoir must have some elements of a good oil and gas formation to be converted to a subsurface natural gas reservoir, such as good porosity and permeability, the presence of good seal rock, and the presence of cap rock. I think this thing can be achieved with ML because there is a huge amount of data collected from natural gas extraction sites around the world since 1915, and this data provides a good basis and realizability for our research. We can test and build models from this data so that we can use ML to make predictions.'\n",
672
+ "\n",
673
+ "# List creation\n",
674
+ "\n",
675
+ "q5_B_answers_list = [q5_B_answer_1, q5_B_answer_2]"
676
+ ],
677
+ "metadata": {
678
+ "id": "r0_Tt0lspkQ-"
679
+ },
680
+ "execution_count": null,
681
+ "outputs": []
682
+ },
683
+ {
684
+ "cell_type": "code",
685
+ "source": [
686
+ "# C-level answers\n",
687
+ "\n",
688
+ "q5_C_answer_1 = 'One example where ML would work in is speech recognition, converting speech into text.'\n",
689
+ "q5_C_answer_2 = 'ML may work well with education industry. Teachers can utilize ML for lesson preparation, such as analyze academic data, recommendation of high-quality teaching resources, on-demand generation of teaching plans. Others might be like intelligent teaching assistants instead of an actual person TA, or dynamically adjust the teaching content based on comprehensive factors such as learning situation analysis and learner style, and student feedback.'\n",
690
+ "\n",
691
+ "# List creation\n",
692
+ "\n",
693
+ "q5_C_answers_list = [q5_C_answer_1, q5_C_answer_2]"
694
+ ],
695
+ "metadata": {
696
+ "id": "oRlId1mtp46L"
697
+ },
698
+ "execution_count": null,
699
+ "outputs": []
700
+ },
701
+ {
702
+ "cell_type": "code",
703
+ "source": [
704
+ "q5_Q_and_A = [\"Question:\", q5, \"A-level Answers\", q5_A_answers_list, \"B-level Answers\", q5_B_answers_list, \"C-level Answers\", q5_C_answers_list]"
705
+ ],
706
+ "metadata": {
707
+ "id": "XJC2UNxSqHBp"
708
+ },
709
+ "execution_count": null,
710
+ "outputs": []
711
+ },
712
+ {
713
+ "cell_type": "code",
714
+ "source": [
715
+ "import json\n",
716
+ "from google.colab import files\n",
717
+ "\n",
718
+ "def save_example_answers(examples, filename='wiki_ABC_Q5examples.json'):\n",
719
+ " with open(filename, 'w') as file:\n",
720
+ " json.dump(examples, file)\n",
721
+ " files.download(filename)\n",
722
+ "\n",
723
+ "save_example_answers(q5_Q_and_A)"
724
+ ],
725
+ "metadata": {
726
+ "colab": {
727
+ "base_uri": "https://localhost:8080/",
728
+ "height": 17
729
+ },
730
+ "outputId": "3e565b3d-804c-4b5e-acc8-efeb955c6c14",
731
+ "id": "MRcW9_1MqPG-"
732
+ },
733
+ "execution_count": null,
734
+ "outputs": [
735
+ {
736
+ "output_type": "display_data",
737
+ "data": {
738
+ "text/plain": [
739
+ "<IPython.core.display.Javascript object>"
740
+ ],
741
+ "application/javascript": [
742
+ "\n",
743
+ " async function download(id, filename, size) {\n",
744
+ " if (!google.colab.kernel.accessAllowed) {\n",
745
+ " return;\n",
746
+ " }\n",
747
+ " const div = document.createElement('div');\n",
748
+ " const label = document.createElement('label');\n",
749
+ " label.textContent = `Downloading \"${filename}\": `;\n",
750
+ " div.appendChild(label);\n",
751
+ " const progress = document.createElement('progress');\n",
752
+ " progress.max = size;\n",
753
+ " div.appendChild(progress);\n",
754
+ " document.body.appendChild(div);\n",
755
+ "\n",
756
+ " const buffers = [];\n",
757
+ " let downloaded = 0;\n",
758
+ "\n",
759
+ " const channel = await google.colab.kernel.comms.open(id);\n",
760
+ " // Send a message to notify the kernel that we're ready.\n",
761
+ " channel.send({})\n",
762
+ "\n",
763
+ " for await (const message of channel.messages) {\n",
764
+ " // Send a message to notify the kernel that we're ready.\n",
765
+ " channel.send({})\n",
766
+ " if (message.buffers) {\n",
767
+ " for (const buffer of message.buffers) {\n",
768
+ " buffers.push(buffer);\n",
769
+ " downloaded += buffer.byteLength;\n",
770
+ " progress.value = downloaded;\n",
771
+ " }\n",
772
+ " }\n",
773
+ " }\n",
774
+ " const blob = new Blob(buffers, {type: 'application/binary'});\n",
775
+ " const a = document.createElement('a');\n",
776
+ " a.href = window.URL.createObjectURL(blob);\n",
777
+ " a.download = filename;\n",
778
+ " div.appendChild(a);\n",
779
+ " a.click();\n",
780
+ " div.remove();\n",
781
+ " }\n",
782
+ " "
783
+ ]
784
+ },
785
+ "metadata": {}
786
+ },
787
+ {
788
+ "output_type": "display_data",
789
+ "data": {
790
+ "text/plain": [
791
+ "<IPython.core.display.Javascript object>"
792
+ ],
793
+ "application/javascript": [
794
+ "download(\"download_8ec6f24c-9653-4335-8c1b-766926434399\", \"wiki_ABC_Q2examples.json\", 5931)"
795
+ ]
796
+ },
797
+ "metadata": {}
798
+ }
799
+ ]
800
+ },
801
+ {
802
+ "cell_type": "markdown",
803
+ "source": [
804
+ " # Exploring with Langchain Example Sets, FewShotPrompts and Example Selectors\n",
805
+ " Here we're exploring another avenue for this goal to see if we can yield better results than the vector stores. Again, we are using question 2 from the Wiki to test."
806
+ ],
807
+ "metadata": {
808
+ "id": "PpKRr5Vw4_0F"
809
+ }
810
+ },
811
+ {
812
+ "cell_type": "code",
813
+ "source": [
814
+ "# Context: question\n",
815
+ "\n",
816
+ "q2 = 'Question 2: Why is machine learning so important for businesses? Answer this question generally (i.e. such that it applies to many or at least most businesses).'"
817
+ ],
818
+ "metadata": {
819
+ "id": "-kmMFUaLs_Q1"
820
+ },
821
+ "execution_count": 7,
822
+ "outputs": []
823
+ },
824
+ {
825
+ "cell_type": "code",
826
+ "source": [
827
+ "from langchain.prompts.few_shot import FewShotPromptTemplate\n",
828
+ "from langchain.prompts.prompt import PromptTemplate\n",
829
+ "\n",
830
+ "examples = [\n",
831
+ " {\n",
832
+ " \"question\": f\"\"\" Please grade the following student's answer to the question ({q2})\n",
833
+ " on an A-, B-, C-level grading scale: {q2_A_answer_1}.\"\"\",\n",
834
+ " \"answer\": \"This student should recieve an A-level grade.\"\n",
835
+ " },\n",
836
+ " {\n",
837
+ " \"question\": f\"\"\" Please grade the following student's answer to the question ({q2})\n",
838
+ " on an A-, B-, C-level grading scale: {q2_A_answer_2}.\"\"\",\n",
839
+ " \"answer\": \"The student should recieve an A-level grade.\"\n",
840
+ " },\n",
841
+ " {\n",
842
+ " \"question\": f\"\"\" Please grade the following student's answer to the question ({q2})\n",
843
+ " on an A-, B-, C-level grading scale: {q2_B_answer_1}.\"\"\",\n",
844
+ " \"answer\": \"This student should recieve a B-level grade.\"\n",
845
+ " },\n",
846
+ " {\n",
847
+ " \"question\": f\"\"\" Please grade the following student's answer to the question ({q2})\n",
848
+ " on an A-, B-, C-level grading scale: {q2_B_answer_2}.\"\"\",\n",
849
+ " \"answer\": \"This student should recieve a B-level grade.\"\n",
850
+ " },\n",
851
+ " {\n",
852
+ " \"question\": f\"\"\" Please grade the following student's answer to the question ({q2})\n",
853
+ " on an A-, B-, C-level grading scale: {q2_B_answer_3}.\"\"\",\n",
854
+ " \"answer\": \"This student should recieve a B-level grade.\"\n",
855
+ " },\n",
856
+ " {\n",
857
+ " \"question\": f\"\"\" Please grade the following student's answer to the question ({q2})\n",
858
+ " on an A-, B-, C-level grading scale: {q2_C_answer_1}.\"\"\",\n",
859
+ " \"answer\": \"This student should recieve a C-level grade.\"\n",
860
+ " },\n",
861
+ " {\n",
862
+ " \"question\": f\"\"\" Please grade the following student's answer to the question ({q2})\n",
863
+ " on an A-, B-, C-level grading scale: {q2_C_answer_2}.\"\"\",\n",
864
+ " \"answer\": \"This student should recieve a C-level grade.\"\n",
865
+ " }\n",
866
+ "]"
867
+ ],
868
+ "metadata": {
869
+ "id": "1idnQLYW-697"
870
+ },
871
+ "execution_count": 12,
872
+ "outputs": []
873
+ },
874
+ {
875
+ "cell_type": "code",
876
+ "source": [
877
+ "# Example selector\n",
878
+ "from langchain.prompts.example_selector import SemanticSimilarityExampleSelector, MaxMarginalRelevanceExampleSelector, NGramOverlapExampleSelector\n",
879
+ "from langchain.vectorstores import Chroma\n",
880
+ "from langchain.embeddings import OpenAIEmbeddings\n",
881
+ "\n",
882
+ "\n",
883
+ "example_selector_semantic = SemanticSimilarityExampleSelector.from_examples(\n",
884
+ " # This is the list of examples available to select from.\n",
885
+ " examples,\n",
886
+ " # This is the embedding class used to produce embeddings which are used to measure semantic similarity.\n",
887
+ " OpenAIEmbeddings(),\n",
888
+ " # This is the VectorStore class that is used to store the embeddings and do a similarity search over.\n",
889
+ " Chroma,\n",
890
+ " # This is the number of examples to produce.\n",
891
+ " k=1\n",
892
+ ")\n",
893
+ "\n",
894
+ "example_selector_mmr = MaxMarginalRelevanceExampleSelector.from_examples(\n",
895
+ " # This is the list of examples available to select from.\n",
896
+ " examples,\n",
897
+ " # This is the embedding class used to produce embeddings which are used to measure semantic similarity.\n",
898
+ " OpenAIEmbeddings(),\n",
899
+ " # This is the VectorStore class that is used to store the embeddings and do a similarity search over.\n",
900
+ " Chroma,\n",
901
+ " # This is the number of examples to produce.\n",
902
+ " k=1,\n",
903
+ ")\n",
904
+ "\n",
905
+ "example_selector_ngram = NGramOverlapExampleSelector(\n",
906
+ " # These are the examples it has available to choose from.\n",
907
+ " examples=examples,\n",
908
+ " # This is the PromptTemplate being used to format the examples.\n",
909
+ " example_prompt=example_prompt,\n",
910
+ " # This is the threshold, at which selector stops.\n",
911
+ " # It is set to -1.0 by default.\n",
912
+ " threshold=-1.0,\n",
913
+ " # For negative threshold:\n",
914
+ " # Selector sorts examples by ngram overlap score, and excludes none.\n",
915
+ " # For threshold greater than 1.0:\n",
916
+ " # Selector excludes all examples, and returns an empty list.\n",
917
+ " # For threshold equal to 0.0:\n",
918
+ " # Selector sorts examples by ngram overlap score,\n",
919
+ " # and excludes those with no ngram overlap with input.\n",
920
+ ")\n",
921
+ "\n"
922
+ ],
923
+ "metadata": {
924
+ "id": "Y7vNiHKhuIjm"
925
+ },
926
+ "execution_count": 15,
927
+ "outputs": []
928
+ },
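+ {
+ "cell_type": "markdown",
+ "source": [
+ "A quick sanity check (a minimal sketch, assuming the selectors above built without error): `select_examples` shows which few-shot example(s) a selector would pull in for a given input."
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Minimal sketch: inspect what the semantic selector retrieves for the Question 2 text\n",
+ "picked = example_selector_semantic.select_examples({\"question\": q2})\n",
+ "print(len(picked), picked[0][\"answer\"])"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },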
929
+ {
930
+ "cell_type": "code",
931
+ "source": [
932
+ "example_prompt = PromptTemplate(input_variables=[\"question\", \"answer\"], template=\"Question: {question}\\n{answer}\")\n",
933
+ "\n",
934
+ "print(example_prompt.format(**examples[0]))"
935
+ ],
936
+ "metadata": {
937
+ "id": "x2dCmQdACwGi"
938
+ },
939
+ "execution_count": null,
940
+ "outputs": []
941
+ },
942
+ {
943
+ "cell_type": "code",
944
+ "source": [
945
+ "prompt = FewShotPromptTemplate(\n",
946
+ " #examples=examples,\n",
947
+ " example_selector = example_selector_semantic,\n",
948
+ " example_prompt = example_prompt,\n",
949
+ " suffix=\"Question: {input}\",\n",
950
+ " input_variables=[\"input\"]\n",
951
+ ")\n",
952
+ "\n",
953
+ "print(prompt.format(input=f\"\"\" Please grade the following student's answer to the question ({q2})\n",
954
+ " on an A-, B-, C-level grading scale: {example1}.\"\"\"))"
955
+ ],
956
+ "metadata": {
957
+ "id": "Uc_BvY0rBZgx"
958
+ },
959
+ "execution_count": null,
960
+ "outputs": []
961
+ },
962
+ {
963
+ "cell_type": "code",
964
+ "source": [
965
+ "prompt_semantic = FewShotPromptTemplate(\n",
966
+ " #examples=examples,\n",
967
+ " example_selector = example_selector_semantic,\n",
968
+ " example_prompt = example_prompt,\n",
969
+ " suffix=\"Question: {input}\",\n",
970
+ " input_variables=[\"input\"]\n",
971
+ ")\n",
972
+ "\n",
973
+ "prompt_mmr = FewShotPromptTemplate(\n",
974
+ " #examples=examples,\n",
975
+ " example_selector = example_selector_mmr,\n",
976
+ " example_prompt = example_prompt,\n",
977
+ " suffix=\"Question: {input}\",\n",
978
+ " input_variables=[\"input\"]\n",
979
+ ")\n",
980
+ "\n",
981
+ "prompt_ngram = FewShotPromptTemplate(\n",
982
+ " #examples=examples,\n",
983
+ " example_selector = example_selector_ngram,\n",
984
+ " example_prompt = example_prompt,\n",
985
+ " suffix=\"Question: {input}\",\n",
986
+ " input_variables=[\"input\"]\n",
987
+ ")"
988
+ ],
989
+ "metadata": {
990
+ "id": "k1laCoAPxhcH"
991
+ },
992
+ "execution_count": 18,
993
+ "outputs": []
994
+ },
995
+ {
996
+ "cell_type": "code",
997
+ "source": [
998
+ "# First using the Example Set to test without an example selector, just the base OpenAI model.\n",
999
+ "from langchain.llms import OpenAI\n",
1000
+ "base_llm = OpenAI()\n",
1001
+ "base_llm(prompt.format(input=f\"\"\" Please grade the following students' answers to the question ({q2})\n",
1002
+ " on an A-, B-, C-level grading scale: student 1: {example1}, student 2: {example2}, and student 3: {example3}. Grade based on the following advice from the professor:\n",
1003
+ " One aspect of it was being specific. The poor answers (C) have a lot of platitudes, the better answers give\n",
1004
+ " specific examples (B and A). Secondly, they should discuss automation and/or prediction specifically. Those are\n",
1005
+ " the things that ML does (A and B answers), it is not 'technology' broadly (what a C answer might say).\n",
1006
+ " The difference between A and B answers should be in their complexity, specificity, thoughfulness\n",
1007
+ " , and general insight on the subject. However, keep in mind that these are short answer questions.\n",
1008
+ " The number of examples or the legnth of the response is not as important as the content itself. After assigning your grade, provide a brief\n",
1009
+ " explanation on your reasoning for assigning the grade based on the professor's advice, and\n",
1010
+ " what differentiated the answers from the higher grade level (if applicable). There should be three grades assigned in your response, and assign based on the order the\n",
1011
+ " answers were passed in.\"\"\"))"
1012
+ ],
1013
+ "metadata": {
1014
+ "id": "rUdx2axQuec1"
1015
+ },
1016
+ "execution_count": null,
1017
+ "outputs": []
1018
+ },
1019
+ {
1020
+ "cell_type": "code",
1021
+ "source": [
1022
+ "# Example Selector 1:\n",
1023
+ "\n",
1024
+ "base_llm(prompt_mmr.format(input=f\"\"\" Please grade the following students' answers to the question ({q2})\n",
1025
+ " on an A-, B-, C-level grading scale: student 1: {example1}, student 2: {example2}, and student 3: {example3}. Grade based on the following advice from the professor:\n",
1026
+ " One aspect of it was being specific. The poor answers (C) have a lot of platitudes, the better answers give\n",
1027
+ " specific examples (B and A). Secondly, they should discuss automation and/or prediction specifically. Those are\n",
1028
+ " the things that ML does (A and B answers), it is not 'technology' broadly (what a C answer might say).\n",
1029
+ " The difference between A and B answers should be in their complexity, specificity, thoughfulness\n",
1030
+ " , and general insight on the subject. However, keep in mind that these are short answer questions.\n",
1031
+ " The number of examples or the legnth of the response is not as important as the content itself. After assigning your grade, provide a brief\n",
1032
+ " explanation on your reasoning for assigning the grade based on the professor's advice, and\n",
1033
+ " what differentiated the answers from the higher grade level (if applicable). There should be three grades assigned in your response, and assign based on the order the\n",
1034
+ " answers were passed in.\"\"\"))"
1035
+ ],
1036
+ "metadata": {
1037
+ "id": "fe24IOslyITA"
1038
+ },
1039
+ "execution_count": null,
1040
+ "outputs": []
1041
+ },
1042
+ {
1043
+ "cell_type": "code",
1044
+ "source": [
1045
+ "# Example selector 2:\n",
1046
+ "\n",
1047
+ "base_llm(prompt_ngram.format(input=f\"\"\" Please grade the following students' answers to the question ({q2})\n",
1048
+ " on an A-, B-, C-level grading scale: student 1: {example1}, student 2: {example2}, and student 3: {example3}. Grade based on the following advice from the professor:\n",
1049
+ " One aspect of it was being specific. The poor answers (C) have a lot of platitudes, the better answers give\n",
1050
+ " specific examples (B and A). Secondly, they should discuss automation and/or prediction specifically. Those are\n",
1051
+ " the things that ML does (A and B answers), it is not 'technology' broadly (what a C answer might say).\n",
1052
+ " The difference between A and B answers should be in their complexity, specificity, thoughfulness\n",
1053
+ " , and general insight on the subject. However, keep in mind that these are short answer questions.\n",
1054
+ " The number of examples or the legnth of the response is not as important as the content itself. After assigning your grade, provide a brief\n",
1055
+ " explanation on your reasoning for assigning the grade based on the professor's advice, and\n",
1056
+ " what differentiated the answers from the higher grade level (if applicable). There should be three grades assigned in your response, and assign based on the order the\n",
1057
+ " answers were passed in.\"\"\"))"
1058
+ ],
1059
+ "metadata": {
1060
+ "id": "dXwJa30wyL4S"
1061
+ },
1062
+ "execution_count": null,
1063
+ "outputs": []
1064
+ },
1065
+ {
1066
+ "cell_type": "markdown",
1067
+ "source": [
1068
+ "### Results from the ngram example selector with the following prompt (yielded the best results (relatively):\n",
1069
+ "\n",
1070
+ "\n",
1071
+ "\n",
1072
+ "```\n",
1073
+ "input=f\"\"\" Please grade the following student's answer to the question ({q2})\n",
1074
+ " on an A-, B-, C-level grading scale: {example3}. Grade based on the following advice from the professor:\n",
1075
+ " One aspect of it was being specific. The poor answers have a lot of platitudes, the better answers give\n",
1076
+ " specific examples. Secondly, they should discuss automation and/or prediction specifically. Those are\n",
1077
+ " the things that ML does, it is not 'technology' broadly. The best answers\n",
1078
+ " (A) should be more complex and specific in answering the question and meeting that criteria\n",
1079
+ " than the better answers (B), though length may not be a determinant. After assigning your grade, provide a brief\n",
1080
+ " explanation on your reasoning for assigning the grade based on the professor's advice.\"\"\"\n",
1081
+ "```\n",
1082
+ "\n",
1083
+ "\n",
1084
+ "\n",
1085
+ "1. Example 1 (A-level grade): \"This student should recieve a B-level grade. The student's answer includes specific examples of how machine learning can be used in business to increase efficiency and profit margins, as well as how machine learning can automate processes previously done by humans. The student also mentions the importance of prediction in machine learning, which is one of its core functions.\" (incorrect)\n",
1086
+ "2. Example 2 (B-level grade): \"This student should recieve a B-level grade. The student's answer provides specific examples of how ML can be used in business such as NLP and predictive analysis. The student also references automation and prediction which are key aspects of ML.\" (correct)\n",
1087
+ "3. Example 3 (C-level grade): \"This student should recieve a B-level grade. The student's answer provides a specific example about how machine learning can help businesses save time and money by providing more accurate predictions. It also mentions the three main innovations that machine learning brings to the table (prediction is cheap, more accurate, and automated).\" (incorrect)"
1088
+ ],
1089
+ "metadata": {
1090
+ "id": "bKqtzH8hcoHD"
1091
+ }
1092
+ },
1093
+ {
1094
+ "cell_type": "markdown",
1095
+ "source": [
1096
+ "# Exploring Chat Prompt Templates and System/Human Messages\n",
1097
+ "This section is unfinalized as we are not sure if it is neccessary for the goals of this project, and we ran into errors which we were unsuccessful in debugging. This approach may yield better results but would need to be further devloped and debugged."
1098
+ ],
1099
+ "metadata": {
1100
+ "id": "YGWq7cw697Wv"
1101
+ }
1102
+ },
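+ {
+ "cell_type": "markdown",
+ "source": [
+ "Before the template experiments below, here is a minimal working sketch of the message flow we are aiming for, using plain `SystemMessage`/`HumanMessage` objects rather than templates. This is a hedged sketch, not our finalized approach, and it assumes `q2` and `example1` from earlier cells."
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "from langchain.chat_models import ChatOpenAI\n",
+ "from langchain.schema import SystemMessage, HumanMessage\n",
+ "\n",
+ "sketch_chat = ChatOpenAI(temperature=0)\n",
+ "sketch_messages = [\n",
+ "    SystemMessage(content=\"You are a grader who assigns A-, B-, or C-level grades to short answers about machine learning.\"),\n",
+ "    HumanMessage(content=f\"Question: {q2}\\nStudent answer: {example1}\\nAssign a grade and briefly explain your reasoning.\")\n",
+ "]\n",
+ "# sketch_chat(sketch_messages)  # uncomment to run; returns an AIMessage containing the grade"
+ ],
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },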
1103
+ {
1104
+ "cell_type": "code",
1105
+ "source": [
1106
+ "from langchain.chat_models import ChatOpenAI\n",
1107
+ "from langchain import PromptTemplate, LLMChain\n",
1108
+ "from langchain.prompts.chat import (\n",
1109
+ " ChatPromptTemplate,\n",
1110
+ " SystemMessagePromptTemplate,\n",
1111
+ " AIMessagePromptTemplate,\n",
1112
+ " HumanMessagePromptTemplate,\n",
1113
+ ")\n",
1114
+ "from langchain.schema import AIMessage, HumanMessage, SystemMessage\n",
1115
+ "\n",
1116
+ "\n",
1117
+ "\n",
1118
+ "from langchain.prompts import (\n",
1119
+ " ChatPromptTemplate,\n",
1120
+ " PromptTemplate,\n",
1121
+ " SystemMessagePromptTemplate,\n",
1122
+ " AIMessagePromptTemplate,\n",
1123
+ " HumanMessagePromptTemplate,\n",
1124
+ ")\n",
1125
+ "from langchain.schema import (\n",
1126
+ " AIMessage,\n",
1127
+ " HumanMessage,\n",
1128
+ " SystemMessage\n",
1129
+ ")"
1130
+ ],
1131
+ "metadata": {
1132
+ "id": "NzK7Yo_OEqlG"
1133
+ },
1134
+ "execution_count": null,
1135
+ "outputs": []
1136
+ },
1137
+ {
1138
+ "cell_type": "code",
1139
+ "source": [
1140
+ "template=\"You are a helpful assistant that grades student responses ({answer}) to questions ({question}) about machine learning.\"\n",
1141
+ "system_message_prompt = SystemMessagePromptTemplate.from_template(template)\n",
1142
+ "human_template=\"{text}\"\n",
1143
+ "human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)"
1144
+ ],
1145
+ "metadata": {
1146
+ "id": "XEsYhpKMFcud"
1147
+ },
1148
+ "execution_count": null,
1149
+ "outputs": []
1150
+ },
1151
+ {
1152
+ "cell_type": "code",
1153
+ "source": [
1154
+ "fprompt=PromptTemplate(\n",
1155
+ " template=\"You are a helpful assistant that grades student responses ({answer}) to questions ({question}) about machine learning.\",\n",
1156
+ " input_variables=[\"question\", \"answer\"],\n",
1157
+ ")\n",
1158
+ "system_message_prompt = SystemMessagePromptTemplate(prompt=prompt)"
1159
+ ],
1160
+ "metadata": {
1161
+ "id": "En5SM-ZHCT8l"
1162
+ },
1163
+ "execution_count": null,
1164
+ "outputs": []
1165
+ },
1166
+ {
1167
+ "cell_type": "code",
1168
+ "source": [
1169
+ "chat_prompt = ChatPromptTemplate.from_messages([system_message_prompt, human_message_prompt])\n",
1170
+ "\n",
1171
+ "# get a chat completion from the formatted messages\n",
1172
+ "chat_prompt.format_prompt(input = [q2, example1], text= \"The student should recieve an A-level grade.\" ).to_messages()"
1173
+ ],
1174
+ "metadata": {
1175
+ "id": "W2zIxXxDFNIh"
1176
+ },
1177
+ "execution_count": null,
1178
+ "outputs": []
1179
+ },
1180
+ {
1181
+ "cell_type": "code",
1182
+ "source": [
1183
+ "example_human = SystemMessagePromptTemplate.from_template(\n",
1184
+ " \"Machine learning is awesome\", additional_kwargs={\"name\": \"example_user\"}\n",
1185
+ ")\n",
1186
+ "example_ai = SystemMessagePromptTemplate.from_template(\n",
1187
+ " \"This student should reieve a C-level grade.\", additional_kwargs={\"name\": \"example_assistant\"}\n",
1188
+ ")\n",
1189
+ "human_template = \"{text}\"\n",
1190
+ "human_message_prompt = HumanMessagePromptTemplate.from_template(human_template)"
1191
+ ],
1192
+ "metadata": {
1193
+ "id": "b_r2Y7YXTa0R"
1194
+ },
1195
+ "execution_count": null,
1196
+ "outputs": []
1197
+ },
1198
+ {
1199
+ "cell_type": "code",
1200
+ "source": [
1201
+ "chat = ChatOpenAI(temperature=0)"
1202
+ ],
1203
+ "metadata": {
1204
+ "id": "R2XA1z27cl6V"
1205
+ },
1206
+ "execution_count": null,
1207
+ "outputs": []
1208
+ },
1209
+ {
1210
+ "cell_type": "code",
1211
+ "source": [
1212
+ "chat_prompt = ChatPromptTemplate.from_messages(\n",
1213
+ " [system_message_prompt, example_human, example_ai, human_message_prompt]\n",
1214
+ ")\n",
1215
+ "chain = LLMChain(llm=chat, prompt=chat_prompt)\n",
1216
+ "# get a chat completion from the formatted messages\n",
1217
+ "text_dict = {'text': 'Please grade the following students response to question 2 based on the A-, B-, C-level grading scale'}\n",
1218
+ "chain.run(text_dict)"
1219
+ ],
1220
+ "metadata": {
1221
+ "id": "3WNMa70jVuf5"
1222
+ },
1223
+ "execution_count": null,
1224
+ "outputs": []
1225
+ },
1226
+ {
1227
+ "cell_type": "code",
1228
+ "source": [
1229
+ "\n",
1230
+ "query = f\"\"\" Please grade the following students answer: {example1} to the question ({q2}).\n",
1231
+ "The uploaded json should serve as as examples of A, B, and C level answers. In the document, the\n",
1232
+ "original question is printed, as well as examples of previous student answers that have recieved\n",
1233
+ "A, B, and C grades (labeled accordingly)\"\"\"\n",
1234
+ "grades = qa.run(query)\n",
1235
+ "print(grades)"
1236
+ ],
1237
+ "metadata": {
1238
+ "id": "naKuTxKa2-U8"
1239
+ },
1240
+ "execution_count": null,
1241
+ "outputs": []
1242
+ },
1243
+ {
1244
+ "cell_type": "markdown",
1245
+ "source": [
1246
+ "## Conclusions based on Question 2\n",
1247
+ "\n",
1248
+ "Using different types of chains and example selectors made minor differences, but the results were still not entirely correct, and also inconsistent. Using a persona prompt as the system message was semi-useful, as was adding the grading criteria to the prompt. But it did not remedy our problem.\n",
1249
+ "\n",
1250
+ "Moving forward:\n",
1251
+ "\n",
1252
+ "1. Grading criteria helped, but for longer/more complex questions, more context may be needed (i.e., more specifics on what seperates an A from B, B from C, etc. The model did not perform as well in distinguishing between these levels/did not seem to understand the nuances of the grading criteria).\n",
1253
+ "2. The model would likely perform better with a better example set/more context to be trained on.\n",
1254
+ "\n"
1255
+ ],
1256
+ "metadata": {
1257
+ "id": "C7kDC2pe7bNd"
1258
+ }
1259
+ }
1260
+ ],
1261
+ "metadata": {
1262
+ "colab": {
1263
+ "provenance": [],
1264
+ "include_colab_link": true
1265
+ },
1266
+ "kernelspec": {
1267
+ "display_name": "Python 3",
1268
+ "name": "python3"
1269
+ },
1270
+ "language_info": {
1271
+ "name": "python",
1272
+ "version": "3.10.6"
1273
+ }
1274
+ },
1275
+ "nbformat": 4,
1276
+ "nbformat_minor": 0
1277
+ }
instructor_intr_notebook_grading_training.ipynb ADDED
@@ -0,0 +1,737 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "id": "view-in-github",
7
+ "colab_type": "text"
8
+ },
9
+ "source": [
10
+ "<a href=\"https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/adding_grading_levels_to_instructor_nb/instructor_intr_notebook_grading_training.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "metadata": {
16
+ "collapsed": true,
17
+ "id": "brzvVeAsYiG2"
18
+ },
19
+ "source": [
20
+ "<a href=\"https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/main/instructor_intr_notebook.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
21
+ ]
22
+ },
23
+ {
24
+ "cell_type": "markdown",
25
+ "metadata": {
26
+ "id": "WMKrKfx8_3fc"
27
+ },
28
+ "source": [
29
+ "# Instructor Grading and Assessment\n",
30
+ "This notebook executes grading of student submissions based on the examples provided in the [Wiki](https://github.com/vanderbilt-data-science/lo-achievement/wiki/Examples-of-great,-good,-and-poor-answers-to-questions) from Dr. Jesse Blocher. In this iteration, we use the Unstructured File Loader, which cannot proccess .json files (the preferred format). We are working on finding a file loader that allows .json. In this version of the notebook, the model has only been trained on Question 2 from the notebook.\n",
31
+ "\n",
32
+ "To train the model, we used 2 out of the three student example from each grade brack and inputted into a .pdf with clearly defined levels. Then, we used the excluded answers to test the accuracy of the model's grading."
33
+ ]
34
+ },
35
+ {
36
+ "cell_type": "markdown",
37
+ "source": [
38
+ "## Grading based on A, B, and C-level answers from previous students to Question 2 from the [Wiki](https://github.com/vanderbilt-data-science/lo-achievement/wiki/Examples-of-great,-good,-and-poor-answers-to-questions):\n",
39
+ "\n",
40
+ "**Question 2:** Why is machine learning so important for businesses? Answer this question generally (i.e. such that it applies to many or at least most businesses)."
41
+ ],
42
+ "metadata": {
43
+ "id": "ZTkNQ-dL5iO5"
44
+ }
45
+ },
46
+ {
47
+ "cell_type": "markdown",
48
+ "source": [
49
+ "### Creating .json file from case examples (Question 2)\n",
50
+ "The purpose of this cell is to create a json file based on the previously submitted, graded work of students based on the case file provided by Dr. Blocher in the Wiki"
51
+ ],
52
+ "metadata": {
53
+ "id": "TYlGEusr64kA"
54
+ }
55
+ },
56
+ {
57
+ "cell_type": "code",
58
+ "source": [
59
+ "# Context: question\n",
60
+ "\n",
61
+ "q2 = 'Question 2: Why is machine learning so important for businesses? Answer this question generally (i.e. such that it applies to many or at least most businesses).'"
62
+ ],
63
+ "metadata": {
64
+ "id": "-kmMFUaLs_Q1"
65
+ },
66
+ "execution_count": 1,
67
+ "outputs": []
68
+ },
69
+ {
70
+ "cell_type": "code",
71
+ "source": [
72
+ "# A-level answers\n",
73
+ "\n",
74
+ "A_answer_1 = 'Machine learning is extremely important tool for businesses. It can be used in a variety of ways, but most importantly, it can be used to identify patterns within their data that might not otherwise be identified by human beings. For example, it can understand customer behaviors, optimize logistics, and expand efficiencies throughout the business. Machine learning does not get tired, meaning it can work as long as you want it to. It can sift through massive amounts of data, that no human being can look through in an efficient manner. Machine learning can be used as a tool to identify anomalies when something needs to be checked to save or gain money. The predictions that companies gain from machine learning are cheap, accurate, and automate. These machine learning algorithms can be brought to larger scales to encompass the whole business and its operations. It is important to note, Machine learning is just predictions. Predictions to understand important patterns that could make or break a company since they understand the patterns of their business more. It is an amazing tool, but should be used wisely and carefully because if not, it can expensive, useless, and straight up wrong.'\n",
75
+ "A_answer_2 = 'Machine learning is important for most of the sectors in business. Overall, it gives the company of an overview about what would be the trend for their business industry, and analyze the customer behavior to help business segment their customers groups. Today, many companies have a vast amount of information generated by behavior, computer, events, people, and devices. This massive amount of data is difficult for human to handle, and even if human manages it, it is not profitable as human labor is expensive. Thanks to machine learning, companies can utilize their in-house or even third-party data to make something useful for their business. In medical analysis, for example, with human, it takes a very long time to find patterns in thousands of MRI scans. On the other hand, machines can detect patterns in seconds by entering data as long as the information is correctly labeled or trained properly. Another example would be segmenting customer group. In marketing department, the business could use unsupervised machine learning to cluster their customer segments to generate personalized contents that are relevant for each of individuals.'\n",
76
+ "\n",
77
+ "# List creation\n",
78
+ "\n",
79
+ "A_answers_list = [A_answer_1, A_answer_2]\n"
80
+ ],
81
+ "metadata": {
82
+ "id": "yQT6aExSr1dP"
83
+ },
84
+ "execution_count": 2,
85
+ "outputs": []
86
+ },
87
+ {
88
+ "cell_type": "code",
89
+ "source": [
90
+ "# B-level answers\n",
91
+ "\n",
92
+ "B_answer_1 = 'Companies use ML models to improve different aspects of their business, like manufacturing, hiring, deployment, advertising, etc. The main goal is to improve productive and increase profitability of the company. The ML models are fed with company and externally available data to help the company optimize its departments and in turn become more financially successful/ productive. For example, using purchasing history, the company can predict who to advertise products to, to increase sales.'\n",
93
+ "B_answer_2 = 'Machine learning allows business to have automated decision, scale, predictive analysis and performance. Machine learning also helps a business have a data strategy. This is how a firm uses data, data infrastructure, governance, etc. to accomplish its strategic goals and maintain/grow their competitive advantage within their industry.'\n",
94
+ "B_answer_3 = 'The short answer is ML can help make decisions for businesses. To be clarified, ML does not make decisions for businesses. I mean it can, but people have not trusted ML enough yet and ML has not been that good to let it directly make business decisions. Business people only use ML to help themselves get intuitions of how decisions should be made and make predictions of results they might get based on their decisions. For example, if a business tries to launch a new product, it will use ML to test whether it will work or not on a small scale before it is introduced to a large scale. People called this step piloting. In this step, people collect data that is generated by using the pilot product and analyze their next move. They could tweak some features based on the feedback. If they think the data in their interest shows the product performs well, they might predict that this product will be successful when it is introduced on a large scale. Then, they will launch it.'\n",
95
+ "# List creation\n",
96
+ "\n",
97
+ "B_answers_list = [B_answer_1, B_answer_2, B_answer_3]"
98
+ ],
99
+ "metadata": {
100
+ "id": "KB1CmeRwtRvf"
101
+ },
102
+ "execution_count": 3,
103
+ "outputs": []
104
+ },
105
+ {
106
+ "cell_type": "code",
107
+ "source": [
108
+ "# C-level answers\n",
109
+ "\n",
110
+ "C_answer_1 = 'Machine learning powers many of the services we use today, such as the recommendation systems of Spotify and Netflix; search engines such as Google and Bing; social media such as TikTok and Instagram; voices such as Siri and Alexa, the list can go on. All these examples show that machine learning is already starting to play a pivotal role in today\"s data-rich world. Machines can help us sift through useful information that can lead to big breakthroughs, and we have seen the widespread use of this technology in various industries such as finance, healthcare, insurance, manufacturing, transformational change, etc.'\n",
111
+ "C_answer_3 = 'As technology advanced, there are tons of new data generated and stored. All industries experienced this surge in data, including business. There is a huge amount of business data stored and waited in the database of each firm and they need solutions to utilize these data. Machine learning is a very promising approach for firms to puts these data in and output a meaning pattern or result that could help the firms with their existing work. This could turn into a working product or provide insights that could enhance the efficiency of the company’s workflow. With machine learning, a firm could either enter a new market with the new product or save time and effort with the meaningful insights. Achieving these possibilities with the data they already owned is a low effort but high reward action. This is the reason machine learning is valued by many businesses recently.'\n",
112
+ "\n",
113
+ "# List creation\n",
114
+ "\n",
115
+ "C_answers_list = [C_answer_1, C_answer_3]"
116
+ ],
117
+ "metadata": {
118
+ "id": "3diAz43othjc"
119
+ },
120
+ "execution_count": 4,
121
+ "outputs": []
122
+ },
123
+ {
124
+ "cell_type": "code",
125
+ "source": [
126
+ "Q_and_A = [\"Question:\", q2, \"A-level Answers\", A_answers_list, \"B-level Answers\", B_answers_list, \"C-level Answers\", C_answers_list]"
127
+ ],
128
+ "metadata": {
129
+ "id": "oAnrMSU6u9do"
130
+ },
131
+ "execution_count": 5,
132
+ "outputs": []
133
+ },
134
+ {
135
+ "cell_type": "code",
136
+ "source": [
137
+ "import json\n",
138
+ "from google.colab import files\n",
139
+ "\n",
140
+ "def save_example_answers(examples, filename='wiki_ABC_Q2examples.json'):\n",
141
+ " with open(filename, 'w') as file:\n",
142
+ " json.dump(examples, file)\n",
143
+ " files.download(filename)\n",
144
+ "\n",
145
+ "save_example_answers(Q_and_A)"
146
+ ],
147
+ "metadata": {
148
+ "colab": {
149
+ "base_uri": "https://localhost:8080/",
150
+ "height": 17
151
+ },
152
+ "id": "B16iYMEnri9s",
153
+ "outputId": "3e565b3d-804c-4b5e-acc8-efeb955c6c14"
154
+ },
155
+ "execution_count": 6,
156
+ "outputs": [
157
+ {
158
+ "output_type": "display_data",
159
+ "data": {
160
+ "text/plain": [
161
+ "<IPython.core.display.Javascript object>"
162
+ ],
163
+ "application/javascript": [
164
+ "\n",
165
+ " async function download(id, filename, size) {\n",
166
+ " if (!google.colab.kernel.accessAllowed) {\n",
167
+ " return;\n",
168
+ " }\n",
169
+ " const div = document.createElement('div');\n",
170
+ " const label = document.createElement('label');\n",
171
+ " label.textContent = `Downloading \"${filename}\": `;\n",
172
+ " div.appendChild(label);\n",
173
+ " const progress = document.createElement('progress');\n",
174
+ " progress.max = size;\n",
175
+ " div.appendChild(progress);\n",
176
+ " document.body.appendChild(div);\n",
177
+ "\n",
178
+ " const buffers = [];\n",
179
+ " let downloaded = 0;\n",
180
+ "\n",
181
+ " const channel = await google.colab.kernel.comms.open(id);\n",
182
+ " // Send a message to notify the kernel that we're ready.\n",
183
+ " channel.send({})\n",
184
+ "\n",
185
+ " for await (const message of channel.messages) {\n",
186
+ " // Send a message to notify the kernel that we're ready.\n",
187
+ " channel.send({})\n",
188
+ " if (message.buffers) {\n",
189
+ " for (const buffer of message.buffers) {\n",
190
+ " buffers.push(buffer);\n",
191
+ " downloaded += buffer.byteLength;\n",
192
+ " progress.value = downloaded;\n",
193
+ " }\n",
194
+ " }\n",
195
+ " }\n",
196
+ " const blob = new Blob(buffers, {type: 'application/binary'});\n",
197
+ " const a = document.createElement('a');\n",
198
+ " a.href = window.URL.createObjectURL(blob);\n",
199
+ " a.download = filename;\n",
200
+ " div.appendChild(a);\n",
201
+ " a.click();\n",
202
+ " div.remove();\n",
203
+ " }\n",
204
+ " "
205
+ ]
206
+ },
207
+ "metadata": {}
208
+ },
209
+ {
210
+ "output_type": "display_data",
211
+ "data": {
212
+ "text/plain": [
213
+ "<IPython.core.display.Javascript object>"
214
+ ],
215
+ "application/javascript": [
216
+ "download(\"download_8ec6f24c-9653-4335-8c1b-766926434399\", \"wiki_ABC_Q2examples.json\", 5931)"
217
+ ]
218
+ },
219
+ "metadata": {}
220
+ }
221
+ ]
222
+ },
223
+ {
224
+ "cell_type": "markdown",
225
+ "source": [
226
+ "## **Start here** to interact with model"
227
+ ],
228
+ "metadata": {
229
+ "id": "PpKRr5Vw4_0F"
230
+ }
231
+ },
232
+ {
233
+ "cell_type": "code",
234
+ "source": [
235
+ "! pip install -q langchain=='0.0.229' openai gradio numpy chromadb tiktoken unstructured pdf2image pydantic==\"1.10.8\" jq"
236
+ ],
237
+ "metadata": {
238
+ "id": "UJi1Oy0CyPHD"
239
+ },
240
+ "execution_count": null,
241
+ "outputs": []
242
+ },
243
+ {
244
+ "cell_type": "code",
245
+ "source": [
246
+ "# import necessary libraries here\n",
247
+ "from getpass import getpass\n",
248
+ "from langchain.llms import OpenAI as openai\n",
249
+ "from langchain.chat_models import ChatOpenAI\n",
250
+ "from langchain.prompts import PromptTemplate\n",
251
+ "from langchain.document_loaders import TextLoader\n",
252
+ "from langchain.indexes import VectorstoreIndexCreator\n",
253
+ "from langchain.text_splitter import CharacterTextSplitter\n",
254
+ "from langchain.embeddings import OpenAIEmbeddings\n",
255
+ "from langchain.schema import SystemMessage, HumanMessage, AIMessage\n",
256
+ "import numpy as np\n",
257
+ "import os\n",
258
+ "from langchain.vectorstores import Chroma\n",
259
+ "from langchain.document_loaders.unstructured import UnstructuredFileLoader\n",
260
+ "from langchain.document_loaders import UnstructuredFileLoader\n",
261
+ "from langchain.chains import VectorDBQA\n",
262
+ "from langchain.document_loaders import JSONLoader\n",
263
+ "import json\n",
264
+ "from pathlib import Path\n",
265
+ "from pprint import pprint"
266
+ ],
267
+ "metadata": {
268
+ "id": "YHytCUoExrYe"
269
+ },
270
+ "execution_count": 18,
271
+ "outputs": []
272
+ },
273
+ {
274
+ "cell_type": "code",
275
+ "source": [
276
+ "from google.colab import files\n",
277
+ "uploaded = files.upload()"
278
+ ],
279
+ "metadata": {
280
+ "colab": {
281
+ "base_uri": "https://localhost:8080/",
282
+ "height": 74
283
+ },
284
+ "id": "Wpt6qsmEw8WP",
285
+ "outputId": "4563fa62-5245-4115-ecb9-326353dba29c"
286
+ },
287
+ "execution_count": 7,
288
+ "outputs": [
289
+ {
290
+ "output_type": "display_data",
291
+ "data": {
292
+ "text/plain": [
293
+ "<IPython.core.display.HTML object>"
294
+ ],
295
+ "text/html": [
296
+ "\n",
297
+ " <input type=\"file\" id=\"files-6f630631-16ee-49df-8de9-29bfa3ec5f7b\" name=\"files[]\" multiple disabled\n",
298
+ " style=\"border:none\" />\n",
299
+ " <output id=\"result-6f630631-16ee-49df-8de9-29bfa3ec5f7b\">\n",
300
+ " Upload widget is only available when the cell has been executed in the\n",
301
+ " current browser session. Please rerun this cell to enable.\n",
302
+ " </output>\n",
303
+ " <script>// Copyright 2017 Google LLC\n",
304
+ "//\n",
305
+ "// Licensed under the Apache License, Version 2.0 (the \"License\");\n",
306
+ "// you may not use this file except in compliance with the License.\n",
307
+ "// You may obtain a copy of the License at\n",
308
+ "//\n",
309
+ "// http://www.apache.org/licenses/LICENSE-2.0\n",
310
+ "//\n",
311
+ "// Unless required by applicable law or agreed to in writing, software\n",
312
+ "// distributed under the License is distributed on an \"AS IS\" BASIS,\n",
313
+ "// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
314
+ "// See the License for the specific language governing permissions and\n",
315
+ "// limitations under the License.\n",
316
+ "\n",
317
+ "/**\n",
318
+ " * @fileoverview Helpers for google.colab Python module.\n",
319
+ " */\n",
320
+ "(function(scope) {\n",
321
+ "function span(text, styleAttributes = {}) {\n",
322
+ " const element = document.createElement('span');\n",
323
+ " element.textContent = text;\n",
324
+ " for (const key of Object.keys(styleAttributes)) {\n",
325
+ " element.style[key] = styleAttributes[key];\n",
326
+ " }\n",
327
+ " return element;\n",
328
+ "}\n",
329
+ "\n",
330
+ "// Max number of bytes which will be uploaded at a time.\n",
331
+ "const MAX_PAYLOAD_SIZE = 100 * 1024;\n",
332
+ "\n",
333
+ "function _uploadFiles(inputId, outputId) {\n",
334
+ " const steps = uploadFilesStep(inputId, outputId);\n",
335
+ " const outputElement = document.getElementById(outputId);\n",
336
+ " // Cache steps on the outputElement to make it available for the next call\n",
337
+ " // to uploadFilesContinue from Python.\n",
338
+ " outputElement.steps = steps;\n",
339
+ "\n",
340
+ " return _uploadFilesContinue(outputId);\n",
341
+ "}\n",
342
+ "\n",
343
+ "// This is roughly an async generator (not supported in the browser yet),\n",
344
+ "// where there are multiple asynchronous steps and the Python side is going\n",
345
+ "// to poll for completion of each step.\n",
346
+ "// This uses a Promise to block the python side on completion of each step,\n",
347
+ "// then passes the result of the previous step as the input to the next step.\n",
348
+ "function _uploadFilesContinue(outputId) {\n",
349
+ " const outputElement = document.getElementById(outputId);\n",
350
+ " const steps = outputElement.steps;\n",
351
+ "\n",
352
+ " const next = steps.next(outputElement.lastPromiseValue);\n",
353
+ " return Promise.resolve(next.value.promise).then((value) => {\n",
354
+ " // Cache the last promise value to make it available to the next\n",
355
+ " // step of the generator.\n",
356
+ " outputElement.lastPromiseValue = value;\n",
357
+ " return next.value.response;\n",
358
+ " });\n",
359
+ "}\n",
360
+ "\n",
361
+ "/**\n",
362
+ " * Generator function which is called between each async step of the upload\n",
363
+ " * process.\n",
364
+ " * @param {string} inputId Element ID of the input file picker element.\n",
365
+ " * @param {string} outputId Element ID of the output display.\n",
366
+ " * @return {!Iterable<!Object>} Iterable of next steps.\n",
367
+ " */\n",
368
+ "function* uploadFilesStep(inputId, outputId) {\n",
369
+ " const inputElement = document.getElementById(inputId);\n",
370
+ " inputElement.disabled = false;\n",
371
+ "\n",
372
+ " const outputElement = document.getElementById(outputId);\n",
373
+ " outputElement.innerHTML = '';\n",
374
+ "\n",
375
+ " const pickedPromise = new Promise((resolve) => {\n",
376
+ " inputElement.addEventListener('change', (e) => {\n",
377
+ " resolve(e.target.files);\n",
378
+ " });\n",
379
+ " });\n",
380
+ "\n",
381
+ " const cancel = document.createElement('button');\n",
382
+ " inputElement.parentElement.appendChild(cancel);\n",
383
+ " cancel.textContent = 'Cancel upload';\n",
384
+ " const cancelPromise = new Promise((resolve) => {\n",
385
+ " cancel.onclick = () => {\n",
386
+ " resolve(null);\n",
387
+ " };\n",
388
+ " });\n",
389
+ "\n",
390
+ " // Wait for the user to pick the files.\n",
391
+ " const files = yield {\n",
392
+ " promise: Promise.race([pickedPromise, cancelPromise]),\n",
393
+ " response: {\n",
394
+ " action: 'starting',\n",
395
+ " }\n",
396
+ " };\n",
397
+ "\n",
398
+ " cancel.remove();\n",
399
+ "\n",
400
+ " // Disable the input element since further picks are not allowed.\n",
401
+ " inputElement.disabled = true;\n",
402
+ "\n",
403
+ " if (!files) {\n",
404
+ " return {\n",
405
+ " response: {\n",
406
+ " action: 'complete',\n",
407
+ " }\n",
408
+ " };\n",
409
+ " }\n",
410
+ "\n",
411
+ " for (const file of files) {\n",
412
+ " const li = document.createElement('li');\n",
413
+ " li.append(span(file.name, {fontWeight: 'bold'}));\n",
414
+ " li.append(span(\n",
415
+ " `(${file.type || 'n/a'}) - ${file.size} bytes, ` +\n",
416
+ " `last modified: ${\n",
417
+ " file.lastModifiedDate ? file.lastModifiedDate.toLocaleDateString() :\n",
418
+ " 'n/a'} - `));\n",
419
+ " const percent = span('0% done');\n",
420
+ " li.appendChild(percent);\n",
421
+ "\n",
422
+ " outputElement.appendChild(li);\n",
423
+ "\n",
424
+ " const fileDataPromise = new Promise((resolve) => {\n",
425
+ " const reader = new FileReader();\n",
426
+ " reader.onload = (e) => {\n",
427
+ " resolve(e.target.result);\n",
428
+ " };\n",
429
+ " reader.readAsArrayBuffer(file);\n",
430
+ " });\n",
431
+ " // Wait for the data to be ready.\n",
432
+ " let fileData = yield {\n",
433
+ " promise: fileDataPromise,\n",
434
+ " response: {\n",
435
+ " action: 'continue',\n",
436
+ " }\n",
437
+ " };\n",
438
+ "\n",
439
+ " // Use a chunked sending to avoid message size limits. See b/62115660.\n",
440
+ " let position = 0;\n",
441
+ " do {\n",
442
+ " const length = Math.min(fileData.byteLength - position, MAX_PAYLOAD_SIZE);\n",
443
+ " const chunk = new Uint8Array(fileData, position, length);\n",
444
+ " position += length;\n",
445
+ "\n",
446
+ " const base64 = btoa(String.fromCharCode.apply(null, chunk));\n",
447
+ " yield {\n",
448
+ " response: {\n",
449
+ " action: 'append',\n",
450
+ " file: file.name,\n",
451
+ " data: base64,\n",
452
+ " },\n",
453
+ " };\n",
454
+ "\n",
455
+ " let percentDone = fileData.byteLength === 0 ?\n",
456
+ " 100 :\n",
457
+ " Math.round((position / fileData.byteLength) * 100);\n",
458
+ " percent.textContent = `${percentDone}% done`;\n",
459
+ "\n",
460
+ " } while (position < fileData.byteLength);\n",
461
+ " }\n",
462
+ "\n",
463
+ " // All done.\n",
464
+ " yield {\n",
465
+ " response: {\n",
466
+ " action: 'complete',\n",
467
+ " }\n",
468
+ " };\n",
469
+ "}\n",
470
+ "\n",
471
+ "scope.google = scope.google || {};\n",
472
+ "scope.google.colab = scope.google.colab || {};\n",
473
+ "scope.google.colab._files = {\n",
474
+ " _uploadFiles,\n",
475
+ " _uploadFilesContinue,\n",
476
+ "};\n",
477
+ "})(self);\n",
478
+ "</script> "
479
+ ]
480
+ },
481
+ "metadata": {}
482
+ },
483
+ {
484
+ "output_type": "stream",
485
+ "name": "stdout",
486
+ "text": [
487
+ "Saving wiki_ABC_Q2examples (2).json to wiki_ABC_Q2examples (2).json\n"
488
+ ]
489
+ }
490
+ ]
491
+ },
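The Colab upload helper above streams file bytes from the browser to Python in base64 chunks capped at `MAX_PAYLOAD_SIZE` (100 KiB) to stay under message-size limits. A minimal Python sketch of the same chunking idea, for readers who want to see it outside the JavaScript; the file path is a placeholder:

```python
import base64

MAX_PAYLOAD_SIZE = 100 * 1024  # mirrors the JS constant: 100 KiB per message

def iter_base64_chunks(path):
    """Yield base64-encoded chunks of a file, as the Colab uploader does."""
    with open(path, "rb") as f:
        while chunk := f.read(MAX_PAYLOAD_SIZE):
            yield base64.b64encode(chunk).decode("ascii")

# Hypothetical usage: reassemble on the receiving side
chunks = list(iter_base64_chunks("wiki_ABC_Q2examples.json"))
data = b"".join(base64.b64decode(c) for c in chunks)
```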
492
+ {
493
+ "cell_type": "code",
494
+ "source": [
495
+ "# setup open AI api key\n",
496
+ "openai_api_key = getpass()"
497
+ ],
498
+ "metadata": {
499
+ "id": "jVPEFX3ixJnM"
500
+ },
501
+ "execution_count": null,
502
+ "outputs": []
503
+ },
504
+ {
505
+ "cell_type": "code",
506
+ "source": [
507
+ "os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n",
508
+ "openai.api_key = openai_api_key"
509
+ ],
510
+ "metadata": {
511
+ "id": "MO7VuGmrxVAr"
512
+ },
513
+ "execution_count": 19,
514
+ "outputs": []
515
+ },
516
+ {
517
+ "cell_type": "code",
518
+ "source": [
519
+ "mdl_name = 'gpt-3.5-turbo-0301'"
520
+ ],
521
+ "metadata": {
522
+ "id": "4Thxj6Gk1zVS"
523
+ },
524
+ "execution_count": 20,
525
+ "outputs": []
526
+ },
527
+ {
528
+ "cell_type": "code",
529
+ "source": [
530
+ "llm = ChatOpenAI(model='gpt-3.5-turbo-16k')\n",
531
+ "messages = [\n",
532
+ " SystemMessage(content=\"You are a helpful assistant.\"),\n",
533
+ " HumanMessage(content=\"\")\n",
534
+ "]"
535
+ ],
536
+ "metadata": {
537
+ "id": "Sgq9aVqpxZnK"
538
+ },
539
+ "execution_count": 36,
540
+ "outputs": []
541
+ },
542
+ {
543
+ "cell_type": "code",
544
+ "source": [
545
+ "file_path='/content/wiki_ABC_Q2examples (2).json'"
546
+ ],
547
+ "metadata": {
548
+ "id": "Ak7X_ZRba48F"
549
+ },
550
+ "execution_count": null,
551
+ "outputs": []
552
+ },
553
+ {
554
+ "cell_type": "code",
555
+ "source": [
556
+ "data = json.loads(Path(file_path).read_text())\n",
557
+ "data = str(data)"
558
+ ],
559
+ "metadata": {
560
+ "id": "PKqQROVMc6BP"
561
+ },
562
+ "execution_count": 26,
563
+ "outputs": []
564
+ },
565
+ {
566
+ "cell_type": "code",
567
+ "source": [
568
+ "# Contruct Vector Store\n",
569
+ "text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)\n",
570
+ "texts = text_splitter.split_text(data)\n",
571
+ "\n",
572
+ "embeddings = OpenAIEmbeddings()\n",
573
+ "\n",
574
+ "db = Chroma.from_texts(texts, embeddings)\n",
575
+ "\n",
576
+ "qa = VectorDBQA.from_chain_type(llm=ChatOpenAI(model_name = mdl_name), chain_type=\"stuff\", vectorstore=db, k=1)"
577
+ ],
578
+ "metadata": {
579
+ "id": "i2NadtLlcxur"
580
+ },
581
+ "execution_count": 37,
582
+ "outputs": []
583
+ },
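With the vector store and `VectorDBQA` chain built, a quick smoke test can confirm that retrieval over the uploaded examples works before running the full grading prompt. A minimal sketch, assuming the cell above has run; the probe question is illustrative:

```python
# Quick retrieval check before grading: ask about the stored example answers.
probe = qa.run("What grades appear in the example answers, and how are they labeled?")
print(probe)
```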
584
+ {
585
+ "cell_type": "code",
586
+ "source": [
587
+ "# A level\n",
588
+ "example1 = 'The most powerful aspect of machine learning is its ability to automate processes. If the goal is well defined and repeatable, an algorithm can be trained to perform that task far faster than any human and often with more reliability. Because of this, businesses can implement machine learning algorithms to solve problems human labor was previously required for, whether that be mental or physical labor. Although implementing machine learning often has high initial costs, because computers do not require payment outside of maintaining their operation, in the long run companies can save money by either making their workers’ tasks more efficient or by entirely automating tasks. This increases the profit margins for those firms.'\n",
589
+ "# B level\n",
590
+ "example2 ='ML systems can help improve access to data while managing compliance. It helps businesspeople deal with data in a more efficient way, since there is numerous data generated per second in such industry. If they use some of the simple classification models with high accuracy, it will save them a lot of time on data cleaning and validation, which are kind of repetitive and time-consuming. For example, NLP provides an easy way for personnel to query business information, understand business processes, and discover new relationships between business data, ideas based on intuition and insight often emerge. Using models to do prediction helps people making wiser decision. Since models can handle way more data than human, newly collected data can feed in the model and get some predictive result as a reference to the decision makers. This is significant because in this industry, time is precious, and traders must decide quickly and precisely. A little negligence will lead to a big mistake, lose a lot of money, and even affect the company\"s reputation. Models can see patterns that are not easy for human to spot, which is also valuable for modify the way people doing analysis and interpret.'\n",
591
+ "# C level\n",
592
+ "example3 = 'The machine learning model (or one a broader view, artificial intelligence) is about prediction. According to the lecture, there are tree main innovations in it. Prediction is cheap, more accurate and automated. As a result, armed with machine learning, businesses could automatically gain much more accurate and powerful capabilities in forecasting, leading to a big savings both in time and money.'\n",
593
+ "\n",
594
+ "# Randomized list of answers\n",
595
+ "training_answers = [example2, example1, example3]"
596
+ ],
597
+ "metadata": {
598
+ "id": "CJxUs8lG12kd"
599
+ },
600
+ "execution_count": 38,
601
+ "outputs": []
602
+ },
603
+ {
604
+ "cell_type": "code",
605
+ "source": [
606
+ "query = f\"\"\" Please grade the following student answers: {training_answers} to the question ({q2}).\n",
607
+ "The uploaded pdf should serve as as examples of A, B, and C level answers. In the document, the\n",
608
+ "original question is printed, as well as examples of previous student answers that have recieved\n",
609
+ "A, B, and C grades (labeled accordingly)\"\"\"\n",
610
+ "\n",
611
+ "query_prefix = \"\"\" The uploaded pdf should serve as as examples of A, B, and C level answers.\n",
612
+ "In the document, the original question is printed, as well as examples of previous student answers that have recieved\n",
613
+ "A, B, and C grades (labeled accordingly).\"\"\"\n",
614
+ "answer = qa.run(query_prefix + query)\n",
615
+ "print(answer)"
616
+ ],
617
+ "metadata": {
618
+ "colab": {
619
+ "base_uri": "https://localhost:8080/"
620
+ },
621
+ "id": "naKuTxKa2-U8",
622
+ "outputId": "fc10a157-071d-4a09-bf4d-315fec1d53e9"
623
+ },
624
+ "execution_count": 56,
625
+ "outputs": [
626
+ {
627
+ "output_type": "stream",
628
+ "name": "stdout",
629
+ "text": [
630
+ "The first answer would be a B-level answer. It mentions the efficiency of using machine learning for data cleaning and validation, as well as the ability to handle large amounts of data and make predictions. However, it could be more specific in terms of how machine learning can benefit businesses in various industries.\n",
631
+ "\n",
632
+ "The second answer would be an A-level answer. It highlights the ability of machine learning to automate processes and save money in the long run, as well as mentioning the reliability and efficiency of algorithms compared to human labor.\n",
633
+ "\n",
634
+ "The third answer would also be an A-level answer. It accurately describes the main innovation of machine learning as being prediction, and how it can lead to cost savings in time and money for businesses. It also hints at the potential for machine learning to improve forecasting capabilities.\n"
635
+ ]
636
+ }
637
+ ]
638
+ },
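The free-form prompt above leaves the output format to the model, which likely contributes to the inconsistent gradings reported below. One possible alternative, offered only as a sketch rather than the notebook's tested prompt, is to pin down the rubric and the response format explicitly:

```python
# Sketch: constrain the rubric and output format to reduce grading variance.
# The wording here is illustrative; q2 and training_answers come from earlier cells.
structured_query = f"""You are grading student answers to the question: {q2}
The retrieved context contains labeled example answers at A, B, and C level.
For each of the following answers, output exactly one line in the format
'Answer <n>: <A|B|C> - <one-sentence justification>'.
Answers: {training_answers}"""

print(qa.run(structured_query))
```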
639
+ {
640
+ "cell_type": "markdown",
641
+ "source": [
642
+ "## Conclusions based on Question 2\n",
643
+ "\n",
644
+ "Moving forward:\n",
645
+ "\n",
646
+ "\n",
647
+ "1. Train the model on the other questions provided in the case example\n",
648
+ "2. Request more student answers from Dr. Blocher\n",
649
+ "3. Request a grading rubric from Dr. Blocher for the case examples, which may help the model gain more context\n",
650
+ "\n"
651
+ ],
652
+ "metadata": {
653
+ "id": "C7kDC2pe7bNd"
654
+ }
655
+ },
656
+ {
657
+ "cell_type": "markdown",
658
+ "source": [
659
+ "### Ouput for query with .pdf\n",
660
+ "\n",
661
+ "We also experimenting with using a .pdf file (before we figured out how to parse .json files). Here is one set of results.\n",
662
+ "\n",
663
+ "The first answer can be graded as a B. It touches on the benefits of machine learning such as improving access to data, efficiency in data management, and faster decision-making. However, the answer could be strengthened by providing more specific examples of how machine learning has benefited businesses.\n",
664
+ "\n",
665
+ "The second answer can be graded as an A. It provides a clear and concise explanation of how machine learning can automate processes and save companies money in the long run. The answer also acknowledges the initial costs of implementing machine learning but emphasizes the potential for increased profit margins.\n",
666
+ "\n",
667
+ "The third answer can also be graded as an A. It highlights the main innovation of machine learning which is prediction and how it can lead to cost and time savings for businesses. The answer is well-organized and provides a clear explanation of why machine learning is important for businesses in general.\n",
668
+ "\n",
669
+ "**Results:**\n",
670
+ "\n",
671
+ "The model did not perform as well as expected. It did not successfully grade any of the questions. The first qyestion should have been graded as an A, second as a B, and third as a C."
672
+ ],
673
+ "metadata": {
674
+ "id": "iJTiGX4aZ91_"
675
+ }
676
+ },
677
+ {
678
+ "cell_type": "markdown",
679
+ "source": [
680
+ "### Output for .json queries\n",
681
+ "\n",
682
+ "**First query:**\n",
683
+ "\n",
684
+ "The first answer would be a B-level answer. It touches on the efficiency and time-saving benefits of machine learning, but could benefit from more specific examples and a deeper explanation of how it can improve business outcomes.\n",
685
+ "\n",
686
+ "The second answer would be an A-level answer. It provides a clear explanation of how machine learning can automate processes and improve efficiency, leading to cost savings and increased profits for businesses.\n",
687
+ "\n",
688
+ "The third answer would also be an A-level answer. It focuses on the predictive power of machine learning and how it can save time and money for businesses by providing more accurate forecasts. It also mentions the automation benefits of machine learning.\n",
689
+ "\n",
690
+ "Notes: Same results as first query using .pdf.\n",
691
+ "\n",
692
+ "**Second query:**\n",
693
+ "\n",
694
+ "Answer 1: B-level answer. The answer talks about how machine learning can automate processes and solve problems that were previously done by human labor. It also mentions how implementing machine learning can save money in the long run.\n",
695
+ "\n",
696
+ "Answer 2: A-level answer. The answer discusses how machine learning can help with data cleaning and validation, which can save time and enhance efficiency. It also mentions how models can handle more data than humans and can provide predictive results for decision-makers. Additionally, it talks about how machines can see patterns that are difficult for humans to spot, which can help with analysis and interpretation.\n",
697
+ "\n",
698
+ "Answer 3: C-level answer. The answer discusses how machine learning can provide accurate and powerful capabilities in forecasting, resulting in savings in time and money. It also talks about the three main innovations in machine learning, which are prediction, accuracy, and automation. The answer provides examples of different industries that have implemented machine learning, showcasing its importance in today's data-rich world.\n",
699
+ "\n",
700
+ "Notes: Graded one answer correctky (answer C).\n",
701
+ "\n",
702
+ "**Third query**\n",
703
+ "\n",
704
+ "The first answer would be a B-level answer. While it touches on some important points such as saving time on data cleaning and validation and using models for prediction, it lacks depth and doesn't provide specific examples or applications for businesses.\n",
705
+ "\n",
706
+ "The second answer would be an A-level answer. It provides a clear and concise explanation of how machine learning can automate processes and save companies money in the long run. It also acknowledges the initial costs of implementing machine learning and how it can increase profit margins.\n",
707
+ "\n",
708
+ "The third answer would also be a B-level answer. While it mentions the three main innovations of machine learning (cheap, more accurate, and automated predictions), it doesn't provide specific examples or applications for businesses and lacks depth in its explanation.\n",
709
+ "\n",
710
+ "Notes: None of the answers correct\n",
711
+ "\n",
712
+ "### General comments:\n",
713
+ "\n",
714
+ "None of the query results were consistent. There were also intsances where the output would be something like \"As an AI language model, I do not have the capability to...\". So more training and prompt engineering is certainly needed."
715
+ ],
716
+ "metadata": {
717
+ "id": "FR4YMpPseoTE"
718
+ }
719
+ }
720
+ ],
721
+ "metadata": {
722
+ "colab": {
723
+ "provenance": [],
724
+ "include_colab_link": true
725
+ },
726
+ "kernelspec": {
727
+ "display_name": "Python 3",
728
+ "name": "python3"
729
+ },
730
+ "language_info": {
731
+ "name": "python",
732
+ "version": "3.10.6"
733
+ }
734
+ },
735
+ "nbformat": 4,
736
+ "nbformat_minor": 0
737
+ }
instructor_vector_store_creator.ipynb ADDED
@@ -0,0 +1,489 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "id": "HpzC9I3uDBud"
7
+ },
8
+ "source": [
9
+ "<a href=\"https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/main/instructor_vector_store_creator.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
10
+ ]
11
+ },
12
+ {
13
+ "cell_type": "markdown",
14
+ "metadata": {
15
+ "id": "hbthVGIgDBug"
16
+ },
17
+ "source": [
18
+ "# Creating a Shared Vector Store (for Instructors)\n",
19
+ "\n",
20
+ "This notebook is for instructors to create a *vector store* which contains all of the information necessary for students to generate their own self-study materials using large language models."
21
+ ]
22
+ },
23
+ {
24
+ "cell_type": "markdown",
25
+ "metadata": {
26
+ "id": "NpYSvDgzDBuh"
27
+ },
28
+ "source": [
29
+ "## Using This Notebook\n",
30
+ "This notebook is split up into a bunch of *cells*, which contain explanations / instructions or code. You should go through the cells in order from top to bottom.\n",
31
+ "\n",
32
+ "When you encounter a code cell, hover the mouse over it and press the 'run' button (the circle with a triangle inside of it) to execute the code in the cell. Wait until the green checkmark appears to the right of the run button before proceeding to the next cell.\n",
33
+ "\n",
34
+ "Run the following code block to test it out. It should display `4` below the code.\n",
35
+ "\n",
36
+ "*Note: Before running, there may be a pop-up saying 'Warning: This notebook was not authored by Google'. In that case, click 'Run anyways'.*"
37
+ ]
38
+ },
39
+ {
40
+ "cell_type": "code",
41
+ "execution_count": null,
42
+ "metadata": {
43
+ "id": "ACE73ZmZDBuh"
44
+ },
45
+ "outputs": [],
46
+ "source": [
47
+ "print(2+2)"
48
+ ]
49
+ },
50
+ {
51
+ "cell_type": "markdown",
52
+ "metadata": {
53
+ "id": "dTrRB_2rDBui"
54
+ },
55
+ "source": [
56
+ "## Importing Libraries\n",
57
+ "Continue running the code below. Ignore any warnings that pop up."
58
+ ]
59
+ },
60
+ {
61
+ "cell_type": "code",
62
+ "execution_count": null,
63
+ "metadata": {
64
+ "id": "0MHRWInBDBuj"
65
+ },
66
+ "outputs": [],
67
+ "source": [
68
+ "# install libraries here\n",
69
+ "# -q flag for \"quiet\" install\n",
70
+ "!pip install -q langchain openai unstructured tiktoken deeplake trafilatura justext yt_dlp pydub"
71
+ ]
72
+ },
73
+ {
74
+ "cell_type": "code",
75
+ "execution_count": null,
76
+ "metadata": {
77
+ "id": "2FpSOOlEDBuk"
78
+ },
79
+ "outputs": [],
80
+ "source": [
81
+ "# import libraries here\n",
82
+ "import os\n",
83
+ "\n",
84
+ "from google.colab import files\n",
85
+ "from getpass import getpass\n",
86
+ "from IPython.display import display, Markdown\n",
87
+ "\n",
88
+ "import openai\n",
89
+ "from langchain.docstore.document import Document\n",
90
+ "from langchain.text_splitter import RecursiveCharacterTextSplitter\n",
91
+ "from langchain.embeddings import OpenAIEmbeddings\n",
92
+ "from langchain.document_loaders.unstructured import UnstructuredFileLoader\n",
93
+ "\n",
94
+ "from langchain.document_loaders.generic import GenericLoader\n",
95
+ "from langchain.document_loaders.parsers import OpenAIWhisperParser\n",
96
+ "from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader\n",
97
+ "\n",
98
+ "from langchain.document_loaders import WebBaseLoader\n",
99
+ "import trafilatura\n",
100
+ "import requests\n",
101
+ "import justext\n",
102
+ "\n",
103
+ "import deeplake\n",
104
+ "from langchain.vectorstores import DeepLake"
105
+ ]
106
+ },
107
+ {
108
+ "cell_type": "markdown",
109
+ "metadata": {
110
+ "id": "TIFQUXspDBul"
111
+ },
112
+ "source": [
113
+ "## Setting Up API Access\n",
114
+ "Much of the following code rely on certain *APIs* (application programming interfaces) which have limited access. You will need to get an *API key* for each of those services which will be inserted into the code to let the service know you are an authorized user."
115
+ ]
116
+ },
117
+ {
118
+ "cell_type": "markdown",
119
+ "metadata": {
120
+ "id": "vKXV9FubDBum"
121
+ },
122
+ "source": [
123
+ "#### OpenAI"
124
+ ]
125
+ },
126
+ {
127
+ "cell_type": "markdown",
128
+ "metadata": {
129
+ "id": "GG9GwadDDBuo"
130
+ },
131
+ "source": [
132
+ "First, you will need an **OpenAI API key**. To do this:\n",
133
+ "1. Visit [platform.openai.com/account/api-keys](https://platform.openai.com/account/api-keys) and sign up for an account.\n",
134
+ "2. Click 'Create a secret API key', and give it any name you want.\n",
135
+ "3. Copy the newly created key, either by right-clicking and pressing 'Copy' or using the keyboard shortcut -- Ctrl+C on Windows, Cmd+C on a Mac.\n",
136
+ "\n",
137
+ "Run the following code cell. You'll see a blank text box pop up -- paste your API key there (using the shortcut Ctrl+V on Windows, or Cmd+V if you are using a Mac) and press Enter."
138
+ ]
139
+ },
140
+ {
141
+ "cell_type": "code",
142
+ "execution_count": null,
143
+ "metadata": {
144
+ "id": "Dv3l_ZKiDBup"
145
+ },
146
+ "outputs": [],
147
+ "source": [
148
+ "OPENAI_API_KEY = getpass(\"OpenAI API key: \")\n",
149
+ "os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY\n",
150
+ "openai.api_key = OPENAI_API_KEY"
151
+ ]
152
+ },
153
+ {
154
+ "cell_type": "markdown",
155
+ "metadata": {
156
+ "id": "1Xbl8szQDBup"
157
+ },
158
+ "source": [
159
+ "#### DeepLake\n",
160
+ "\n",
161
+ "Next, you will need to input a **DeepLake API key**, found in the DeepLake dashboard at [app.activeloop.ai](https://app.activeloop.ai).\n",
162
+ "\n",
163
+ "1. Click the link above and create an account.\n",
164
+ "2. After making an account, you will be prompted to set a username. Once you have set your username, copy it, run the code below, paste the username into the text box, and press Enter. (This username will be shared with students.)"
165
+ ]
166
+ },
167
+ {
168
+ "cell_type": "code",
169
+ "execution_count": null,
170
+ "metadata": {
171
+ "id": "rfzGVvm-DBup"
172
+ },
173
+ "outputs": [],
174
+ "source": [
175
+ "DEEPLAKE_USERNAME = input(\"DeepLake username: \")"
176
+ ]
177
+ },
178
+ {
179
+ "cell_type": "markdown",
180
+ "metadata": {
181
+ "id": "o9_N6ychDBur"
182
+ },
183
+ "source": [
184
+ "3. You should then be on the DeepLake dashboard. At the top, click 'Create API token'. You should see an empty table with the columns 'Name', 'Expiration date', and 'Token'.\n",
185
+ "4. Click the 'Create API token' button ![image.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAJYAAAAwCAYAAADzRIMRAAALoElEQVR4Xu1caXBT1xX+nvZdssFm8+DYgGPWAVMggFkyMWuJyYBDgRICDZ1JO6HJkNKkAYakEHcgSxsyQybDUkqLW9bWCVsh7GFJ2NywhuASJyQ4GCxZaF973nuWbFmSbWHJjjXv/tLy3r3nnvPdc7577nmP8VOD0AQNxFkDjACsOGtU6I7TgAAsAQgJ0YAArISoVehUAJaAgYRoQABWQtQqdBozsLhNpM8J193P4a78FO6q8/CZr8NruQp4BIW2Ww1IALGmD0S6XEjTBkPaOR+y9KGASA6GYWKeVvOB5ffCY70D+5cb4aooBSQqyDuPhTj9J5AaciFSZ0AkVccsgHDDj0MDPrcVPuttuE3kJO6eg7PyKDkKG2SZU6F89BeQqLvQVk/cbGGbBpbfB5/bDNul92AjUCmynoay12xIO+Y1exDhwvapAfe9C7B/VQLHre1QEbhU/V8k56EjgImanFCjwPJ7HXDeOQZr2VuQ6nO4jiWGR5vsVLgguTTgMX3JORZ3zQ2oBy6GvMsYcl6KRicZFVh+coO2LzfBdnE5NMPfh7LHzOTSljCbmDVgL/8nLKcXQjXoDfJg88AQHYrWIgKLA9XVD+Ao3wbdqLUU9gbHLIRwQ3JqwH3vPMwnfg1FjxlQ9flVVHCFAcvvdcJ2fQMcN/4K/eObhdCXnPho0azY0FhzZC4UOc9ClfschUV5WH+hwCKi7vjuEB4cnQ3DxP2Cp2qR+pP7ZtZzmfZPhHZsCRTdnggj9CHA8rlMMB4ogqr3AoFTJTcu4jI7lnPZrq1HyvgdEMkMIX3WAYvyVJYLK+Gz/cDxKqEJGmiOBli+JVJ1giZvaUieKwgsj+U2qj8aidTJBwRe1RyNCtdwGmD5VvXe8UgtPAmJJiOoFQ5Y7DGN5cIK+F1m6Ia/LahM0EBMGjCf/i0YmY681rLg8Q8PLEqEVpeOhG70OiGjHpNKhYtZDbAZevPxXyJ16slg4pQDlvPOcVjOLkGHwhNx1ZTH/hmc5VfA1j4zkmzKfdDZojSuQySoMyNsN3bB59JCnjUDkY5AHd9sgMdc/3A2HfLMUZBo9WB/9aMctsvHwGjHQpWZHRc52TG9jkFQ5uSh6UOVuAzZ7E7ufzQKmiFvUlZ+NHcPByzLxWKCnR2aoSua3VFjF/rxA8yHhsF5qyL0MlEuVPnHoclJi8s4DTuxX3+JzrW6QzNpEVqCX4/pDzDuWM4tCGnPM0gZOyxMXtNeBq7vw6ch6VYC/aRZBK7NqF7/LJguJejw01lhFz6MrOyYbvNSpMxcASpG+FE1y+fLSFlKaAa9Vgcs44HplOyaC+UjU+MgrAvmoyo4bnoh6bYe2sfnQqqQ0jnTFpj3zIHXVQTDrO2QhefUWjy25VQmnRgMgWHBDsha0Jv5mJhOHZ6AuNNBeKuegeGZzZA1ONjngTUZ2qfXckb2oxLWY4/BfRdQj/VD2bNxYD2MrD9mYNm/LqWk+mZKPeysA9b9HX2hL9gVl92gx1wM47YlYDp+gJSnnkd9e/hst+CXZnHh0HO/GKZ9f4as71p4btGpeef9MIzoD49lL8yHX4Dn3i0qEMqBrMdaaPPJyJy4NlgvvQD75b9QiQd9lQ2CIm8rtP16wXpuJP1+Cn62JkxBYSnnGPRDc6ky4xIeHJ4N153LZHw1JJ3XQjduLiRRKkB8vl0wbp4OpJVA03sPag5vgXKEG9o+oT6CB9b0EBC7KxfBtPtPkOZ+AV3+xageyxJFVq/tIMxHXoan6hIoV00lScW0MH8PWW01UiiwXLCcyYf9phOqEaegzmZIN/PpbHcbpwORZjY0BZugSOV9d80hhkL3ciqDOQ/Hpd2g7BKk3UuhKygMsdHDrkcuG//JNHQoulIHrLubGHT8uSUu9VTO60+i5tPdUD5GxugX3WG7q16BqXQ1/KJU8gzjIc9eDWXWSRi3zoJf+xLUg2dRAeGHsJ/dyBmKA92Dd2DauQ6S3r+j+iALXF+9SADNgGbit2AsL8H6xXt0TwYkGVSklrka6t73CeSD4PX8DKqhL9OO5d+wnSom0G9HyqSiiDzFeXMmao5uhXq0G6qcz1C9JR9+zToipgs4hQW4TTiwbERg+9CqreCAqO5TEhVYbBhsKKsqpxLGnUOpYDKXFttiiJiTcF7bSIuEQl8RhT5aCHXAWgbHuVGwlZ0lPd+Gpl9XWM/0Im/thbzvGsi7+2E/Xwj3/ZnQz/gH5Er+XjZ0izstgSwrHW5Od52gHlcJdebDwqnuPrae694WDdLn8U8Tchzr7noG6Qvi83ih9VweldlcJOX6aZVHFzgALEU9AHL3ftEJ+ln7OGWwzXpxOKz/7Y2UeQQw+u5zV8Frs5GFDfA7ilFD4FTUjtUwvLj+Nwemw8c44KlqUyzOiudgPniNQtgpKPQN5TPC9FEqXNXzg+NxYfGrPOimn+UuVqTw90TjWIz+daRMWw6ROLZQ6LwxDTXH/8UBWp3DL8gAyAMeMwAs5cAy2GjxynLPQ5+fR574Yxg3FUL0SB0f9PkOwFgyAZKcb8lzZ9QDJc/PAt41oLuWQwuoj6O4AyugIPkAfkLRWhBY9QAYzVhAb2ievAqp6A2YPn6dvJyadplu6lpK4LIGQdwQWPz3byKKEAn4AdIu7r4N2mFUlkvNW/MuHhxYAwmReLYFiDwvK+8duRUqzSYvSUWQWQMJVLQAmiDvkWUdCMP80iCfC/QhYT12fn8ezMThqHMw8oUwzFwDKY0V9P4RZiqu3Tw05GeR9N9ScIUDK46hMMBRvMw06It2Ql6vWtlZ8S48XgpL2d2IR/ChsP6KefBpV9ivD4B2+oeQhOynJVSP3Q2WwwwdkteR6YarrqGxHFcmwnz6GlRjTkCeHuqRRarMsNRHY0BkFAs5vRvmkDGDHiuUY9U3TKzA4mUtCwlNAePL80zQ5elrvSQDaa9n4CnfDFGnEhhox8n4NqB64wKIs3dDM7hfCD4YaRpVkavCPVYE/bcEWBFDYTzJOyucjXiWhVw15I9D0XcRZF38xD3egruc8mRqnjP4q8OB5a5aSh7pTTAd1kA/4XniFSbKBa0kYv82lF2ktNskYH1dCP3TpRxg+bBG5dK1Xs9eNgYPzlVBNW475Kk5xBl38pyNQKGbsAoyrRP2r1fRsyCLoOkTmvLw4yTHp3x4Hropr4bwL+fVKbBduczpPRCqIpH3WIDVmKzaJ4gPoowW0lPw2kN5UiDd4KrVMR8Os1Czn0L49wOgGl0KVc+u8FbvowNiKzQjZ3PkPNEeKyJ5j2+6gVevo+JVWE6QER116hanLYG2YCW3y4nmiu035xMoN/G7O7ZJ+tKu5wiX+/La/kYEdy7xLDbhqgSjzobfeCUIrAAw2R2V5JGDSC0ogKvyjzB/8lqdHKIMyPvTbmhIHpfIDLQAn5HVhp36IPE63ofx77/hf0
rjibw5wq4wFmBFlfUgyeqs7YkWprbgP9yiYlsoOIgP1oJJM+EMFJ0vo2bPZEp33AuKIU5bDf2UxQ2Ify3HirPHiphuiHeCNETBzu8oW+yhE/Dw0FP/utDPLtoB3iFSaghmsuv+5/+DtCsklB9r2PxE7qkAFmJ9WojX8dgr6CxUFfZ7dBkS/09kWRube9My8X3aYtR30/02dUXEBGmijnSaEkb4P3k0EPFIRziETh4Dt8VMoh5CC2UzbWGO5BkzatkMO0Wh0C95DN2aM2m00I8TRChNbk17JM1YTZYmszMVHqZIGnu3ykSa9zAF57WEx79axSJJMEhMj3/xEVF4YDUJ7J7QKcT+wGqtOMIj9gm1S7vu/KEfsQ/MWngpSLu2f0KEb/FLQYLgorDIZuWtZauE1xglxFTto9PQ1xi9wj0wEel9DfVnI7x4rX3Ytk2kTNiL10JmI7wqsk2M21qDtv6rIhvMTHi5bWuZupXHabOX27byPIXh2rcGmuZY7Xt+gvRtpAEBWG2k+GQfVgBWslu4jeb3f6EJmcthjHP3AAAAAElFTkSuQmCC) at the right of the page, choose a name for the token, then click 'Create API token'. (You do not need to change the expiration date.)\n",
186
+ "5. Afterwards, you should see the table look something like this:"
187
+ ]
188
+ },
189
+ {
190
+ "cell_type": "markdown",
191
+ "metadata": {
192
+ "id": "ivyqa05xDBur"
193
+ },
194
+ "source": [
195
+ "![image.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAsMAAABrCAYAAACSXI/oAAAgAElEQVR4Xu3dCVhN6QMG8DctSiZC9p0xzFgG2ZUlFMLIkkGWwYx9+BvZMxhjzSTTDJM9JoPIGMvIln2isa9DlsgMkZSi7f7POffe3K5KqZN7u+99nnlmps7yfb/vfPe85zvfORkphA/4oQAFKEABClCAAhSggAEKGDEMG2Crs8oUoAAFKEABClCAApIAwzAPBApQgAIUoAAFKEABgxVgGDbYpmfFKUABClCAAhSgAAUYhnkMUIACFKAABShAAQoYrADDsME2PStOAQpQgAIUoAAFKMAwzGOAAhSgAAUoQAEKUMBgBRiGDbbpWXEKUIACFKAABShAAYZhHgMUoAAFKEABClCAAgYrwDBssE3PilOAAhSgAAUoQAEKMAzzGKAABShAAQpQgAIUMFgBhmGDbXpWnAIUoAAFKEABClCAYZjHAAUoQAEKUIACFKCAwQowDBts07PiFKAABShAAQpQgAIMwzwGKEABClCAAhSgAAUMVoBh2GCbnhWnAAUoQAEKUIACFGAY5jFAAQpQgAIUoAAFKGCwAgzDBtv0rDgFKEABClCAAhSgAMMwjwEKUIACFKAABShAAYMVYBg22KZnxSlAAQpQgAIUoAAFGIZ5DFCAAhSgAAUoQAEKGKyALGH41vbJWHKxFeZ4dEQxDdpXIV6Y+FdLeI+xNVhwVlz3BcTjd96uyDQFNUIZuHw7Bx3Lv3v5Q9eNwuHiczDBWbNXZH97ydG7MX/GQ3RfPAQfm2V//ayuocAJLB16Dm1WjkS9rK7E5ShgAAJJD7Zgxsw/8TidujYfthJfNEkfgX3KAA4OVlEvBWQJw6fnNsbna56h7oAA/OZRB6Yqmritrmi9wxUhfi56icVCG4aAePwOP/8lAj0dNSpsjCKly8JKfTC/A8W7hmEFLsDDfhRK+hzFmDpAXoXhFPwKt6r7MCBsLTQl0qu6aDY3eT0CPWq+gwxXoYB+CSgSn+HhvzFIFop9wccFPimLsGJMNakSlsUroFih9OuTnT6lXyIsLQX0W0C2MPxFSA18fCcCLdYGYWx9Y0lJOwyLXyhnD+zElUgjWFWyRwe7ijBXeb54cg+vLMog9uLvOPJPAdRq2wUNbGKl5a9GlkQTF0dU1/jCibkbjH1Hw5FYvB46tauTo9Ci303K0udUQAx2o65NTveiTREXiQfPTFGqbBHVRV4CoiL+g6JoBVibCifIx0Yobn0f+wPOIrpQZdg7tUR51XH6KjoCUbBB6SKmUP+35bMT+DvqU7T6tAjS7w/C9h8exuyes1B81mYMbVRSWD8RkeHxKFyhuKq/JODJhf3Yc+EpTLSOf+mkLZSphHUUTuw9igcv3uw7ml6vos9h/85LQtlroE3Xm/imhmYYfr2fAhp1E+uyf143KRD8Mq5+qk1K/B0cUe2zQZe2qCXUmx8K5DeB9C4E1cf+/ReFUb2lE5pWtpCq/WYYjsPj8CdItiotfS8Ar/tYRudExc0gqa8Xqd4OTk1Lpw425TdX1ocCeSkgWxgWw0RAt83oNssG3qHL0FK4nasZhhUIwbTWo3D5077oZZuCgytW4b8WqxA4v5nUuf0GlsWSqw3QsEtXfHjXF2tDyqLex0CFxp2Bw944bDYJB7a6wUr48ji5oD0mBDWE66AGMLmyGb+GNIXv7qmy3kLOy0bivvJWILMwnJIYipmOQ/F8+EEs7W2N8K190OvHBlh/2B2VLnvAwTUIZpVqoH3X5nh5fB0CL9bFokPL0aEooHnSlPZxoCgKm5ZCuy8XYkqP8Az6QzkETJ+AnwOvwqx2M7TpNRuTexzSGLEVjv+5bTBsZyX0G9wZBf72EY7/TzBvny862QCvtMt0eBX+iBqIHXtHQXvGR+z52ejWdwcKt+8Pl4q3seXkFUSHVsD0sDXSyPCx2c3gfqwZXAc0wMujP2HThbbwDf4O1rsmY7L3TvyjqIvmLfrAY243FBa29dmwo6jvOgh1Cl9EwKq/4bBiT+qFcd62KPdGAfkEtMNwUqQ/BjjMQ0ybwehV6z9s/WUnzHtswsapdWCc5m6Lsu9OvjYWm1e7oqTpI2wc6og1sb3g6lwRUUIf2xv3DQKEO6nixCrxnOgT1gBlGzqgm9A/160LRvOFZ/GdYwH5KsctU8BABGQNwyF+ztgxug4WR8/DDqFDm2tNk0hMTISpqXK0SDxpO31ljhXHpqKGquOHdLiPZf0KQIFgjG/4AxrvDURf1QlevWyVu/Pg3DMac4/Ph61q/uSVJa0wO3kDNk2sYCDNyGrmpoByms/TNJs0UpTD2MBgaZpCUuQ69O8UhJ7r7bGx7wkM+3NtavB0HGaMn0/MRC1p7QRsHfExNpT5Q5o+oB2GpzzyxJ6ldqkjOxn1hw+1pkloji61FY7/Tp3CMfXsT2gjHf8J2D+pEebGe+OQdyupX2mWKePbtFFY7lofh22Ppvab5PiVcP3kOIapwrBC6K9JQn8Ve6z21A3NuikQhnmOrlBMP4Vpdsq7QolCObsONYdv0Pg3Qnhuth23RYG8FkgbhpV9fmvVQ6n9KOnJz+jZfC96HNiBfuWVU4/cwnxRZG5ruJ8bBr+Ng1BZ6LvPd38Bp5X2+GPbINWzNuK2miC0UyjmdSkghWH1OVGsI6cm5XVLc3/5WUDmMOwCcSRtqoMbYr86joUFh6fOGVbgEbaN7465wa9gYlEcH9e2xLXLtlivEYb/7hyBH3orT7wz7ZegxRHl3EXN4FxeCNgNFsahZd1Sqe2U8vwqrhacgJOcm5yfj13Z6pbZyLB6p+H+PeE04zrazDqCH/tZp3tBJ/5Q826IdhjWnGObWX/ILAzbpTMP/9X5yegwygq+Ql8SR6s1LzIzCsPa4VYsu/ay93ePQt/pJxFXoBBK1v4EZhevwmHtEekCQbNu4nr9qy3FizZ1hUkhCiWZ0TNcP/oBpl9TjjLzQ4H8IpD2QjDt/H5lHROwun8VXOz6AJ69hb5RdQ+qD7mNgM22+OW0F1qoBnHE7QwLqgbb6papNAkPQvG8mb90MS2GYfU5kWE4vxw9rIeuCMgehsWKxp73gLPrcTj2L4Pt112kuZjxQUNhv7A2tgeNk0aKYs9MROdx1tIJXD0ynNUw3NK/FQK8usBEQ9XIogTKl1DO0+KHAtkReFsYFoPrhoEO+CnMEmaV3bFddRtT++6GuM+I1c7ocewrnFzd5Y2RYc0wnFl/eFsYbhXgghP+rqkjzOK2WnnWx+/CVAjrbIXhEbBeehz/q6/U0gzD7bELw2t7o/HOPzG0ivC7xFOY1NodFZdnFIYD0WG/Jxw0OyUKwSZ1nnN2WoTLUkB3Bd4Mw2n7kQL3sLizI6LHXsVsRyEMV5+BO81Ho1WkD6433Z76kLm4nelP5mLl+LQPoZqq5hMzDOvuMcCS6b9AnoRhkenqMgd0XxYOqybzpTAcF+gG+/UOOCjcErISfq85mpWdMFwt2h9u9r5oqvGgXvzNq/ivYi3p1
hM/FMiuwNvCsDgqPGh3X2zxs0Ngv044322fNH9YDMPtet/GxMN+6CpM51HOL+6Dx25/Y/ngDzINw5n1h8zCsEP0L+jd5De09t8nzccVg/qa/q1x0PZPbBhX4Y3pR5k9zb5rfA3Mj5wvTWkS5yg+Ey5Qu/SOlOYMt8dmDP5wC5xDtqCXMBCugHDHp8V4lE0nDEN4TNBvYHNsr7Y59USvSLyBG3eq4qMP06Tj7DYNl6eAzgloT1c4Mq0ept2bmdqPYoU7NZ36v8D3wrMzzc1ev6HFQTW32GZacOr3R/vP72HyAeW0K/Hz39WrMK1VK3XOMEeGda75WaB8IpBnYVg5mtYKy1LmSmFYDAqePYZgy6s6qFuhIBJe3hFOlm2zPU1CDM4Pgr/B4DH7kFyjCUrFX8DN2EaY/euPcCqXT1qJ1chTgczmDA9O8UCXwfcwUTVPWHxYZmAHXzRZFYSvzGahw5dhqFUuDGfCEpAcH4siTb7HttXKcJnZNInM+0OCNPd+8hErVO24GDsXPEjzyjPx+B8wcg+iLCxglJCAUm0WYsNSJ2mf2qPVmYXh5MgDmNR3OHb9WxSWZgqUammLAr+/wmhpznACTi3rglFrjPBxg/IomPgSdy7dQzfVNInnwmh0yzEhKFTcGV7Hv0cjYVuT3cbiQPzHqF/+BS5fjkNrj02Y3710nrYld0YBuQW0w7B44ffzoJ746WJBmJslIcmoKsb8shlDhItV7f6nvmvqIl3MJuPS6gEY7BWGMnXrweT+aTwtMQBL1/0P9YSZExwZlrsluX1DFpAlDGcHVHyFWoxJGdVrZbKzpvayyldcvbJUv6ImJ9viuhTIvoBm8Cz3jsd1xv0hTut1atrlU76iyTiTd5xmtUaZ9ck3Xy33eqvK1yFWTPOOVfG1a49fWGq8ii6rpeByFNBvgXc/9nOvL+u3IEtPgbwTeO9hOO+qyj1RQF6B9OYMy7tHbp0CFKAABShAgZwKMAznVJDrU0AlkPzsLHYfMUarrnWlefD8UIACFKAABSig+wIMw7rfRiwhBShAAQpQgAIUoIBMAgzDMsFysxSgAAUoQAEKUIACui/AMKz7bcQSUoACFKAABShAAQrIJMAwLBMsN0sBClCAAhSgAAUooPsCDMO630YsIQUoQAEKUIACFKCATAIMwzLBcrMUoAAFKEABClCAArovwDCs+23EElKAAhSgAAUoQAEKyCTAMCwTLDdLAQpQgAIUoAAFKKD7AgzDut9GLCEFKEABClCAAhSggEwCDMMywXKzFKAABShAAQpQgAK6L5BhGH769Knul54lpAAFKEABClCAAhSgQA4EODKcAzyuSgEKUIACFKAABSig3wIMw/rdfiw9BShAAQpQgAIUoEAOBBiGc4DHVSlAAQpQgAIUoAAF9FuAYVi/24+lpwAFKEABClCAAhTIgQDDcA7wuCoFKEABClCAAhSggH4LMAzrd/ux9BSgAAUoQAEKUIACORBgGM4BHlelAAUoQAEKUIACFNBvAYZh/W4/lp4CFKAABShAAQpQIAcCDMM5wOOqFKAABShAAQpQgAL6LcAwrN/tx9JTgAIUoAAFKEABCuRAgGE4B3hclQIUoAAFKEABClBAvwUYhvW7/Vh6ClCAAhSgAAUoQIEcCDAM5wCPq1KAAhSgAAUoQAEK6LcAw7B+tx9LTwEKUIACFKAABSiQAwGG4RzgcVUKUIACFKAABShAAf0WYBjW7/Zj6SlAAQpQgAIUoAAFciDAMJwDPK5KAQpQgAIUoAAFKKDfAgzD+t1+LD0FKEABClCAAhSgQA4EGIZzgMdVKUABClCAAhSgAAX0W4BhWL/bj6WnAAUoQAEKUIACsgnEx8fj5cuX2dq+ubk5LCwssrXO+1yYYfh96nPfFKAABShAAQpQQIcFbt26BfGf7HyqVasG8R99+TAM60tLsZwUoAAFKEABClAgjwXUYbhRo0ZZ2vPp06elIMwwnCWurC4Uh8fhz1CwdFlYmWZ1HS5HAQpQgAIUyH8CN4+ux6m7BdKtWLUW/dGsivJXisRn+OevAwi5Ey/9f+k6XdGqnhV4Gs1/x4QcNRKnRkRERKBs2bLSv8VA3KFDhyztat++fe8UhqOiomBtbZ3hPsQyyTX1Is9Ghl9d9kC7LuGYHrYGjlniBJIi/THE8VuEKixRu+dGbJpaK4trpr+YAhfg0aorYkfdwQ+9lV8WD/9NRrEKxWGeoy1z5fwqcGv7ZCy93xPeY2zfWkVx2d9SvsbUHmXeWDY58gA85xxCnYnfoWN55a8jghfAc19NTJzbDaUz2boi8QGOrvKGV8B+hEcVw0ftBuJbj/6oXuj1SvEP/4TXpAUIuPIcJWv1w5QFY2BX1kS1QALu/eWLZYu34dDtaJSo7Ighc2agVy3lUZ8SfxlrZk/Hr0HhiDEvjSaf/Q/Tv26LUjxrvrXNuQAF8lpg67g6mH+soLDbZMQ/f44U82KwNFNIxejkcQazuwKx5z3R94vluGX8ERrULY2CRlG4cfIykquPhu/m0fjYLK9Lzf3pm4AYTMURXnE0+OnTp7KHYfX+xPBdu3btN7hiYmJw5swZVKxYUZYRZ50Ow2cXNMeXp0fgwFY3WOXCkaQdht8loOdCMbgJPRI4PbcxRl2bjBA/l7eWWlx2bvJ6BHrUTLNsSvwpTOvshm33y2J0QDDG1AGenZmCbv0D8KzEIGw5NhU1Mtx6DDYOboTVRv/DHPdOKG/5EH/OGoHl9wdjx95REHN1cvQWfGE/B6bD12GmswVC14/H7E3V4R26DC2Fk97jbX3RYaElxi6aiHaVLXD/8DSM/+4Jhu/dhYFVzmKa3QDcajkHU0Y2hHXcBayaOAlHyi9F0E8OHEV6a6tzAQq8HwHt85m6FOL3wWD7qYjp+hv85zRIHehJSQzFTMfeOFxuEXYI32fF3k+xuVc9EcjrMCyyXLp0KXU0WjMQq4Ow+FCeGM5NTNQDPbmHKWMYjsOtY3twUrhFI96eaWayGE7pjAzH3A3GvqPheFWoMuydWqK8NNqVgKiI/3DC2wU+KYuwQkgPNqrRW3E09+yBnbgSaQSrSvboYFcxtbMr4iLx4JkpSpUtknoSfxUdgceJ1ihfwgKaXx5LnCMRdup7uA19iFHBC+FgVRqli3AoLPcOrfyxpbRhOA6R4fEonHonIQHPIiJRwEY5hSe9MKzAv1jeuw3215+JGnt8UNbnKEaU+gW9221FsylNsNvHAisyDcPCHYy4OKF/FHp9Uktai88/2oKOwbswSEjDV5a0wvDrU3BghZPquI/Cctf6ONbsFDaME8ec4xAfVwgWqSPJCVjdvwoO2oZKv09MTISp6etj/9X5yegwygq+bylX/mhh1oIC8gqId4VmjZyCgHPPYFLsU7jN74jbEy6gZ+gPaCPtOgGX/L8SLlZDcD/WCEUrtsSUn39E9w9NEHfTD6NHeOKvu3EoYFkJzfvNwLxv7KUgm1EYFr8PPt/THbuCxkkXy5qfxLvz4NQ2EI7+f8G9sbz15tb1W+B9hOH0AnFeBGFxv7KEYfEKdE7nPvB/
UB71mlWH8cML+Ne4OJ5fKoW5qmkSCjzCttEdMeOEDRo2KA+zl/dw9iwwZMMejKx/HYu694P/jRfCabwwipq3wnzhi6ORdOtnNWKqNEc16zjc/TsEkdYj8dv+8dLIWtxWVzTxqocAjZO4ZpjR/PL43nIE7GYcwfPnybAoWhif9tuEdeOr6/fRy9LnuoDm8ZOCX+FWdR8GhK2VpvpIx5P9KJQUAq442pvRyLBCCJuJplcxR2NZMYCm3JgDp6/M3xqGtSslntA6tbuBCZfXwMksCj496+Ja97tY1u/1PMKI1c7oFuSGE/6ub4zuKhCGeY7t8GjITXj1Tjv3ULzYPDTLCbOfzuXIcK4fTdygoQmI3xGz2/ZAaEMf/DSrA2wSz2G9+wj8EFQTP6jOhfeEvuroVRhT/HzxeT1j3Nk7E95/fY4FM23g07kD7vQ+DK+BpWAcfwfrPH9HE/ex0jSH9MOw8kL39xoH3rhDJdorcEUoTyf801V9oWxoLcL6ZlXgfYVhzUBcsmRJiOWQc0RY7SFLGBavTHv622J1sCeaqkajwv17ovO0D7BI9QXwcGsfOC+tBb8DM1PnL4ULP3PxqouNqjCrHS7EkB1y9hM0baya65gYiGH156Daz6GYape9MCzOGeY0iax2C8NdLjfCsPIklDY4iz8Tj7/shmHxInJNfztsLr0OOxc3hYnWPHh1S6V3Yaj+ndgXuy6pDt/j82GrmjsolqVtt434Txh/ruo0D7/92BUZP8ZguMcDa06B7Ai8PP417L5MxMKzP6GNqq9pXlR3kL4X+iBp4iXM66J1YYp7WNy5Nc7YboDnpOaqu6av955eGFb/7KLD3nTDsLi238CyCKh2MMPfZ6d+XDb/CrzPMCyqnjt3Do8ePZLuWtrZ2ckyNUKz9WQIw8qRqqC6QWk6W9rgmfY2rbpACgRjfMNZ+GjjQYwQpl2+OdKmfBDIx/csom0+Qc8BLRA8vHfqA3HZGRlmGM6/nTg3a6ZLYVh8kG7t0HZY8WQ0Vm8bpRodUo70PBuufChUMwzb/doRh7YN0phvn4Aw/wFwmRePMeu2YUh94zeoXkXfwu7vB2HpY3f8vrpLrszVz8324LYooE8C6Z2TNMNwe+FuU/9qK9AsUPksgfZHnCbh/vWPOPRPFIyL1UCXMd74tl9V6W5PRtMk/IdWwJqiW7BPuFhWf5IjQ3A2qgEafngeU1r2xIsRt9LcSdInU5Y1bwTeZxhWT41QKBRISkqS3miR3kN1uSmR62E4ow6qGYalq2HhrQ5bo2xSn4JVV8oIlui1JBju9m+G4avLHODia4OxPrPRuXI0Tqz0wrKAU2jscUsKAgzDuXlocFuigK6EYfEhvPmuQ/FHwW/w68ZBqKzxNPiO0dWxpuQfaS4+90/6BJ5Gftgzv4GqIeNwdIkLxqwtiskBG9BHmI+Y0SdFnJNc4zC+UE0H4ZFAAQq8m0B80FC0mFkBG0/MhPpdSGnD8C4Mr+2Byqq7mxnvJQ7/hq7GyC98UXfJeXzrkHEYjljXDQ4Li8Hz+Bp0Kqrc4jnvtui3siKmTimIRR7RmBGyCT146+fdGtVA1npfYVh7jvC1a9fSfagut5sh18Ow+DCAOGdpU+mANFem2lMS0rt61a6c5siwQrplZIcbn12C77APpEW1gzfDcG4fHoa3PXEqzqHD5WDfvjRMhJFY789b4GSTo9g0sQLEk1j/alvgfGo7+tqIr/7bhqGOXmi49kimc4ZTj1WNOcPiz9JOk9B+OO+1ffzDAHztMgORrVZg5Xy7N54Cl/pW79sYF7QGPYTXqcWe94Bz71AMEt4WMUh456g4orx5fFcsvNYRP/p/h2ZC2dWflLhTmDFkA5rPWyxcYIrTj4TRY7++6OFdE+tOf4e6hncIsMYUyJLAwzMB+KdIN9hLD7oF4aAw8urcqLhwXnqEv7ZdQmnntqhotBvDbSfD3D0Inv1KCSO6qrsz0yywQDVl8Mi0enC/OhqbNg2TLnLF/r79r5bo9Zk1TuzYjZqdPpNecyhud+lnTfBgwG0sEl5uk9HAk7jchoGt4PmgB5at/1b1isU4bBtlC/c/E1B7QNoL5yxVlgsZnMD7CMMZPSyX0VsmcrNRZAjDgDgfuP3Uxxi8YTcmNC4onYx3Te+DiVuqw0v1BRBzfBwcBl+Eq2oZsVLSl0BoG/R1Vr70Je00iQRhrlN1bLTZLM2VFG8TpcTvxUi7kbBwD1OODAe6wVb4kll6+he0F+Yqi+9endJvJE4Ir5IRX42V/qvVbmPSNT84GqV9oj43kbkt/RFIitwhBM+JOPisCMyTY1DoQ833ckbB/ys7zD5mBkuLAvigSlOUiriAFstzHoYrX5+N9p23wM7nIr7TeBG3+oEXv3tpR3KNFOUwNvXWqvA0+uq+cFt4DUaFCyA+tih6zt8Cj8/Eky8gvqKwl++jNxpBeVKsits7xmPgtAN4alYYBZNiEWfyCb5ZtTndaRT605IsKQXkE1D3y72Vl+GkMJ1IHNyZd3MMAg+7o7zUlzeg8cIbUmh9EPwN3L76HY8KWcFcYYqag9sj0ete6p0XReIN/DykD7xDklC4sDFexn+A7nPF/muE38d3h8e+GBSvXAEmD8ORVPFLLFe9JzijMCxdfAvbXDv6Cyw+9FB6D7FJwlMkGlfEJzXiEBbTA7/tdkcNvmtYvgMkH2w5vTBcrFjWXsgnvpc4u3+BTvyDGqdOncrwYTl1IM7udrPaFLKEYfHK9M9pvTHutwiYWFnBOKkQ2g1sgBCfaI0/upEgnYT7TzqA2MKFpSvml/HFhS+BTaknce05w4l3V6KfyyLcKdEYdWye47GiKizu7kDZscowrH6Lhd+dgrAS3nVlYtkGPRqeQsDjb9INwymJuzG66dc4lPgBSjacir1resIiq3JcLl8LvHhyDzEmZdJ93V5mv3t3lATsHT8E/07wk16X9i4f5R+REd6/8k5/rVH5OsMXZiWk1xDyQwEKZC6giItGjGkR6bWKYt+LSSwKK9UD43HR0TAt8voVn+LrDR+HP4Fx8QooWijtW2nUexFfDXr/SQKKaPVfZb+OQYF3ev2ncr+vLJT9Wjw3X79qjJq1irN5KZCpgGYYjhNe7/nw4cNsiZUpUwblypXL1jriX7mrVKlShg/Lve332dqZ1sKyhOG3de60ZVB1VmOrNO8HzrhSyuWTM/xiyN5JPWcBIif0XJcCrwWSHgdizb66GKR6OIY2FKBA/hTQfkVj/qwla6XvApphOLM/kazv9VSXX9YwnF+QWA8KUIACFKBAbggocAJLh55Dm5UjUS83NshtUEAGAXUYzs6mxWkUtra22VlFZ5ZlGNaZpmBBKEABClCAAhSgwPsXEOfwRkREZKsg4h/HyO7UiGztQMaFGYZlxOWmKUABClCAAhSgAAV0W4BhWLfbh6WjAAUoQAEKUIACFJBRgGFYRlxumgIUoAAFKEABClBAtwUYhnW7fVg6ClCAAhSgAAUoQAEZBRiGZcTlpilAAQpQgAIUoAAFdFuAYVi324elowAFKEABClCAAhSQUYBhWEZ
cbpoCFKAABShAAQpQQLcFGIZ1u31YOgpQgAIUoAAFKEABGQUYhmXE5aYpQAEKUIACFKAABXRbgGFYt9uHpaMABShAAQpQgAIUkFGAYVhGXG6aAhSgAAUoQAEKUEC3BRiGdbt9WDoKUIACFKAABShAARkFGIZlxOWmKUABClCAAhSgAAV0W4BhWLfbh6WjAAUoQAEKUIACFJBRgGFYRlxumgIUoAAFKEABClBAtwUYhnW7fVg6ClCAAhSgAAUoQAEZBRiGZcTlpilAAQpQgAIUoAAFdFuAYVi324elowAFKEABClCAAhSQUYBhWEZcbpoCFKAABShAAQpQQLcFGIZ1u31YOgpQgL34LDMAAACaSURBVAIUoAAFKEABGQUYhmXE5aYpQAEKUIACFKAABXRbgGFYt9uHpaMABShAAQpQgAIUkFGAYVhGXG6aAhSgAAUoQAEKUEC3BRiGdbt9WDoKUIACFKAABShAARkFGIZlxOWmKUABClCAAhSgAAV0W4BhWLfbh6WjAAUoQAEKUIACFJBRgGFYRlxumgIUoAAFKEABClBAtwX+D4/nW2ut3arDAAAAAElFTkSuQmCC)"
196
+ ]
197
+ },
198
+ {
199
+ "cell_type": "markdown",
200
+ "metadata": {
201
+ "id": "WSI47-j0DBur"
202
+ },
203
+ "source": [
204
+ "6. Click the two overlaid squares ![image.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAEEAAAA/CAYAAAC/36X0AAABf0lEQVR4Xu2aPYqFMBhFvwcWKlhoqaWFG9DNuwVXYGGprYUgdjMRMTgyPHRiRpk5AeEJ5u+8e6+R5PWhivzz8gKCCBCUC4AAhCUMUQIQUIJeGGAH7IAdsMP2S4FMuDoT+r6Xqqoe8TlWFIWEYXhoLJcqAQiKORC+geC6rvi+f0iSpg+N4yjTNOlmHmOHNE1lvn6jNE0j87UWICgSQLANYQ6/tTiOI0EQ6Pt9MP5ZO5RlqScdRZHkeQ4EIKAEEeygkgAIQFheCCgBCCiBxRIrRqUBIABhiQKUAASUoF+L2AE7YAfssN03IBPIhPOZEMexJEmyFZK1323bStd1un2rmy9n7GBtxgcaBoKCBIQ7IQzDIHVdHxCr/UeyLPuyRfiux9MnVd5lgv2p2ekBCIorEEwheJ4n81rgjjIfBbpqDWKkhDsmv/a53/gxGQsQTO1gQt+07q1K2B7XMZ2ISf39USGTtk7bwaSzp9YFwk8y4an/psm4UAJKWPSDEoCwKOET7bDhYssL2DsAAAAASUVORK5CYII=) to copy the API key; then run the code below and paste it into the input text box and press Enter."
205
+ ]
206
+ },
207
+ {
208
+ "cell_type": "code",
209
+ "execution_count": null,
210
+ "metadata": {
211
+ "id": "uNA8U5gvDBus"
212
+ },
213
+ "outputs": [],
214
+ "source": [
215
+ "os.environ['ACTIVELOOP_TOKEN'] = getpass(\"DeepLake API key: \")"
216
+ ]
217
+ },
218
+ {
219
+ "cell_type": "markdown",
220
+ "metadata": {
221
+ "id": "8CmfEgseDBus"
222
+ },
223
+ "source": [
224
+ "Finally, pick a name for your dataset. It doesn't matter what this is, but keep in mind that it will be shared with the students."
225
+ ]
226
+ },
227
+ {
228
+ "cell_type": "code",
229
+ "execution_count": null,
230
+ "metadata": {
231
+ "id": "dosq4FolDBus"
232
+ },
233
+ "outputs": [],
234
+ "source": [
235
+ "dataset_name = input(\"Enter a name for your dataset: \")"
236
+ ]
237
+ },
238
+ {
239
+ "cell_type": "markdown",
240
+ "metadata": {
241
+ "id": "m0zs864gDBut"
242
+ },
243
+ "source": [
244
+ "## Processing The Document(s)\n",
245
+ "\n",
246
+ "In this part, you will upload the documents you want the students / model to reference; the embeddings will be created from those documents.\n",
247
+ "\n",
248
+ "**Note: The embeddings of all the documents you share will be publicly available. Do not use this for any documents you want to keep private.**"
249
+ ]
250
+ },
251
+ {
252
+ "cell_type": "markdown",
253
+ "metadata": {
254
+ "id": "eOEJ7rnlDBut"
255
+ },
256
+ "source": [
257
+ "First, upload your documents to Google Colab. To do this:\n",
258
+ "1. Click on the 'file' icon ![image1_3.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAFQAAABUCAYAAAAcaxDBAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAALEoAACxKAXd6dE0AAAIKSURBVHhe7do/kmFRFMfxM7MAGyCRKZEI60CA0CJkEjKJHZApAevwJxJhAWyABfTM6T7JdDZ1v6/rvfL7VHV1ndfJ7W9f+ql3f338ZYL5Hd8FoqAwBYUpKExBYQoKU1CYgsIUFKagMAWFKShMQWEKClNQmILCFBSmoDAFhSkoTEFhCgpTUJiCwrCDDs/n01arlW23W3s8HnH1/83nc+v1ejEVDxLUYw4GA7ter3ElTZGjIi/56XSKxXTj8fhzpxdR8g713dloNGJiFXGnJgc9HA42HA5j4k0mE6vX6zExyuWyVSqVmFi5D5qVVquVyR/rbW+bjsfj5z/Sy+USVxhvfR/6er1sNpvFxHj7G3vfqff7PaZ0mbyHNptNa7fbMeWLr/d0OsX0Zb1ec+v1oCn2+/1HtVr952uxWMRP88fX9n29/jtQ9FkepqAwBYUpKExBYQoKU1CYgsIUFKagMAWFKShMQWEKClNQmILCFBSmoDAFhSkoTEFhmTxGzvLsUCp/Bv/9/Cr5GDnXp+9+yvl8tlKpFFOa5Je8L6TT6cRUPL52KqZL3qHOd2m/37fb7RZXiqFWq9lms8lfUOdRl8ul7Xa7pDP2P8Hf47vdro1GIzSmw4LKF902wRQUpqAwBYUpKExBYQoKU1CYgsIUFKagMAWFKShMQWEKClNQmILCFBSmoDAFhSkoyuwPHzLZbD4ZgJMAAAAASUVORK5CYII=) at the bottom of the sidebar to the left of these instructions.\n",
259
+ "2. Click on the 'upload file' icon ![image2_1.png](data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAFcAAABaCAYAAADNVsqyAAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAALEoAACxKAXd6dE0AAAKKSURBVHhe7dw7bsJAFIVhJx3UtLARaNkILIKakkXAGhAlNSyCmj3QJjnRjOQgY173eO5kzi+h2Fgok08Tg8zAx9dPlaL0GX4qQsIlJlxiwiUmXGLCJSZcYsIlJlxiwiUmXGLCJSZcYsIlJlxiwiUmXGLCJUZ9m+d4PFb7/b46n8/hHm7j8biaz+dhL30U3MvlUq1Wq85Q63kCppwWUsEi/Les1+uwlzZz3MPhkAw25gXYHBd/mIc8AJvjnk6nsJW+1MDmT2iz2Sxs/W2z2YQt2279vnqpnuSKeJ2bagYXgYtSABeDi7oGLgoXdQlcHC7qCrhIXNQFcLG4iA1cNC5iAhePi1jAwg0xgIVbyxpYuFcB2CrhEhMusewvOb4Te6yaucSES0y4xIRLTLjEssLFmgjcmFeyLMsGF6h4iYQb80qWZVngRth6OQC7x22CjXkHdo3bBhvzDOwW9xHYGIC3223Y85NL3GdgY7vd7vdxnnKH+wpsDI/zBOwK9x3YmCdgN7gWsDEvwC5wLWFjHoCT496D7fV6Yau5tuOpgZPjtq1EB9xisQh7zeF4G3DKle7JcbHiGyu/r4uwo9Eo3NMcjt8CTv2xKRfn3Gvg4XBYLZfLu7CxJmAPn0dzgYsiMGABNRgMwpHHqgN7gEVucBFAANTv98M9zwVgzHgPsMgVLnoVNvbsjGfmDvc/JVxiwiUmXGLCJSZcYlrl2JDVWDVziQmXWGenhZxye1rAhZecsxy/Oe50Og1beWY5fnPcyWSS7ezFuDF+qyhPaLhsmBswxotxW0b5RrwYvmoQS41Sf89YW0DFqcByxsaouKWn17nEhEtMuMSES0y4xIRLTLjEhEtMuMSES6uqvgFS+TQXb05HUQAAAABJRU5ErkJggg==) on the left of the Files toolbar.\n",
260
+ "3. Select all of the files you want to upload, then click 'Open'.\n",
261
+ "4. A warning should pop up. Click 'OK' to continue.\n",
262
+ "5. Wait until the spinning circle in the bottom of the 'Files' section disappears. This means that all of the files have been uploaded."
263
+ ]
264
+ },
265
+ {
266
+ "cell_type": "markdown",
267
+ "metadata": {
268
+ "id": "hcRqREcvIeY7"
269
+ },
270
+ "source": [
271
+ "### Adding YouTube Videos / Websites\n",
272
+ "If you have any websites or YouTube videos which also contain content which you want to put into your data lake, paste those links one at a time into the text box below, pressing 'Enter' after each one. Once you have entered all the links, press 'Enter' without typing anything to finish execution of the code cell.\n",
273
+ "\n",
274
+ "If you have no URLs to add, just click on the box and press 'Enter' without typing anything."
275
+ ]
276
+ },
277
+ {
278
+ "cell_type": "code",
279
+ "execution_count": null,
280
+ "metadata": {
281
+ "id": "19tljTkqIiz9"
282
+ },
283
+ "outputs": [],
284
+ "source": [
285
+ "url_list = []\n",
286
+ "while (url := input(\"Enter a YouTube / website link: \")): url_list.append(url)"
287
+ ]
288
+ },
289
+ {
290
+ "cell_type": "markdown",
291
+ "metadata": {
292
+ "id": "qAVRPqDsDBuu"
293
+ },
294
+ "source": [
295
+ "## Embedding & Database Creation"
296
+ ]
297
+ },
298
+ {
299
+ "cell_type": "markdown",
300
+ "metadata": {
301
+ "id": "_N4erDDkDBuu"
302
+ },
303
+ "source": [
304
+ "Finally, run the following code."
305
+ ]
306
+ },
307
+ {
308
+ "cell_type": "code",
309
+ "execution_count": null,
310
+ "metadata": {
311
+ "id": "G2DNpm5dDqJ8"
312
+ },
313
+ "outputs": [],
314
+ "source": [
315
+ "# Functions to extract the text from YouTube videos / websites\n",
316
+ "def save_text(text, text_name = None):\n",
317
+ " if not text_name: text_name = text[:20]\n",
318
+ " text_path = os.path.join(\"/content\",text_name+\".txt\")\n",
319
+ " with open(text_path, \"x\") as f:\n",
320
+ " f.write(text)\n",
321
+ " # Return the location at which the transcript is saved\n",
322
+ " return text_path\n",
323
+ "\n",
324
+ "def save_youtube_transcript(url, save_dir = \"sample_data\"):\n",
325
+ " # Transcribe the videos to text and save to file in /content\n",
326
+ " # save_dir: directory to save audio files\n",
327
+ " youtube_loader = GenericLoader(YoutubeAudioLoader([url], save_dir),\n",
328
+ " OpenAIWhisperParser())\n",
329
+ " youtube_docs = youtube_loader.load()\n",
330
+ " # Combine doc\n",
331
+ " combined_docs = [doc.page_content for doc in youtube_docs]\n",
332
+ " text = \" \".join(combined_docs)\n",
333
+ " # Save text to file\n",
334
+ " video_path = youtube_docs[0].metadata[\"source\"]\n",
335
+ " youtube_name = os.path.splitext(os.path.basename(video_path))[0]\n",
336
+ " return save_text(text, youtube_name)\n",
337
+ "\n",
338
+ "# Multiple ways of extracting web content\n",
339
+ "\n",
340
+ "def website_webbase(url):\n",
341
+ " website_loader = WebBaseLoader(url)\n",
342
+ " website_data = website_loader.load()\n",
343
+ " # Combine doc\n",
344
+ " combined_docs = [doc.page_content for doc in website_data]\n",
345
+ " text = \" \".join(combined_docs)\n",
346
+ " return text\n",
347
+ "\n",
348
+ "def website_trafilatura(url):\n",
349
+ " downloaded = trafilatura.fetch_url(url)\n",
350
+ " return trafilatura.extract(downloaded)\n",
351
+ "\n",
352
+ "def website_justext(url):\n",
353
+ " response = requests.get(url)\n",
354
+ " paragraphs = justext.justext(response.content, justext.get_stoplist(\"English\"))\n",
355
+ " content = [paragraph.text for paragraph in paragraphs \\\n",
356
+ " if not paragraph.is_boilerplate]\n",
357
+ " text = \" \".join(content)\n",
358
+ " return text\n",
359
+ "\n",
360
+ "def get_website_youtube_text_file(urls):\n",
361
+ " for url in urls:\n",
362
+ " # This is a bit of a hacky way to determine whether it's youtube video\n",
363
+ " if \"youtube.com\" in url or \"youtu.be\" in url: save_youtube_transcript(url)\n",
364
+ " else: save_text(website_webbase(url))"
365
+ ]
366
+ },
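Only `website_webbase` is wired into `get_website_youtube_text_file` above; `website_trafilatura` and `website_justext` are drop-in alternatives that tend to strip more boilerplate from article-style pages. A minimal sketch of comparing them on one page; the URL is a placeholder:

```python
# Hypothetical comparison of the three extractors on the same page.
url = "https://example.com/article"
for extractor in (website_webbase, website_trafilatura, website_justext):
    text = extractor(url)  # trafilatura may return None on extraction failure
    print(extractor.__name__, len(text) if text else 0)
```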
367
+ {
368
+ "cell_type": "code",
369
+ "execution_count": null,
370
+ "metadata": {
371
+ "id": "H3gtJjSfoK5C"
372
+ },
373
+ "outputs": [],
374
+ "source": [
375
+ "get_website_youtube_text_file(url_list)"
376
+ ]
377
+ },
378
+ {
379
+ "cell_type": "code",
380
+ "execution_count": null,
381
+ "metadata": {
382
+ "id": "gFNbFdEKDBuu"
383
+ },
384
+ "outputs": [],
385
+ "source": [
386
+ "### Embedding Creation ###\n",
387
+ "\n",
388
+ "# See https://github.com/hwchase17/langchain/discussions/3786\n",
389
+ "# for discussion of which splitter to use\n",
390
+ "text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)\n",
391
+ "\n",
392
+ "all_document_segments = []\n",
393
+ "print(\"The following files have been uploaded:\")\n",
394
+ "for filename in os.listdir():\n",
395
+ " if filename not in [\".config\", \"sample_data\"]:\n",
396
+ " print(\"\\t\"+filename)\n",
397
+ " loader = UnstructuredFileLoader(filename)\n",
398
+ " documents = loader.load()\n",
399
+ " document_segments = text_splitter.split_documents(documents)\n",
400
+ " all_document_segments.extend(document_segments)"
401
+ ]
402
+ },
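Before embedding, it can be worth sanity-checking how many segments the 500-character splitter produced and what a typical chunk looks like. A minimal sketch, assuming the cell above has run:

```python
# Sanity check: how many chunks were created, and what does one look like?
print(f"{len(all_document_segments)} segments created")
print(all_document_segments[0].page_content[:200])
```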
403
+ {
404
+ "cell_type": "markdown",
405
+ "metadata": {
406
+ "id": "cBcOSf2rDBuv"
407
+ },
408
+ "source": [
409
+ "Make sure that all of your documents are shown in the output from the previous code cell, then continue execution."
410
+ ]
411
+ },
412
+ {
413
+ "cell_type": "code",
414
+ "execution_count": null,
415
+ "metadata": {
416
+ "id": "HH8WMgWSDBuw"
417
+ },
418
+ "outputs": [],
419
+ "source": [
420
+ "model_name = 'text-embedding-ada-002'\n",
421
+ "model_embedding_dimension = 1536\n",
422
+ "\n",
423
+ "embeddings = OpenAIEmbeddings(\n",
424
+ " model=model_name\n",
425
+ ")"
426
+ ]
427
+ },
428
+ {
429
+ "cell_type": "code",
430
+ "execution_count": null,
431
+ "metadata": {
432
+ "id": "HXJYOjYKDBuw"
433
+ },
434
+ "outputs": [],
435
+ "source": [
436
+ "### Dataset Creation ###\n",
437
+ "dataset_path = f\"hub://{DEEPLAKE_USERNAME}/{dataset_name}\"\n",
438
+ "db = DeepLake.from_documents(all_document_segments, dataset_path=dataset_path,\n",
439
+ " embedding=embeddings, public=True)"
440
+ ]
441
+ },
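Once the dataset exists at `dataset_path`, a student-side notebook can reconnect to it read-only rather than rebuilding the embeddings. The following is only a sketch; the constructor parameter names (`embedding_function`, `read_only`) are assumptions that may differ across LangChain versions:

```python
# Sketch: how a student-side notebook might reconnect to the shared store.
student_db = DeepLake(dataset_path=dataset_path,
                      embedding_function=embeddings,  # assumed parameter name
                      read_only=True)                 # assumed consumer-side flag
retriever = student_db.as_retriever()
docs = retriever.get_relevant_documents("a key concept from the course")
print(docs[0].page_content)
```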
442
+ {
443
+ "cell_type": "markdown",
444
+ "metadata": {
445
+ "id": "01UMqQcHDBux"
446
+ },
447
+ "source": [
448
+ "### Sharing With Students"
449
+ ]
450
+ },
451
+ {
452
+ "cell_type": "code",
453
+ "execution_count": null,
454
+ "metadata": {
455
+ "id": "l3TMy84LDBux"
456
+ },
457
+ "outputs": [],
458
+ "source": [
459
+ "display(Markdown(f'''To let students access the repository, give them the following URL:\n",
460
+ "\n",
461
+ "`{dataset_path}`'''))"
462
+ ]
463
+ },
464
+ {
465
+ "cell_type": "markdown",
466
+ "metadata": {
467
+ "id": "iUeTuV-WDBuy"
468
+ },
469
+ "source": [
470
+ "Distribute the URL above to students. They will copy and paste it into the LLM learning application, which then allows their models to use all of the documents you uploaded as reference sources when responding to or creating questions."
471
+ ]
472
+ }
473
+ ],
474
+ "metadata": {
475
+ "colab": {
476
+ "include_colab_link": true,
477
+ "provenance": []
478
+ },
479
+ "kernelspec": {
480
+ "display_name": "Python 3",
481
+ "name": "python3"
482
+ },
483
+ "language_info": {
484
+ "name": "python"
485
+ }
486
+ },
487
+ "nbformat": 4,
488
+ "nbformat_minor": 0
489
+ }
prompt_with_context.ipynb ADDED
@@ -0,0 +1,962 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "colab_type": "text",
7
+ "id": "view-in-github"
8
+ },
9
+ "source": [
10
+ "<a href=\"https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/main/prompt_with_context.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
11
+ ]
12
+ },
13
+ {
14
+ "cell_type": "markdown",
15
+ "metadata": {
16
+ "id": "S4MkldrwPA_S"
17
+ },
18
+ "source": [
19
+ "# LLMs for Self-Study\n",
20
+ "> A prompt and code template for better understanding texts\n",
21
+ "\n",
22
+ "This notebook provides a guide for using LLMs for self-study programmatically. A number of prompt templates are provided to assist with generating great assessments for self-study, and code is additionally provided for fast usage. This notebook is best leveraged for a set of documents (text or PDF preferred) **to be uploaded** for interaction with the model.\n",
23
+ "\n",
24
+ "This version of the notebook is best suited for those who prefer to use files from their local drive as context rather than copy and pasting directly into the notebook to be used as context for the model. If you prefer to copy and paste text, you should direct yourself to the [prompt_with_context](https://colab.research.google.com/github/vanderbilt-data-science/lo-achievement/blob/main/prompt_with_context.ipynb) notebook."
25
+ ]
26
+ },
27
+ {
28
+ "cell_type": "markdown",
29
+ "metadata": {
30
+ "id": "6Gf-_aZVPA_S"
31
+ },
32
+ "source": [
33
+ "# Code Setup\n",
34
+ "Run the following cells to setup the rest of the environment for prompting. In the following section, we set up the computational environment with imported code, setup your API key access to OpenAI, and loading access to your language model. Note that the following cells may take a long time to run."
35
+ ]
36
+ },
37
+ {
38
+ "cell_type": "markdown",
39
+ "metadata": {
40
+ "id": "6040J5eXPA_T"
41
+ },
42
+ "source": [
43
+ "## Library installation and loading\n",
44
+ "The following `pip install` code should be run if you're using Google Colab, or otherwise do not have a computational environment (e.g., _venv_, _conda virtual environment_, _Docker, Singularity, or other container_) with these packages installed."
45
+ ]
46
+ },
47
+ {
48
+ "cell_type": "code",
49
+ "execution_count": 1,
50
+ "metadata": {
51
+ "colab": {
52
+ "base_uri": "https://localhost:8080/"
53
+ },
54
+ "id": "7_XCtEMbPA_T",
55
+ "outputId": "aa36bcf0-1d11-4947-f96b-d28c67d4e431"
56
+ },
57
+ "outputs": [
58
+ {
59
+ "name": "stdout",
60
+ "output_type": "stream",
61
+ "text": [
62
+ "\u001b[33mDEPRECATION: pyodbc 4.0.0-unsupported has a non-standard version number. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pyodbc or contact the author to suggest that they release a version with a conforming version number. Discussion can be found at https://github.com/pypa/pip/issues/12063\u001b[0m\u001b[33m\n",
63
+ "\u001b[0m"
64
+ ]
65
+ }
66
+ ],
67
+ "source": [
68
+ "# run this code if you're using Google Colab or don't have these packages installed in your computing environment\n",
69
+ "! pip install -q langchain openai gradio numpy tiktoken"
70
+ ]
71
+ },
72
+ {
73
+ "cell_type": "code",
74
+ "execution_count": 2,
75
+ "metadata": {
76
+ "id": "6fdfMar8PA_U"
77
+ },
78
+ "outputs": [],
79
+ "source": [
80
+ "# import required libraries\n",
81
+ "import numpy as np\n",
82
+ "import getpass\n",
83
+ "import os\n",
84
+ "from langchain.chat_models import ChatOpenAI\n",
85
+ "from langchain.chains import RetrievalQA\n",
86
+ "from langchain.schema import SystemMessage, HumanMessage, AIMessage"
87
+ ]
88
+ },
89
+ {
90
+ "cell_type": "markdown",
91
+ "metadata": {
92
+ "id": "5iI75yjaPA_V"
93
+ },
94
+ "source": [
95
+ "## API and model setup\n",
96
+ "\n",
97
+ "Use these cells to load the API keys required for this notebook and create a basic OpenAI LLM model. The code below uses the variable you created above when you input your API Key."
98
+ ]
99
+ },
100
+ {
101
+ "cell_type": "code",
102
+ "execution_count": 3,
103
+ "metadata": {
104
+ "colab": {
105
+ "base_uri": "https://localhost:8080/"
106
+ },
107
+ "id": "8IgmjGEFPA_V",
108
+ "outputId": "5ac82647-2f93-4bd3-9284-e18dd04b1c81"
109
+ },
110
+ "outputs": [
111
+ {
112
+ "name": "stdout",
113
+ "output_type": "stream",
114
+ "text": [
115
+ "··········\n"
116
+ ]
117
+ }
118
+ ],
119
+ "source": [
120
+ "# Set up OpenAI API Key\n",
121
+ "openai_api_key = getpass.getpass()\n",
122
+ "os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n",
123
+ "\n",
124
+ "llm = ChatOpenAI(model='gpt-3.5-turbo-16k')\n",
125
+ "messages = [\n",
126
+ " SystemMessage(content=\"You are a world-class tutor helping students to perform better on oral and written exams though interactive experiences.\"),\n",
127
+ " HumanMessage(content=\"\")\n",
128
+ "]\n"
129
+ ]
130
+ },
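+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As an optional sanity check (a minimal sketch added for illustration, assuming the cell above ran successfully), you can send a single trivial message to confirm that your API key and model connection work before building longer prompts."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Optional sanity check: send one trivial message and print the reply.\n",
+ "# If this raises an authentication error, re-run the API key cell above.\n",
+ "print(llm([HumanMessage(content=\"Reply with the single word: ready\")]).content)"
+ ]
+ },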
131
+ {
132
+ "cell_type": "markdown",
133
+ "metadata": {
134
+ "id": "Zlx1_Y8rYGn3"
135
+ },
136
+ "source": [
137
+ "# Add your context and assign the prefix to your query.\n",
138
+ "The query assigned here serves as an example."
139
+ ]
140
+ },
141
+ {
142
+ "cell_type": "code",
143
+ "execution_count": 4,
144
+ "metadata": {
145
+ "id": "g_CAvacjEuCd"
146
+ },
147
+ "outputs": [],
148
+ "source": [
149
+ "context = \"\"\" Two roads diverged in a yellow wood,\n",
150
+ "And sorry I could not travel both\n",
151
+ "And be one traveler, long I stood\n",
152
+ "And looked down one as far as I could\n",
153
+ "To where it bent in the undergrowth;\n",
154
+ "Then took the other, as just as fair,\n",
155
+ "And having perhaps the better claim,\n",
156
+ "Because it was grassy and wanted wear;\n",
157
+ "Though as for that the passing there\n",
158
+ "Had worn them really about the same,\n",
159
+ "And both that morning equally lay\n",
160
+ "In leaves no step had trodden black.\n",
161
+ "Oh, I kept the first for another day!\n",
162
+ "Yet knowing how way leads on to way,\n",
163
+ "I doubted if I should ever come back.\n",
164
+ "I shall be telling this with a sigh\n",
165
+ "Somewhere ages and ages hence:\n",
166
+ "Two roads diverged in a wood, and I—\n",
167
+ "I took the one less traveled by,\n",
168
+ "And that has made all the difference.\n",
169
+ "—-Robert Frost—-\n",
170
+ "Education Place: http://www.eduplace.com \"\"\"\n",
171
+ "\n",
172
+ "# Query prefix\n",
173
+ "query_prefix = \"The following text should be used as the basis for the instructions which follow: \" + context + '\\n'\n",
174
+ "\n",
175
+ "# Query\n",
176
+ "query = \"\"\"Please design a 5 question quiz about Robert Frost's \"Road Not Taken\" which reflects the learning objectives:\n",
177
+ "1. Identify the key elements of the poem: narrator, setting, and underlying message.\n",
178
+ "2. Understand the literary devices used in poetry and their purposes. The questions should be multiple choice.\n",
179
+ "Provide one question at a time, and wait for my response before providing me with feedback.\n",
180
+ "Again, while the quiz asks for 5 questions, you should\n",
181
+ "only provide ONE question in you initial response. Do not include the answer in your response.\n",
182
+ "If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional\n",
183
+ "chances to respond until I get the correct choice. Explain why the correct choice is right. \"\"\""
184
+ ]
185
+ },
186
+ {
187
+ "cell_type": "markdown",
188
+ "metadata": {
189
+ "id": "hLMJRXc8PA_W"
190
+ },
191
+ "source": [
192
+ "# A guide to prompting for self-study\n",
193
+ "In this section, we provide a number of different approaches for using AI to help you assess and explain the knowledge of your document. Start by interacting with the model and then try out the rest of the prompts!"
194
+ ]
195
+ },
196
+ {
197
+ "cell_type": "markdown",
198
+ "metadata": {
199
+ "id": "jNBwByBFaJVP"
200
+ },
201
+ "source": [
202
+ "## Interact with the model\n",
203
+ "\n",
204
+ "Now that your vector store is created, you can begin interacting with the model! Below, we have a comprehensive list of examples using different question types, but feel free to use this code block to experiment with the model.\n",
205
+ "\n",
206
+ "Input your prompt into the empty string in the code cell. See example below:\n",
207
+ "\n",
208
+ "\n",
209
+ "\n",
210
+ "```\n",
211
+ "query = 'Your Prompt Here'\n",
212
+ "```\n",
213
+ "\n"
214
+ ]
215
+ },
216
+ {
217
+ "cell_type": "code",
218
+ "execution_count": 5,
219
+ "metadata": {
220
+ "id": "3fPd2b3yUZic"
221
+ },
222
+ "outputs": [],
223
+ "source": [
224
+ "def prompt_define(query, prefix, context):\n",
225
+ " prompt = (query + prefix)\n",
226
+ " prompt = prompt + context\n",
227
+ " return prompt\n",
228
+ "\n",
229
+ "def get_result(prompt):\n",
230
+ " messages[1] = HumanMessage(content=prompt)\n",
231
+ " result = llm(messages)\n",
232
+ " str_result = str(result)\n",
233
+ " import re\n",
234
+ " str_result = str_result.replace(r'\\n', '\\n')\n",
235
+ " str_result = str_result.replace(\"content='\", \"\")\n",
236
+ " str_result = str_result.replace(\"' additional_kwargs={} example=False\", \"\")\n",
237
+ " return str_result"
238
+ ]
239
+ },
240
+ {
241
+ "cell_type": "code",
242
+ "execution_count": null,
243
+ "metadata": {
244
+ "id": "cskXB5p5DupT"
245
+ },
246
+ "outputs": [],
247
+ "source": [
248
+ "query = 'Your Prompt Here'\n",
249
+ "\n",
250
+ "prompt = prompt_define(query, query_prefix, context)\n",
251
+ "\n",
252
+ "result = get_result(prompt)\n",
253
+ "\n",
254
+ "print(result)"
255
+ ]
256
+ },
257
+ {
258
+ "cell_type": "markdown",
259
+ "metadata": {
260
+ "id": "FLy4qDuWclWF"
261
+ },
262
+ "source": [
263
+ "### Our example using query from cell 103\n",
264
+ "\n",
265
+ "Run the following code to see a simple example using the prompt defined in an earlier cell (#103)."
266
+ ]
267
+ },
268
+ {
269
+ "cell_type": "code",
270
+ "execution_count": 6,
271
+ "metadata": {
272
+ "colab": {
273
+ "base_uri": "https://localhost:8080/"
274
+ },
275
+ "id": "4QBBlJI-adPf",
276
+ "outputId": "0d421ad5-74ca-40eb-bd68-afcb47768275"
277
+ },
278
+ "outputs": [
279
+ {
280
+ "name": "stdout",
281
+ "output_type": "stream",
282
+ "text": [
283
+ "Question 1: Who is the narrator of the poem \"Road Not Taken\"?\n",
284
+ "\n",
285
+ "a) Robert Frost\n",
286
+ "b) A traveler\n",
287
+ "c) The poet\\'s friend\n",
288
+ "d) The reader\n"
289
+ ]
290
+ }
291
+ ],
292
+ "source": [
293
+ "# Experiment with interacting with the model by inputting your own prompts into the empty string below.\n",
294
+ "def prompt_define(query, prefix, context):\n",
295
+ " prompt = (query + prefix)\n",
296
+ " prompt = prompt + context\n",
297
+ " return prompt\n",
298
+ "\n",
299
+ "prompt = prompt_define(query, query_prefix, context)\n",
300
+ "\n",
301
+ "def get_result(prompt):\n",
302
+ " messages[1] = HumanMessage(content=prompt)\n",
303
+ " result = llm(messages)\n",
304
+ " str_result = str(result)\n",
305
+ " import re\n",
306
+ " str_result = str_result.replace(r'\\n', '\\n')\n",
307
+ " str_result = str_result.replace(\"content='\", \"\")\n",
308
+ " str_result = str_result.replace(\"' additional_kwargs={} example=False\", \"\")\n",
309
+ " return str_result\n",
310
+ "\n",
311
+ "result = get_result(prompt)\n",
312
+ "\n",
313
+ "print(result)"
314
+ ]
315
+ },
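+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The prompts in this notebook instruct the model to wait for your answer between questions. Below is a minimal sketch (added for illustration; `student_answer` is a hypothetical response) of how you might continue the conversation: append the model's reply and your answer to `messages`, then call the model again."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Continue the quiz for one more turn (illustrative sketch).\n",
+ "# 'result' holds the model's question from the cell above;\n",
+ "# replace student_answer with your actual answer before running.\n",
+ "student_answer = \"b) A traveler\"  # hypothetical example answer\n",
+ "messages.append(AIMessage(content=result))\n",
+ "messages.append(HumanMessage(content=student_answer))\n",
+ "followup = llm(messages)\n",
+ "print(followup.content)"
+ ]
+ },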
316
+ {
317
+ "cell_type": "markdown",
318
+ "metadata": {
319
+ "id": "KYro6H82bENS"
320
+ },
321
+ "source": [
322
+ "## Types of Questions and Prompts\n",
323
+ "\n",
324
+ "Below is a comprehensive list of question types and prompt templates designed by our team. There are also example code blocks, where you can see how the model performed with the example and try it for yourself using the prompt template."
325
+ ]
326
+ },
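+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The templates below use curly-brace placeholders such as {number of questions} and {context}. As a small illustrative sketch (not part of the original templates; the placeholder names `n`, `topic`, and `objectives` are chosen for this example), you can fill such placeholders programmatically with Python's `str.format` before passing the text to `prompt_define`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Illustrative sketch: fill a prompt template's placeholders with str.format.\n",
+ "template = (\"Please design a {n} question quiz about {topic} which reflects the \"\n",
+ "            \"learning objectives: {objectives}. The questions should be multiple choice.\")\n",
+ "filled_query = template.format(n=5, topic='Robert Frost\\'s \"The Road Not Taken\"',\n",
+ "                               objectives='narrator, setting, and underlying message')\n",
+ "print(filled_query)"
+ ]
+ },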
327
+ {
328
+ "cell_type": "markdown",
329
+ "metadata": {
330
+ "id": "WAUSc7qJPA_X"
331
+ },
332
+ "source": [
333
+ "### Multiple Choice\n",
334
+ "\n",
335
+ "Prompt: The following text should be used as the basis for the instructions which follow: {context}. Please design a {number of questions} question quiz about {name or reference to context} which reflects the learning objectives: {list of learning objectives}. The questions should be multiple choice. Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect,and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right."
336
+ ]
337
+ },
338
+ {
339
+ "cell_type": "code",
340
+ "execution_count": 7,
341
+ "metadata": {
342
+ "colab": {
343
+ "base_uri": "https://localhost:8080/"
344
+ },
345
+ "id": "nFMRDL4cPA_X",
346
+ "outputId": "b341fcb8-49df-43e6-9810-7ff3f232949e"
347
+ },
348
+ "outputs": [
349
+ {
350
+ "name": "stdout",
351
+ "output_type": "stream",
352
+ "text": [
353
+ "Question 1: Who is the narrator of the poem \"Road Not Taken\"?\n",
354
+ "\n",
355
+ "A) Robert Frost\n",
356
+ "B) The reader\n",
357
+ "C) The traveler\n",
358
+ "D) The poet\n"
359
+ ]
360
+ }
361
+ ],
362
+ "source": [
363
+ "# Multiple choice code example\n",
364
+ "query = \"\"\"Please design a 5 question quiz about Robert Frost's \"Road Not Taken\" which reflects the learning objectives:\n",
365
+ "1. Identify the key elements of the poem: narrator, setting, and underlying message.\n",
366
+ "2. Understand the literary devices used in poetry and their purposes. The questions should be multiple choice.\n",
367
+ "Provide one question at a time, and wait for my response before providing me with feedback.\n",
368
+ "Again, while the quiz asks for 5 questions, you should\n",
369
+ "only provide ONE question in you initial response. Do not include the answer in your response\n",
370
+ "If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional\n",
371
+ "chances to respond until I get the correct choice. Explain why the correct choice is right. \"\"\"\n",
372
+ "\n",
373
+ "def prompt_define(query, prefix, context):\n",
374
+ " prompt = (query + prefix)\n",
375
+ " prompt = prompt + context\n",
376
+ " return prompt\n",
377
+ "\n",
378
+ "prompt = prompt_define(query, query_prefix, context)\n",
379
+ "\n",
380
+ "def get_result(prompt):\n",
381
+ " messages[1] = HumanMessage(content=prompt)\n",
382
+ " result = llm(messages)\n",
383
+ " str_result = str(result)\n",
384
+ " import re\n",
385
+ " str_result = str_result.replace(r'\\n', '\\n')\n",
386
+ " str_result = str_result.replace(\"content='\", \"\")\n",
387
+ " str_result = str_result.replace(\"' additional_kwargs={} example=False\", \"\")\n",
388
+ " return str_result\n",
389
+ "\n",
390
+ "result = get_result(prompt)\n",
391
+ "\n",
392
+ "print(result)"
393
+ ]
394
+ },
395
+ {
396
+ "cell_type": "markdown",
397
+ "metadata": {
398
+ "id": "WkSIU94GPA_Y"
399
+ },
400
+ "source": [
401
+ "### Short Answer\n",
402
+ "\n",
403
+ "Prompt: Please design a {number of questions} question quiz about {context} which reflects the learning objectives: {list of learning objectives}. The questions should be short answer. Expect the correct answers to be {anticipated length} long. Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect,and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right."
404
+ ]
405
+ },
406
+ {
407
+ "cell_type": "code",
408
+ "execution_count": 8,
409
+ "metadata": {
410
+ "colab": {
411
+ "base_uri": "https://localhost:8080/"
412
+ },
413
+ "id": "enC8ydfEPA_Y",
414
+ "outputId": "639efa60-3304-4b63-c0e8-a975d2d7dfaf"
415
+ },
416
+ "outputs": [
417
+ {
418
+ "name": "stdout",
419
+ "output_type": "stream",
420
+ "text": [
421
+ "Question 1: Who is the narrator of the poem \"Road Not Taken\"?\n"
422
+ ]
423
+ }
424
+ ],
425
+ "source": [
426
+ "# Short answer code example\n",
427
+ "query = \"\"\" Please design a 5-question quiz about Robert Frost's\n",
428
+ "\"Road Not Taken\" which reflects the learning objectives:\n",
429
+ "1. Identify the key elements of the poem: narrator, setting, and underlying message.\n",
430
+ "2. Understand the literary devices used in poetry and their purposes.\n",
431
+ "The questions should be short answer. Expect the correct answers to be\n",
432
+ "1-2 sentences long.\n",
433
+ "Provide one question at a time, and wait for my response before providing me with feedback.\n",
434
+ "Again, while the quiz asks for 5 questions, you should\n",
435
+ "only provide ONE question in you initial response. Do not include the answer in your response\n",
436
+ "If I get any part of the answer wrong,\n",
437
+ "provide me with an explanation of why it was incorrect,\n",
438
+ "and then give me additional chances to respond until I get the correct choice. \"\"\"\n",
439
+ "\n",
440
+ "def prompt_define(query, prefix, context):\n",
441
+ " prompt = (query + prefix)\n",
442
+ " prompt = prompt + context\n",
443
+ " return prompt\n",
444
+ "\n",
445
+ "prompt = prompt_define(query, query_prefix, context)\n",
446
+ "\n",
447
+ "def get_result(prompt):\n",
448
+ " messages[1] = HumanMessage(content=prompt)\n",
449
+ " result = llm(messages)\n",
450
+ " str_result = str(result)\n",
451
+ " import re\n",
452
+ " str_result = str_result.replace(r'\\n', '\\n')\n",
453
+ " str_result = str_result.replace(\"content='\", \"\")\n",
454
+ " str_result = str_result.replace(\"' additional_kwargs={} example=False\", \"\")\n",
455
+ " return str_result\n",
456
+ "\n",
457
+ "result = get_result(prompt)\n",
458
+ "\n",
459
+ "print(result)"
460
+ ]
461
+ },
462
+ {
463
+ "cell_type": "markdown",
464
+ "metadata": {
465
+ "id": "Ta4A527kPA_Y"
466
+ },
467
+ "source": [
468
+ "### Fill-in-the-blank\n",
469
+ "\n",
470
+ "Prompt: Create a {number of questions} question fill in the blank quiz refrencing {context}. The quiz should reflect the learning objectives: {learning objectives}. The \"blank\" part of the question should appear as \"________\". The answers should reflect what word(s) should go in the blank an accurate statement.\n",
471
+ "\n",
472
+ "An example is the follow: \"The author of the book is \"________.\"\n",
473
+ "\n",
474
+ "The question should be a statement. Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect,and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right."
475
+ ]
476
+ },
477
+ {
478
+ "cell_type": "code",
479
+ "execution_count": 9,
480
+ "metadata": {
481
+ "colab": {
482
+ "base_uri": "https://localhost:8080/"
483
+ },
484
+ "id": "szMOxEoiPA_Z",
485
+ "outputId": "0c4c823b-6f07-4ba0-cbf3-54b3e28d0d6c"
486
+ },
487
+ "outputs": [
488
+ {
489
+ "name": "stdout",
490
+ "output_type": "stream",
491
+ "text": [
492
+ "Question 1: In \"The Road Not Taken,\" the narrator comes to a point where two roads ____________ in a yellow wood.\n"
493
+ ]
494
+ }
495
+ ],
496
+ "source": [
497
+ "# Fill in the blank code example\n",
498
+ "query = \"\"\" Create a 5 question fill in the blank quiz refrencing Robert Frost's \"The Road Not Taken.\"\n",
499
+ "The \"blank\" part of the question should appear as \"________\". The answers should reflect what word(s) should go in the blank an accurate statement.\n",
500
+ "An example is the follow: \"The author of the book is ______.\" The question should be a statement.\n",
501
+ "The quiz should reflect the learning objectives:\n",
502
+ "1. Identify the key elements of the poem: narrator, setting, and underlying message.\n",
503
+ "2. Understand the literary devices used in poetry and their purposes.\n",
504
+ "Provide one question at a time, and wait for my response before providing me with feedback.\n",
505
+ "Again, while the quiz asks for 5 questions, you should\n",
506
+ "only provide ONE question in you initial response. Do not include the answer in your response\n",
507
+ "If I answer incorrectly, please explain why my answer is incorrect. \"\"\"\n",
508
+ "\n",
509
+ "def prompt_define(query, prefix, context):\n",
510
+ " prompt = (query + prefix)\n",
511
+ " prompt = prompt + context\n",
512
+ " return prompt\n",
513
+ "\n",
514
+ "prompt = prompt_define(query, query_prefix, context)\n",
515
+ "\n",
516
+ "def get_result(prompt):\n",
517
+ " messages[1] = HumanMessage(content=prompt)\n",
518
+ " result = llm(messages)\n",
519
+ " str_result = str(result)\n",
520
+ " import re\n",
521
+ " str_result = str_result.replace(r'\\n', '\\n')\n",
522
+ " str_result = str_result.replace(\"content='\", \"\")\n",
523
+ " str_result = str_result.replace(\"' additional_kwargs={} example=False\", \"\")\n",
524
+ " return str_result\n",
525
+ "\n",
526
+ "result = get_result(prompt)\n",
527
+ "\n",
528
+ "print(result)"
529
+ ]
530
+ },
531
+ {
532
+ "cell_type": "markdown",
533
+ "metadata": {
534
+ "id": "yEwzAB28PA_Z"
535
+ },
536
+ "source": [
537
+ "### Sequencing\n",
538
+ "\n",
539
+ "Prompt: Please develop a {number of questions} question questionnaire that will ask me to recall the steps involved in the following learning objectives in regard to {context}: {learning objectives}. Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect, and then give me additional chances to respond until I get the correct choice. After I respond, explain their sequence to me."
540
+ ]
541
+ },
542
+ {
543
+ "cell_type": "code",
544
+ "execution_count": 10,
545
+ "metadata": {
546
+ "colab": {
547
+ "base_uri": "https://localhost:8080/"
548
+ },
549
+ "id": "3YRyhhWtPA_Z",
550
+ "outputId": "7e26ed3b-99c7-4577-c808-fb85c3e11b4c"
551
+ },
552
+ "outputs": [
553
+ {
554
+ "name": "stdout",
555
+ "output_type": "stream",
556
+ "text": [
557
+ "Question 1: Who is the narrator of the poem?\n"
558
+ ]
559
+ }
560
+ ],
561
+ "source": [
562
+ "# Sequence example\n",
563
+ "query = \"\"\" Please develop a 5 question questionnaire that will ask me to recall the steps involved in the following learning objectives in regard to Robert Frost's \"The Road Not Taken\":\n",
564
+ "1. Identify the key elements of the poem: narrator, setting, and underlying message.\n",
565
+ "2. Understand the literary devices used in poetry and their purposes.\n",
566
+ "Provide one question at a time, and wait for my response before providing me with feedback.\n",
567
+ "Again, while the quiz asks for 5 questions, you should\n",
568
+ "only provide ONE question in you initial response. Do not include the answer in your response.\n",
569
+ "After I respond, explain their sequence to me.\"\"\"\n",
570
+ "\n",
571
+ "def prompt_define(query, prefix, context):\n",
572
+ " prompt = (query + prefix)\n",
573
+ " prompt = prompt + context\n",
574
+ " return prompt\n",
575
+ "\n",
576
+ "prompt = prompt_define(query, query_prefix, context)\n",
577
+ "\n",
578
+ "def get_result(prompt):\n",
579
+ " messages[1] = HumanMessage(content=prompt)\n",
580
+ " result = llm(messages)\n",
581
+ " str_result = str(result)\n",
582
+ " import re\n",
583
+ " str_result = str_result.replace(r'\\n', '\\n')\n",
584
+ " str_result = str_result.replace(\"content='\", \"\")\n",
585
+ " str_result = str_result.replace(\"' additional_kwargs={} example=False\", \"\")\n",
586
+ " return str_result\n",
587
+ "\n",
588
+ "result = get_result(prompt)\n",
589
+ "\n",
590
+ "print(result)"
591
+ ]
592
+ },
593
+ {
594
+ "cell_type": "markdown",
595
+ "metadata": {
596
+ "id": "0DGBiJofPA_Z"
597
+ },
598
+ "source": [
599
+ "### Relationships/drawing connections\n",
600
+ "\n",
601
+ "Prompt: Please design a {number of questions} question quiz that asks me to explain the relationships that exist within the following learning objectives, referencing {context}: {learning objectives}. Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect,and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right."
602
+ ]
603
+ },
604
+ {
605
+ "cell_type": "code",
606
+ "execution_count": 11,
607
+ "metadata": {
608
+ "colab": {
609
+ "base_uri": "https://localhost:8080/"
610
+ },
611
+ "id": "Mw3RBpsnPA_a",
612
+ "outputId": "3f721340-7fef-4617-cc2c-2420779ecb2d"
613
+ },
614
+ "outputs": [
615
+ {
616
+ "name": "stdout",
617
+ "output_type": "stream",
618
+ "text": [
619
+ "Question 1: Identify the key elements of the poem: narrator, setting, and underlying message.\n"
620
+ ]
621
+ }
622
+ ],
623
+ "source": [
624
+ "# Relationships example\n",
625
+ "query = \"\"\" Please design a 5 question quiz that asks me to explain the relationships that exist within the following learning objectives, referencing Robert Frost's \"The Road Not Taken\":\n",
626
+ "1. Identify the key elements of the poem: narrator, setting, and underlying message.\n",
627
+ "2. Understand the literary devices used in poetry and their purposes.\n",
628
+ "Provide one question at a time, and wait for my response before providing me with feedback.\n",
629
+ "Again, while the quiz asks for 5 questions, you should\n",
630
+ "only provide ONE question in you initial response. Do not include the answer in your response.\"\"\"\n",
631
+ "\n",
632
+ "def prompt_define(query, prefix, context):\n",
633
+ " prompt = (query + prefix)\n",
634
+ " prompt = prompt + context\n",
635
+ " return prompt\n",
636
+ "\n",
637
+ "prompt = prompt_define(query, query_prefix, context)\n",
638
+ "\n",
639
+ "def get_result(prompt):\n",
640
+ " messages[1] = HumanMessage(content=prompt)\n",
641
+ " result = llm(messages)\n",
642
+ " str_result = str(result)\n",
643
+ " import re\n",
644
+ " str_result = str_result.replace(r'\\n', '\\n')\n",
645
+ " str_result = str_result.replace(\"content='\", \"\")\n",
646
+ " str_result = str_result.replace(\"' additional_kwargs={} example=False\", \"\")\n",
647
+ " return str_result\n",
648
+ "\n",
649
+ "result = get_result(prompt)\n",
650
+ "\n",
651
+ "print(result)"
652
+ ]
653
+ },
654
+ {
655
+ "cell_type": "markdown",
656
+ "metadata": {
657
+ "id": "4YO3wCTwPA_a"
658
+ },
659
+ "source": [
660
+ "### Concepts and Definitions\n",
661
+ "\n",
662
+ "Prompt: Design a {number of questions} question quiz that asks me about definitions related to the following learning objectives: {learning objectives} - based on {context}\".\n",
663
+ "Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect,and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right.\n"
664
+ ]
665
+ },
666
+ {
667
+ "cell_type": "code",
668
+ "execution_count": 12,
669
+ "metadata": {
670
+ "colab": {
671
+ "base_uri": "https://localhost:8080/"
672
+ },
673
+ "id": "TbvJPhkmPA_a",
674
+ "outputId": "f174e720-a31a-4476-8701-36ed9f637770"
675
+ },
676
+ "outputs": [
677
+ {
678
+ "name": "stdout",
679
+ "output_type": "stream",
680
+ "text": [
681
+ "Question 1: Who is the narrator of the poem \"The Road Not Taken\"?\n",
682
+ "\n",
683
+ "Please provide your response.\n"
684
+ ]
685
+ }
686
+ ],
687
+ "source": [
688
+ "# Concepts and definitions example\n",
689
+ "query = \"\"\" Design a 5 question quiz that asks me about definitions related to the following learning objectives:\n",
690
+ "1. Identify the key elements of the poem: narrator, setting, and underlying message, and\n",
691
+ "2. Understand the literary devices used in poetry and their purposes - based on Robert Frost's \"The Road Not Taken\".\n",
692
+ "Provide one question at a time, and wait for my response before providing me with feedback.\n",
693
+ "Again, while the quiz asks for 5 questions, you should\n",
694
+ "only provide ONE question in you initial response. Do not include the answer in your response\n",
695
+ "Once I write out my response, provide me with your own response, highlighting why my answer is correct or incorrect.\"\"\"\n",
696
+ "\n",
697
+ "def prompt_define(query, prefix, context):\n",
698
+ " prompt = (query + prefix)\n",
699
+ " prompt = prompt + context\n",
700
+ " return prompt\n",
701
+ "\n",
702
+ "prompt = prompt_define(query, query_prefix, context)\n",
703
+ "\n",
704
+ "def get_result(prompt):\n",
705
+ " messages[1] = HumanMessage(content=prompt)\n",
706
+ " result = llm(messages)\n",
707
+ " str_result = str(result)\n",
708
+ " import re\n",
709
+ " str_result = str_result.replace(r'\\n', '\\n')\n",
710
+ " str_result = str_result.replace(\"content='\", \"\")\n",
711
+ " str_result = str_result.replace(\"' additional_kwargs={} example=False\", \"\")\n",
712
+ " return str_result\n",
713
+ "\n",
714
+ "result = get_result(prompt)\n",
715
+ "\n",
716
+ "print(result)"
717
+ ]
718
+ },
719
+ {
720
+ "cell_type": "markdown",
721
+ "metadata": {
722
+ "id": "vc-llAgfPA_a"
723
+ },
724
+ "source": [
725
+ "### Real Word Examples\n",
726
+ "\n",
727
+ "Prompt: Demonstrate how {context} can be applied to solve a real-world problem related to the following learning objectives: {learning objectives}. Ask me questions regarding this theory/concept.\n",
728
+ "\n",
729
+ "Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect,and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right."
730
+ ]
731
+ },
732
+ {
733
+ "cell_type": "code",
734
+ "execution_count": 14,
735
+ "metadata": {
736
+ "colab": {
737
+ "base_uri": "https://localhost:8080/"
738
+ },
739
+ "id": "sSwRLkR8PA_a",
740
+ "outputId": "4d91df15-b7b8-4bc9-d1b2-3af7e629be4f"
741
+ },
742
+ "outputs": [
743
+ {
744
+ "name": "stdout",
745
+ "output_type": "stream",
746
+ "text": [
747
+ "Question 1: Who is the narrator of the poem?\n"
748
+ ]
749
+ }
750
+ ],
751
+ "source": [
752
+ "# Real word example\n",
753
+ "query = \"\"\" Demonstrate how Robert Frost’s “The Road Not Taken” can be applied to solve a real-world problem. Ask me questions regarding\n",
754
+ "this theory/concept and relate them to the following learning objectives:\n",
755
+ "1. Identify the key elements of the poem: narrator, setting, and underlying message.\n",
756
+ "2. Understand the literary devices used in poetry and their purposes.\n",
757
+ "Provide one question at a time, and wait for my response before providing me with feedback.\n",
758
+ "Again, while the quiz asks for 5 questions, you should\n",
759
+ "only provide ONE question in you initial response. Do not include the answer in your response\n",
760
+ "\"\"\"\n",
761
+ "\n",
762
+ "def prompt_define(query, prefix, context):\n",
763
+ " prompt = (query + prefix)\n",
764
+ " prompt = prompt + context\n",
765
+ " return prompt\n",
766
+ "\n",
767
+ "prompt = prompt_define(query, query_prefix, context)\n",
768
+ "\n",
769
+ "def get_result(prompt):\n",
770
+ " messages[1] = HumanMessage(content=prompt)\n",
771
+ " result = llm(messages)\n",
772
+ " str_result = str(result)\n",
773
+ " import re\n",
774
+ " str_result = str_result.replace(r'\\n', '\\n')\n",
775
+ " str_result = str_result.replace(\"content='\", \"\")\n",
776
+ " str_result = str_result.replace(\"' additional_kwargs={} example=False\", \"\")\n",
777
+ " return str_result\n",
778
+ "\n",
779
+ "result = get_result(prompt)\n",
780
+ "\n",
781
+ "print(result)"
782
+ ]
783
+ },
784
+ {
785
+ "cell_type": "markdown",
786
+ "metadata": {
787
+ "id": "2y6gHhGKPA_b"
788
+ },
789
+ "source": [
790
+ "### Randomized Question Types\n",
791
+ "\n",
792
+ "Prompt: Please generate a high-quality assessment consisting of {number of questions} varying questions, each of different types (open-ended, multiple choice, etc.), to determine if I achieved the following learning objectives in regards to {context}: {learning objectives}.\n",
793
+ "\n",
794
+ "Provide one question at a time, and wait for my response before providing me with feedback. Again, while the quiz may ask for multiple questions, you should only provide ONE question in you initial response. Do not include the answer in your response. If I get an answer wrong, provide me with an explanation of why it was incorrect,and then give me additional chances to respond until I get the correct choice. Explain why the correct choice is right."
795
+ ]
796
+ },
797
+ {
798
+ "cell_type": "code",
799
+ "execution_count": 15,
800
+ "metadata": {
801
+ "colab": {
802
+ "base_uri": "https://localhost:8080/"
803
+ },
804
+ "id": "xC4t4WMwPA_b",
805
+ "outputId": "331b1ef7-ef34-4d21-ddbd-e2ebae1d2fd4"
806
+ },
807
+ "outputs": [
808
+ {
809
+ "name": "stdout",
810
+ "output_type": "stream",
811
+ "text": [
812
+ "Question 1: \n",
813
+ "Identify the narrator of the poem \"The Road not Taken\" by Robert Frost.\n",
814
+ "\n",
815
+ "Question 2: \n",
816
+ "What is the setting of the poem \"The Road not Taken\" by Robert Frost?\n",
817
+ "\n",
818
+ "Question 3: \n",
819
+ "What is the underlying message or theme of the poem \"The Road not Taken\" by Robert Frost?\n",
820
+ "\n",
821
+ "Question 4 (Multiple Choice):\n",
822
+ "Which literary device is NOT used in the poem \"The Road not Taken\" by Robert Frost?\n",
823
+ "a) Metaphor\n",
824
+ "b) Simile\n",
825
+ "c) Alliteration\n",
826
+ "d) Personification\n",
827
+ "\n",
828
+ "Question 5 (Open-ended):\n",
829
+ "Provide an example of a literary device used in the poem \"The Road not Taken\" by Robert Frost and explain its purpose.\n"
830
+ ]
831
+ }
832
+ ],
833
+ "source": [
834
+ "# Randomized question types\n",
835
+ "query = \"\"\" Please generate a high-quality assessment consisting of 5 varying questions,\n",
836
+ "each of different types (open-ended, multiple choice, etc.),\n",
837
+ "to determine if I achieved the following learning objectives in regards to Robert Frost’s “The Road not Taken\":\n",
838
+ "1. Identify the key elements of the poem: narrator, setting, and underlying message.\n",
839
+ "2. Understand the literary devices used in poetry and their purposes. If I answer incorrectly for any of the questions,\n",
840
+ "please explain why my answer is incorrect.\n",
841
+ "Provide one question at a time, and wait for my response before providing me with feedback.\n",
842
+ "Again, while the quiz asks for 5 questions, you should\n",
843
+ "only provide ONE question in you initial response. Do not include the answer in your response.\n",
844
+ "\"\"\"\n",
845
+ "\n",
846
+ "def prompt_define(query, prefix, context):\n",
847
+ " prompt = (query + prefix)\n",
848
+ " prompt = prompt + context\n",
849
+ " return prompt\n",
850
+ "\n",
851
+ "prompt = prompt_define(query, query_prefix, context)\n",
852
+ "\n",
853
+ "def get_result(prompt):\n",
854
+ " messages[1] = HumanMessage(content=prompt)\n",
855
+ " result = llm(messages)\n",
856
+ " str_result = str(result)\n",
857
+ " import re\n",
858
+ " str_result = str_result.replace(r'\\n', '\\n')\n",
859
+ " str_result = str_result.replace(\"content='\", \"\")\n",
860
+ " str_result = str_result.replace(\"' additional_kwargs={} example=False\", \"\")\n",
861
+ " return str_result\n",
862
+ "\n",
863
+ "result = get_result(prompt)\n",
864
+ "\n",
865
+ "print(result)"
866
+ ]
867
+ },
868
+ {
869
+ "cell_type": "markdown",
870
+ "metadata": {
871
+ "id": "ndo_cdWYPA_b"
872
+ },
873
+ "source": [
874
+ "### Quantiative evaluation the correctness of a student's answer\n",
875
+ "\n",
876
+ "Prompt: (A continuation of the previous chat) Please generate the main points of the student’s answer to the previous question, and evaluate on a scale of 1 to 5 how comprehensive the student’s answer was in relation to the learning objectives, and explain why he or she received this rating, including what was missed in his or her answer if the student’s answer wasn’t complete.\n"
877
+ ]
878
+ },
879
+ {
880
+ "cell_type": "code",
881
+ "execution_count": 118,
882
+ "metadata": {
883
+ "colab": {
884
+ "base_uri": "https://localhost:8080/"
885
+ },
886
+ "id": "WhhNI0FZPA_b",
887
+ "outputId": "fb037251-2448-44e7-acef-3e6a28da7cb3"
888
+ },
889
+ "outputs": [
890
+ {
891
+ "name": "stdout",
892
+ "output_type": "stream",
893
+ "text": [
894
+ "Question 1: Who is the narrator of the poem \"The Road not Taken\"?\n",
895
+ "a) Robert Frost\n",
896
+ "b) The traveler\n",
897
+ "c) The reader\n",
898
+ "d) The poet\\'s imagination\n"
899
+ ]
900
+ }
901
+ ],
902
+ "source": [
903
+ "# qualitative evaluation\n",
904
+ "qualitative_query = \"\"\" Please generate the main points of the student’s answer to the previous question,\n",
905
+ " and evaluate on a scale of 1 to 5 how comprehensive the student’s answer was in relation to the learning objectives,\n",
906
+ " and explain why he or she received this rating, including what was missed in his or her answer if the student’s answer wasn’t complete.\"\"\"\n",
907
+ "\n",
908
+ "# Note that this uses the previous result and query in the context\n",
909
+ "def prompt_define(query, prefix, context):\n",
910
+ " prompt = (query + prefix)\n",
911
+ " prompt = prompt + context\n",
912
+ " return prompt\n",
913
+ "\n",
914
+ "prompt = prompt_define(query, query_prefix, context)\n",
915
+ "\n",
916
+ "def get_result(prompt):\n",
917
+ " messages[1] = HumanMessage(content=prompt)\n",
918
+ " result = llm(messages)\n",
919
+ " str_result = str(result)\n",
920
+ " import re\n",
921
+ " str_result = str_result.replace(r'\\n', '\\n')\n",
922
+ " str_result = str_result.replace(\"content='\", \"\")\n",
923
+ " str_result = str_result.replace(\"' additional_kwargs={} example=False\", \"\")\n",
924
+ " return str_result\n",
925
+ "\n",
926
+ "result = get_result(prompt)\n",
927
+ "\n",
928
+ "print(result)"
929
+ ]
930
+ }
931
+ ],
932
+ "metadata": {
933
+ "colab": {
934
+ "include_colab_link": true,
935
+ "provenance": []
936
+ },
937
+ "kernelspec": {
938
+ "display_name": "Python 3.11.5 64-bit",
939
+ "language": "python",
940
+ "name": "python3"
941
+ },
942
+ "language_info": {
943
+ "codemirror_mode": {
944
+ "name": "ipython",
945
+ "version": 3
946
+ },
947
+ "file_extension": ".py",
948
+ "mimetype": "text/x-python",
949
+ "name": "python",
950
+ "nbconvert_exporter": "python",
951
+ "pygments_lexer": "ipython3",
952
+ "version": "3.11.5"
953
+ },
954
+ "vscode": {
955
+ "interpreter": {
956
+ "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
957
+ }
958
+ }
959
+ },
960
+ "nbformat": 4,
961
+ "nbformat_minor": 0
962
+ }
prompt_with_vector_store.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
prompt_with_vector_store_w_grading_intr.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
speech_to_text_models.ipynb ADDED
The diff for this file is too large to render. See raw diff