Ask ANRG Project Description
Our demo is available here.
A concise and structured guide to setting up and understanding the ANRG project.
Setup
1. Clone the repository:

   ```
   git clone git@github.com:ANRGUSC/ask-anrg.git
   ```

2. Navigate to the directory:

   ```
   cd ask-anrg/
   ```

3. Create a conda environment:

   ```
   conda create --name ask_anrg
   ```

4. Activate the conda environment:

   ```
   conda activate ask_anrg
   ```

5. Install the required dependencies:

   ```
   pip3 install -r requirements.txt
   ```
Download the database from here for demo purposes, unzip it, and put it directly under the root directory, or place your own documents under original_documents.
```
ask-anrg/
|-- database/
|   |-- original_documents/
|-- openai_function_utils/
|   |-- openai_function_impl.py
|   |-- openai_function_interface.py
|-- configs.py
|-- requirements.txt
|-- utils.py
|-- main.py
|-- Readme.md
|-- project_description.md
|-- result_report.txt
|-- .gitignore
```
Set up database data

If you place your own documents inside the original_documents directory, run the following command to prepare embeddings for them:
```
python3 utils.py
```

This creates database/embeddings to store the embeddings of the original documents, and a CSV file database/document_name_to_embedding.csv that stores each document name and its embedding vector.
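As a sketch of what this preprocessing step might look like (the `embed` placeholder and the function name here are assumptions for illustration, not the project's actual code, which presumably calls a real embedding API):

```python
import csv
import os

def embed(text):
    # Placeholder embedding: utils.py presumably calls an embedding
    # API here. This stub just hashes characters into a small
    # fixed-size vector so the example is self-contained.
    vec = [0.0] * 8
    for i, ch in enumerate(text):
        vec[i % 8] += ord(ch)
    return vec

def build_embedding_index(doc_dir, out_csv):
    # Read each original document, embed it, and record the mapping
    # from document name to embedding vector in a CSV file.
    rows = []
    for name in sorted(os.listdir(doc_dir)):
        with open(os.path.join(doc_dir, name), encoding="utf-8") as f:
            rows.append((name, embed(f.read())))
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["document_name", "embedding"])
        for name, vec in rows:
            writer.writerow([name, vec])
    return rows
```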
How to Run
```
python main.py
```
After the prompt "Hi! What question do you have for ANRG? Press 0 to exit", you can reply with your question.
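The interaction loop in main.py presumably looks something like this minimal sketch, where `answer_question` stands in for the project's actual retrieval-and-ChatGPT logic:

```python
def chat_loop(answer_question, read_input=input, write=print):
    # Prompt the user repeatedly; entering "0" exits the loop.
    write("Hi! What question do you have for ANRG? Press 0 to exit")
    answers = []
    while True:
        question = read_input()
        if question.strip() == "0":
            break
        reply = answer_question(question)
        answers.append(reply)
        write(reply)
    return answers
```

Injecting `read_input` and `write` keeps the loop testable without a live terminal session.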
Structure
- database: Contains scraped and processed data related to the lab.
- embeddings: Processed embeddings for the publications.
- original_documents: Original texts scraped from the lab website.
- document_name_to_embedding.csv: Embeddings for all publications.
- openai_function_utils: Utility functions related to OpenAI.
- openai_function_impl.py: Implementations of the OpenAI functions.
- openai_function_interface.py: Interfaces (descriptions) for the OpenAI functions.
- configs.py: Configuration settings, e.g., OpenAI API key.
- requirements.txt: Required Python libraries for the project.
- utils.py: Utility functions, such as embedding, searching, and retrieving answers from ChatGPT.
- main.py: Main entry point of the project.
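The searching step mentioned for utils.py likely ranks stored documents by embedding similarity against the user's question. A minimal cosine-similarity sketch (the function names here are illustrative, not the project's API):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_documents(query_vec, name_to_vec, k=3):
    # Rank stored document embeddings against the query embedding
    # and return the names of the k best matches.
    scored = sorted(
        name_to_vec.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in scored[:k]]
```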
Implemented Functions for OpenAI
These functions are made available to ChatGPT for use when handling user questions:
- get_lab_member_info: Retrieve details (name, photo URL, links, description) of a lab member by name.
- get_lab_member_detailed_info: Retrieve detailed information (link, photo, description) of a lab member.
- get_publication_by_year: List all publication information for a given year.
- get_pub_info: Access details (title, venue, authors, year, link) of a publication by its title.
- get_pub_by_name: Get information on all publications written by a specific lab member.
More details on the functions can be found under openai_function_utils/.
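The interfaces in openai_function_interface.py presumably follow the JSON-schema format that OpenAI's function-calling API expects. A hedged sketch for get_pub_info (the exact field descriptions in the project may differ):

```python
# A function description in the shape OpenAI's function-calling API
# expects: a name, a description, and a JSON-schema parameters object.
GET_PUB_INFO_INTERFACE = {
    "name": "get_pub_info",
    "description": (
        "Access details (title, venue, authors, year, link) "
        "of a publication by its title."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "title": {
                "type": "string",
                "description": "Title of the publication to look up.",
            },
        },
        "required": ["title"],
    },
}
```

ChatGPT reads these descriptions to decide which function to call and which arguments to pass; the matching implementation lives in openai_function_impl.py.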
Evaluation: Turing Test
We follow the steps below to evaluate our chatbot:
- Based on information scraped from the lab's website, we come up with questions that chatbot users may ask, including both general questions (applicable to any lab) and lab-specific questions. Here are some examples:
- Who works here?
- List all publications of this lab.
- What are some recent publications by this lab in the area of [x]?
- What conferences does this lab usually publish to?
- What kind of undergraduate projects does this lab work on?
- Give me the link to [x]'s homepage.
- Give me a publication written by [x].
- How long has [x] been doing research in [y] area?
- Who in the lab is currently working on [x]?
- Where does former member [x] work now?
- Given 4 team members A, B, C, and D: A and B will manually write down answers to the evaluation questions from each category.
- Then, C will test the questions on the ChatBot and collect the answers.
- Without knowing which answers are provided by the human or the chatbot, D will compare the answers for every question and choose which one a human would prefer.
- The chatbot's winning rate (i.e., how many times the chatbot's answer is preferred over the human's) will be calculated.
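The winning-rate computation in the final step can be sketched as follows (the "bot"/"human" judgment labels are hypothetical, not the project's actual data format):

```python
def winning_rate(judgments):
    # judgments: one pick per question, "bot" or "human", as recorded
    # by the blinded judge (D in the procedure above).
    if not judgments:
        return 0.0
    wins = sum(1 for pick in judgments if pick == "bot")
    return wins / len(judgments)
```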
| Overall Winning Rate |
|---|
| N/A |
Refer to ask_anrg_eval_question.csv for more details on the questions used for evaluation and the evaluation results.