# Ask ANRG Project Description

Our demo is available [here](https://huggingface.co/spaces/FloraJ/Ask-ANRG).

A concise and structured guide to setting up and understanding the ANRG project.

---

## πŸš€ Setup

1. **Clone the Repository**:
   ```
   git clone git@github.com:ANRGUSC/ask-anrg.git
   ```

2. **Navigate to the Directory**:
   ```
   cd ask-anrg/
   ```

3. **Create a Conda Environment**:
   ```
   conda create --name ask_anrg
   ```

4. **Activate the Conda Environment**:
   ```
   conda activate ask_anrg
   ```

5. **Install Required Dependencies**:
   ```
   pip3 install -r requirements.txt
   ```
   
6. **Download the demo database from [here](https://drive.google.com/file/d/1-TV70IFIzjO4uPzNRzef3FLhssAfK2g3/view?usp=sharing), unzip it, and put it directly under the root directory, or place your own documents under [original_documents](database/original_documents)**. The resulting layout should look like this:
   ```
   ask-anrg/
   |-- database/
      |-- original_documents/
   |-- openai_function_utils/
      |-- openai_function_impl.py
      |-- openai_function_interface.py
   |-- configs.py
   |-- requirements.txt
   |-- utils.py
   |-- main.py
   |-- Readme.md
   |-- project_description.md
   |-- result_report.txt
   |-- .gitignore
   ```
7. **Set Up the Database**
   If you placed your own documents inside the [original_documents](database/original_documents) directory, run the following command to prepare embeddings for them.
   ```
   python3 utils.py
   ```
   This creates `database/embeddings/` to store the embeddings of the original documents, along with a CSV file `database/document_name_to_embedding.csv` that maps each document name to its embedding vector, as sketched below.
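Conceptually, this preparation step looks like the following sketch. It is a minimal illustration rather than the project's actual `utils.py`: the embedding model, the `.txt` file layout, and the helper names are assumptions.

```python
import csv
from pathlib import Path

from openai import OpenAI  # assumes the openai>=1.0 client; the project may differ

client = OpenAI()  # reads OPENAI_API_KEY from the environment; the project keeps the key in configs.py

DOCS_DIR = Path("database/original_documents")
EMB_DIR = Path("database/embeddings")
CSV_PATH = Path("database/document_name_to_embedding.csv")


def embed(text: str) -> list[float]:
    # The model name is an assumption; use whatever the project configures.
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return resp.data[0].embedding


def prepare_embeddings() -> None:
    EMB_DIR.mkdir(parents=True, exist_ok=True)
    with open(CSV_PATH, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["document_name", "embedding"])
        for doc in sorted(DOCS_DIR.glob("*.txt")):
            vector = embed(doc.read_text())
            # One embedding file per document, plus a row in the lookup CSV.
            (EMB_DIR / doc.name).write_text(str(vector))
            writer.writerow([doc.name, vector])


if __name__ == "__main__":
    prepare_embeddings()
```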
   
## πŸ–₯️ How to Run
```
python main.py
```
After the prompt "Hi! What question do you have for ANRG? Press 0 to exit", type your question, or enter 0 to quit.
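
Under the hood, `main.py` runs an interactive loop along these lines (a simplified sketch; `answer_question` is a hypothetical stand-in for the project's actual retrieval-and-ChatGPT pipeline):

```python
def answer_question(question: str) -> str:
    # Placeholder for the real pipeline: embed the question, retrieve the most
    # relevant documents, and let ChatGPT (with the functions below) compose an answer.
    return f"(answer to: {question})"


def main() -> None:
    while True:
        question = input("Hi! What question do you have for ANRG? Press 0 to exit\n")
        if question.strip() == "0":
            break
        print(answer_question(question))


if __name__ == "__main__":
    main()
```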

## πŸ“‚ Structure
* database: Contains scraped and processed data related to the lab.
  * embeddings: Processed embeddings for the publications.
  * original_documents: Original texts scraped from the lab website.
  * document_name_to_embedding.csv: Embeddings for all publications.
* openai_function_utils: Utility functions related to OpenAI.
  * openai_function_impl.py: Implementations of the OpenAI functions.
  * openai_function_interface.py: Interfaces (descriptions) for the OpenAI functions.
* configs.py: Configuration settings, e.g., OpenAI API key.
* requirements.txt: Required Python libraries for the project.
* utils.py: Utility functions, such as embedding, searching, and retrieving answers from ChatGPT.
* main.py: Main entry point of the project.

## πŸ› οΈ Implemented Functions for OPENAI
These functions are made available for ChatGPT to call while handling user questions:

- `get_lab_member_info`: Retrieve details (name, photo URL, links, description) of a lab member by name.
- `get_lab_member_detailed_info`: Retrieve detailed information (link, photo, description) about a lab member.
- `get_publication_by_year`: List all publication information for a given year.
- `get_pub_info`: Access details (title, venue, authors, year, link) of a publication by its title.
- `get_pub_by_name`: Get information on all publications written by a specific lab member.

More details on the functions can be checked under `openai_function_utils/`.
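
These follow the OpenAI function-calling pattern: `openai_function_interface.py` holds the JSON-schema descriptions the model sees, `openai_function_impl.py` holds the Python implementations, and the chat loop dispatches whichever function the model decides to call. Below is a minimal sketch of that wiring; the model name, the single example schema, and the returned fields are illustrative assumptions, not the project's exact definitions.

```python
import json

from openai import OpenAI  # assumes the openai>=1.0 client; the project may differ

client = OpenAI()

# Interface side (cf. openai_function_interface.py): descriptions the model sees.
FUNCTIONS = [
    {
        "name": "get_lab_member_info",
        "description": "Retrieve details of a lab member by name.",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string", "description": "The lab member's name"}},
            "required": ["name"],
        },
    },
]

# Implementation side (cf. openai_function_impl.py): what actually runs.
def get_lab_member_info(name: str) -> dict:
    return {"name": name, "photo": "...", "links": [], "description": "..."}  # lookup omitted

IMPLEMENTATIONS = {"get_lab_member_info": get_lab_member_info}


def ask(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo", messages=messages, functions=FUNCTIONS
    ).choices[0].message
    if reply.function_call:  # the model chose to call one of our functions
        call = reply.function_call
        result = IMPLEMENTATIONS[call.name](**json.loads(call.arguments))
        messages.append({"role": "assistant", "content": None,
                         "function_call": {"name": call.name, "arguments": call.arguments}})
        messages.append({"role": "function", "name": call.name, "content": json.dumps(result)})
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo", messages=messages
        ).choices[0].message
    return reply.content
```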

## Evaluation: Turing Test
We follow the steps below to evaluate our chatbot:
1. Based on the information scraped from the lab's website, we come up with questions that the chatbot's users may ask, including both general questions (applicable to any lab) and lab-specific ones. Here are some examples:
   - Who works here?
   - List all publications of this lab.
   - What are some recent publications by this lab in the area of [x]?
   - What conferences does this lab usually publish to?
   - What kind of undergraduate projects does this lab work on?
   - Give me the link to [x]'s homepage.
   - Give me a publication written by [x].
   - How long has [x] been doing research in [y] area?
   - Who in the lab is currently working on [x]?
   - Where does former member [x] work now?
2. Given four team members A, B, C, and D, A and B manually write down answers to the evaluation questions from each category.
3. Then, C tests the questions on the chatbot and collects its answers.
4. Without knowing which answers come from the human and which from the chatbot, D compares the answers for every question and chooses which one is preferable.
5. The chatbot's winning rate (i.e., the fraction of questions on which the chatbot's answer is preferred over the human answerer's) is then calculated.

|      Overall Winning Rate      | 
|:-----------------------------: |
| N/A |

Refer to [ask_anrg_eval_question.csv](ask_anrg_eval_question.csv) for the full list of evaluation questions and the evaluation results.
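
For reference, the winning rate is simply the fraction of evaluation questions on which the blind judge (D) prefers the chatbot's answer. A tiny sketch of that computation, assuming a `preferred` column in the CSV (the column name and its values are assumptions about the file's format):

```python
import csv

def winning_rate(path: str = "ask_anrg_eval_question.csv") -> float:
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    wins = sum(1 for row in rows if row["preferred"] == "chatbot")
    return wins / len(rows)

if __name__ == "__main__":
    print(f"Overall winning rate: {winning_rate():.1%}")
```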