andyqin18 committed on
Commit
66f2f1a
1 Parent(s): ea09ee5

Updated README

Files changed (3)
  1. README.md +191 -5
  2. app.py +1 -1
  3. test_model.py +6 -6
README.md CHANGED
@@ -11,13 +11,192 @@ pinned: false
 
 # AI Project: Finetuning Language Models - Toxic Tweets
 
- Hello! This is a project for CS-UY 4613: Artificial Intelligence. I'm providing a step-by-step instruction on finetuning language models for detecting toxic tweets.
 
- # Milestone 3
 
- This milestone includes finetuning a language model in HuggingFace for sentiment analysis.
 
- Link to app: https://huggingface.co/spaces/andyqin18/sentiment-analysis-app
 
 Here's the setup block that includes all modules:
 ```
@@ -121,4 +300,11 @@ trainer.push_to_hub()
 
 Modify [app.py](app.py) so that it takes in one text and generate an analysis using one of the provided models. Details are explained in comment lines. The app should look like this:
 
- ![](milestone3/appUI.png)
 
 # AI Project: Finetuning Language Models - Toxic Tweets
 
+ Hello! This is a project for CS-UY 4613: Artificial Intelligence. I'm providing step-by-step instructions on finetuning language models for detecting toxic tweets. All code is well commented.
 
+ # Everything you need to know
+ Link to HuggingFace space: https://huggingface.co/spaces/andyqin18/sentiment-analysis-app
 
+ ----Code behind the app: [app.py](app.py)
 
+ Link to finetuned model: https://huggingface.co/andyqin18/finetuned-bert-uncased
+
+ ----Code for finetuning a language model: [finetune.ipynb](milestone3/finetune.ipynb)
+
+ Performance of the model, measured using [test_model.py](test_model.py), is shown below. The results are generated on 2000 randomly selected samples from [train.csv](milestone3/comp/train.csv):
+
+ ```
+ {'label_accuracy': 0.9821666666666666,
+ 'prediction_accuracy': 0.9195,
+ 'precision': 0.8263888888888888,
+ 'recall': 0.719758064516129}
+ ```
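For reference, the precision and recall above follow the standard confusion-matrix definitions used in test_model.py. A minimal sketch with made-up counts (the values of TP, FP, FN here are hypothetical, not the real run's):

```python
# Hypothetical label-level counts (illustrative only, not from the real evaluation)
TP, FP, FN = 80, 20, 20  # true positives, false positives, false negatives

precision = TP / (TP + FP)  # of labels predicted toxic, the fraction that really are
recall = TP / (TP + FN)     # of truly toxic labels, the fraction that were caught

print({"precision": precision, "recall": recall})  # {'precision': 0.8, 'recall': 0.8}
```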
+
+ Now let's walk through the details :)
+
+ # Milestone 1 - Setup
+
+ This milestone includes setting up Docker and creating a development environment on Windows 11.
+
+ ## 1. Enable the WSL2 feature
+
+ The Windows Subsystem for Linux (WSL) lets developers install a Linux distribution on Windows.
+
+ ```
+ wsl --install
+ ```
+
+ Ubuntu is the default distribution installed, and WSL2 is the default version.
+ After creating a Linux username and password, Ubuntu can be seen in Windows Terminal.
+ Details can be found [here](https://learn.microsoft.com/en-us/windows/wsl/install).
+
+ ![](milestone1/wsl2.png)
+
+ ## 2. Download and install the Linux kernel update package
+
+ The package needs to be downloaded before installing Docker Desktop.
+ However, this error might occur:
+
+ `Error: wsl_update_x64.msi unable to run because "This update only applies to machines with the Windows Subsystem for Linux"`
+
+ Solution: open Windows Features, enable "Windows Subsystem for Linux", and rerun the update [package](https://docs.microsoft.com/windows/wsl/wsl2-kernel).
+
+ ![](milestone1/kernal_update_sol.png)
+
+ ## 3. Download Docker Desktop
+
+ After downloading the [Docker App](https://www.docker.com/products/docker-desktop/), the WSL2-based engine is automatically enabled.
+ If not, follow [this link](https://docs.docker.com/desktop/windows/wsl/) for steps to turn on the WSL2 backend.
+ Open the app and run `docker version` in Terminal to check that the server is running.
+
+ ![](milestone1/docker_version.png)
+ Docker is ready to go.
+
+ ## 4. Create project container and image
+
+ First we download the Ubuntu image from Docker's library with:
+ ```
+ docker pull ubuntu
+ ```
+ We can check the available images with:
+ ```
+ docker image ls
+ ```
+ We can create a container named *AI_project* based on the Ubuntu image with:
+ ```
+ docker run -it --name=AI_project ubuntu
+ ```
+ The `-it` options launch the container in interactive mode and attach a terminal.
+ After this, a shell is started and we are dropped into a Linux terminal within the container.
+ `root` represents the currently logged-in user with the highest privileges, and `249cf37645b4` is the container ID.
+
+ ![](milestone1/docker_create_container.png)
+
+ ## 5. Hello World!
+
+ Now we can set up the container by installing the Python and pip needed for the project.
+ First we update and upgrade packages (`apt` is the Advanced Packaging Tool):
+ ```
+ apt update && apt upgrade
+ ```
+ Then we install Python 3 and pip with:
+ ```
+ apt install python3 pip
+ ```
+ We can confirm a successful installation by checking the current versions of Python and pip.
+ Then create a script file *hello_world.py* under the `root` directory, and run the script.
+ You will see the following in VSCode and Terminal.
+
+ ![](milestone1/vscode.png)
+ ![](milestone1/hello_world.png)
+
+ ## 6. Commit changes to a new image specifically for the project
+
+ After setting up the container, we can commit the changes to a project-specific image with the tag *milestone1*:
+ ```
+ docker commit [CONTAINER] [NEW_IMAGE]:[TAG]
+ ```
+ Now if we check the available images, there should be a new image for the project. If we list all containers, we should be able to identify the one we were working on by its container ID.
+
+ ![](milestone1/commit_to_new_image.png)
+
+ The Docker Desktop app should match the image list we see in Terminal.
+
+ ![](milestone1/app_image_list.png)
+
+ # Milestone 2 - Sentiment Analysis App w/ Pretrained Model
+
+ This milestone includes creating a Streamlit app in HuggingFace for sentiment analysis.
+
+ ## 1. Space setup
+
+ After creating a HuggingFace account, we can create our app as a Space and choose Streamlit as the Space SDK.
+
+ ![](milestone2/new_HF_space.png)
+
+ Then we can go back to our GitHub repo and create the following files.
+ In order for the Space to run properly, there must be at least three files in the root directory:
+ [README.md](README.md), [app.py](app.py), and [requirements.txt](requirements.txt)
+
+ Make sure the following metadata is at the top of **README.md** for HuggingFace to identify:
+ ```
+ ---
+ title: Sentiment Analysis App
+ emoji: 🚀
+ colorFrom: green
+ colorTo: purple
+ sdk: streamlit
+ sdk_version: 1.17.0
+ app_file: app.py
+ pinned: false
+ ---
+ ```
+
+ The **app.py** file is the main code of the app, and **requirements.txt** should list all the libraries the code uses. HuggingFace will install the listed libraries in the virtual environment before running the app.
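For example, a minimal requirements.txt for a Streamlit app using HuggingFace models could look like the following (the exact package list is an assumption for illustration, not the repo's actual file):

```
streamlit
transformers
torch
```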
+
+
+ ## 2. Connect and sync to HuggingFace
+
+ Then we go to the settings of the GitHub repo and create a secret token to access the new HuggingFace Space.
+
+ ![](milestone2/HF_token.png)
+ ![](milestone2/github_token.png)
+
+ Next, we need to set up a workflow in GitHub Actions. Click "set up a workflow yourself" and replace all the code in `main.yaml` with the following (replace `HF_USERNAME` and `SPACE_NAME` with our own):
+
+ ```
+ name: Sync to Hugging Face hub
+ on:
+   push:
+     branches: [main]
+
+   # to run this workflow manually from the Actions tab
+   workflow_dispatch:
+
+ jobs:
+   sync-to-hub:
+     runs-on: ubuntu-latest
+     steps:
+       - uses: actions/checkout@v3
+         with:
+           fetch-depth: 0
+           lfs: true
+       - name: Push to hub
+         env:
+           HF_TOKEN: ${{ secrets.HF_TOKEN }}
+         run: git push --force https://HF_USERNAME:$HF_TOKEN@huggingface.co/spaces/HF_USERNAME/SPACE_NAME main
+ ```
+ The repo is now connected and synced with the HuggingFace Space!
+
+ ## 3. Create the app
+
+ Modify [app.py](app.py) so that it takes in a text and generates an analysis using one of the provided models. Details are explained in the comments. The app should look like this:
+
+ ![](milestone2/app_UI.png)
+
+ # Milestone 3 - Finetuning Language Models
+
+ In this milestone we finetune our own language model in HuggingFace for sentiment analysis.
 
 Here's the setup block that includes all modules:
 ```
 
 Modify [app.py](app.py) so that it takes in one text and generate an analysis using one of the provided models. Details are explained in comment lines. The app should look like this:
 
+ ![](milestone3/appUI.png)
+
+ ## Reference:
+ For connecting GitHub with HuggingFace, check this [video](https://www.youtube.com/watch?v=8hOzsFETm4I).
+
+ For creating the app, check this [video](https://www.youtube.com/watch?v=GSt00_-0ncQ).
+
+ The HuggingFace documentation is [here](https://huggingface.co/docs), and the Streamlit API reference is [here](https://docs.streamlit.io/library/api-reference).
app.py CHANGED
@@ -18,7 +18,7 @@ def analyze(model_name: str, text: str, top_k=1) -> dict:
     return classifier(text)
 
 # App title
- st.title("Sentiment Analysis App - Milestone3")
+ st.title("Toxic Tweet Detection and Sentiment Analysis App")
 st.write("This app is to analyze the sentiments behind a text.")
 st.write("You can choose to use my fine-tuned model or pre-trained models.")
 
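For context on the `analyze` function above: a HuggingFace text-classification pipeline (the `classifier` it returns results from) yields a list of label/score dicts. A small sketch of handling that shape (the sample labels and scores are made up):

```python
# Typical shape of a text-classification pipeline result
# (sample labels/scores are made up for illustration)
result = [{"label": "POSITIVE", "score": 0.98},
          {"label": "NEGATIVE", "score": 0.02}]

# Pick the highest-scoring label to display in the app
top = max(result, key=lambda r: r["score"])
print(f"{top['label']} ({top['score']:.0%})")  # POSITIVE (98%)
```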
test_model.py CHANGED
@@ -6,8 +6,8 @@ from tqdm import tqdm
 
 
 # Global var
- TEST_SIZE = 1000
- FINE_TUNED_MODEL = "andyqin18/test-finetuned"
+ TEST_SIZE = 2000
+ FINE_TUNED_MODEL = "andyqin18/finetuned-bert-uncased"
 
 
 # Define analyze function
@@ -77,8 +77,8 @@ for comment_idx in tqdm(range(TEST_SIZE), desc="Analyzing..."):
 
 # Calculate performance
 performance = {}
- performance["label_accuracy"] = total_true/(len(labels) * TEST_SIZE)
- performance["prediction_accuracy"] = total_success/TEST_SIZE
- performance["precision"] = TP / (TP + FP)
- performance["recall"] = TP / (TP + FN)
+ performance["label_accuracy"] = total_true/(len(labels) * TEST_SIZE)  # Success rate per label
+ performance["prediction_accuracy"] = total_success/TEST_SIZE  # Samples with all 6 labels predicted correctly
+ performance["precision"] = TP / (TP + FP)  # Label precision
+ performance["recall"] = TP / (TP + FN)  # Label recall
 print(performance)
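To make the two accuracy numbers concrete: label_accuracy scores each of the 6 labels independently, while prediction_accuracy counts a sample as a success only when all 6 labels are predicted correctly. A minimal sketch of that counting logic with made-up data (the label names assume the usual 6 toxicity categories; the predictions are invented):

```python
# Made-up true/predicted labels for 3 samples x 6 toxicity labels (1 = toxic)
labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
true = [[1, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 1, 0]]
pred = [[1, 0, 0, 0, 1, 0],
        [0, 0, 1, 0, 0, 0],   # one wrong label in this sample
        [1, 1, 1, 0, 1, 0]]

# Count individually correct labels, and samples with every label correct
total_true = sum(t == p for tr, pr in zip(true, pred) for t, p in zip(tr, pr))
total_success = sum(tr == pr for tr, pr in zip(true, pred))

label_accuracy = total_true / (len(labels) * len(true))  # per-label rate
prediction_accuracy = total_success / len(true)          # per-sample, all 6 right
print(round(label_accuracy, 4), round(prediction_accuracy, 4))  # 0.9444 0.6667
```

Note how one wrong label barely moves label_accuracy but removes a whole sample from prediction_accuracy, which is why the README's per-label number (0.982) is higher than its per-sample number (0.92).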