Spaces: andyqin18/sentiment-analysis-app

andyqin18 committed on Apr 30, 2023

Commit 66f2f1a (1 parent: ea09ee5)

Updated README

Files changed (3):

README.md +191 -5
app.py +1 -1
test_model.py +6 -6

README.md CHANGED

@@ -11,13 +11,192 @@ pinned: false
 # AI Project: Finetuning Language Models - Toxic Tweets
-Hello! This is a project for CS-UY 4613: Artificial Intelligence. I'm providing a step-by-step instruction on finetuning language models for detecting toxic tweets.
-# Milestone 3
-This milestone includes finetuning a language model in HuggingFace for sentiment analysis.
-Link to app: https://huggingface.co/spaces/andyqin18/sentiment-analysis-app
 Here's the setup block that includes all modules:
 ```
@@ -121,4 +300,11 @@ trainer.push_to_hub()
 Modify [app.py](app.py) so that it takes in a text and generates an analysis using one of the provided models. Details are explained in the code comments. The app should look like this:
-![](milestone3/appUI.png)

 # AI Project: Finetuning Language Models - Toxic Tweets
+Hello! This is a project for CS-UY 4613: Artificial Intelligence. I'm providing step-by-step instructions on finetuning language models to detect toxic tweets. All code is well commented.
+# Everything you need to know
+- Link to HuggingFace space: https://huggingface.co/spaces/andyqin18/sentiment-analysis-app
+  - Code behind the app: [app.py](app.py)
+- Link to finetuned model: https://huggingface.co/andyqin18/finetuned-bert-uncased
+  - Code showing how to finetune a language model: [finetune.ipynb](milestone3/finetune.ipynb)
+The model's performance, measured with [test_model.py](test_model.py), is shown below. The results were generated on 2000 randomly selected samples from [train.csv](milestone3/comp/train.csv).
+```
+{'label_accuracy': 0.9821666666666666,
+ 'prediction_accuracy': 0.9195,
+ 'precision': 0.8263888888888888,
+ 'recall': 0.719758064516129}
+```
+Now let's walk through the details :)
+# Milestone 1 - Setup
+This milestone covers setting up Docker and creating a development environment on Windows 11.
+## 1. Enable WSL2 feature
+The Windows Subsystem for Linux (WSL) lets developers install a Linux distribution on Windows.
+```
+wsl --install
+```
+Ubuntu is the default distribution installed and WSL2 is the default version.
+After you create a Linux username and password, Ubuntu appears in Windows Terminal.
+Details can be found [here](https://learn.microsoft.com/en-us/windows/wsl/install).
+![](milestone1/wsl2.png)
+## 2. Download and install the Linux kernel update package
+The package needs to be downloaded before installing Docker Desktop.
+However, this error might occur:
+`Error: wsl_update_x64.msi unable to run because "This update only applies to machines with the Windows Subsystem for Linux"`
+Solution: open Windows Features and enable "Windows Subsystem for Linux".
+After that, the update [package](https://docs.microsoft.com/windows/wsl/wsl2-kernel) runs successfully.
+![](milestone1/kernal_update_sol.png)
+## 3. Download Docker Desktop
+After installing the [Docker App](https://www.docker.com/products/docker-desktop/), the WSL2-based engine is enabled automatically.
+If not, follow [this link](https://docs.docker.com/desktop/windows/wsl/) for steps to turn on WSL2 backend.
+Open the app and run `docker version` in Terminal to check that the server is running.
+![](milestone1/docker_version.png)
+Docker is ready to go.
+## 4. Create project container and image
+First we download the Ubuntu image from Docker’s library with:
+```
+docker pull ubuntu
+```
+We can check the available images with:
+```
+docker image ls
+```
+We can create a container named *AI_project* based on Ubuntu image with:
+```
+docker run -it --name=AI_project ubuntu
+```
+The `-it` options (`-i` keeps STDIN open, `-t` allocates a pseudo-terminal) launch the container in interactive mode with a terminal interface.
+A shell then starts, and we are placed in a Linux terminal inside the container.
+`root` is the currently logged-in user, with the highest privileges, and `249cf37645b4` is the container ID.
+![](milestone1/docker_create_container.png)
+## 5. Hello World!
+Now we can set up the container by installing Python and pip, which the project needs.
+First, update and upgrade the packages (`apt` is the Advanced Packaging Tool):
+```
+apt update && apt upgrade
+```
+Then install Python and pip with:
+```
+apt install python3 python3-pip
+```
+We can confirm the installation by checking the Python and pip versions.
+Then create a script named *hello_world.py* in the `root` directory and run it.
+You will see the following in VSCode and Terminal.
+![](milestone1/vscode.png)
+![](milestone1/hello_world.png)
+## 6. Commit changes to a new image specifically for the project
+After setting up the container, we can commit the changes to a project-specific image tagged *milestone1*:
+```
+docker commit [CONTAINER] [NEW_IMAGE]:[TAG]
+```
+Listing the available images should now show a new image for the project, and listing all containers (`docker ps -a`) lets us identify the one we were working on by its container ID.
+![](milestone1/commit_to_new_image.png)
+The Docker Desktop app should match the image list we see on Terminal.
+![](milestone1/app_image_list.png)
+# Milestone 2 - Sentiment Analysis App w/ Pretrained Model
+This milestone includes creating a Streamlit app in HuggingFace for sentiment analysis.
+## 1. Space setup
+After creating a HuggingFace account, we can create our app as a space and choose Streamlit as the space SDK.
+![](milestone2/new_HF_space.png)
+Then we go back to our GitHub repo and create the following files.
+In order for the space to run properly, there must be at least three files in the root directory:
+[README.md](README.md), [app.py](app.py), and [requirements.txt](requirements.txt)
+Make sure the following metadata is at the top of **README.md** so that HuggingFace can identify the space.
+```
+---
+title: Sentiment Analysis App
+emoji: 🚀
+colorFrom: green
+colorTo: purple
+sdk: streamlit
+sdk_version: 1.17.0
+app_file: app.py
+pinned: false
+---
+```
+The **app.py** file is the main code of the app, and **requirements.txt** should list all the libraries the code uses. HuggingFace installs the listed libraries before running the app.
+## 2. Connect and sync to HuggingFace
+Then we create an access token in the HuggingFace account settings and store it as a secret named `HF_TOKEN` in the GitHub repo's settings, so that GitHub can push to the new HuggingFace space.
+![](milestone2/HF_token.png)
+![](milestone2/github_token.png)
+Next, we need to set up a workflow in GitHub Actions. Click "set up a workflow yourself" and replace all the code in `main.yaml` with the following (replace `HF_USERNAME` and `SPACE_NAME` with your own):
+```
+name: Sync to Hugging Face hub
+on:
+  push:
+    branches: [main]
+  # to run this workflow manually from the Actions tab
+  workflow_dispatch:
+jobs:
+  sync-to-hub:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+        with:
+          fetch-depth: 0
+          lfs: true
+      - name: Push to hub
+        env:
+          HF_TOKEN: ${{ secrets.HF_TOKEN }}
+        run: git push --force https://HF_USERNAME:$HF_TOKEN@huggingface.co/spaces/HF_USERNAME/SPACE_NAME main
+```
+The Repo is now connected and synced with HuggingFace space!
+## 3. Create the app
+Modify [app.py](app.py) so that it takes in a text and generates an analysis using one of the provided models. Details are explained in the code comments. The app should look like this:
+![](milestone2/app_UI.png)
+# Milestone 3 - Finetuning Language Models
+In this milestone, we finetune our own language model in HuggingFace for sentiment analysis.
 Here's the setup block that includes all modules:
 ```
 Modify [app.py](app.py) so that it takes in a text and generates an analysis using one of the provided models. Details are explained in the code comments. The app should look like this:
+![](milestone3/appUI.png)
+## Reference:
+For connecting Github with HuggingFace, check this [video](https://www.youtube.com/watch?v=8hOzsFETm4I).
+For creating the app, check this [video](https://www.youtube.com/watch?v=GSt00_-0ncQ)
+The HuggingFace documentation is [here](https://huggingface.co/docs), and Streamlit APIs [here](https://docs.streamlit.io/library/api-reference).
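The README lists two requirements for a Space: `README.md`, `app.py`, and `requirements.txt` must sit in the root directory, and `README.md` must start with a metadata block. A local pre-push check could look like the minimal sketch below. It is a hypothetical helper, not part of this repo or of any HuggingFace tooling. It assumes flat `key: value` metadata lines (it is not a full YAML parser), assumes `app_file` lives in the root directory, and treats `sdk` and `app_file` as required, which is this sketch's own choice rather than HuggingFace's full schema.

```python
from __future__ import annotations

from pathlib import Path

# Files the README says must sit in the root directory of the Space.
REQUIRED_FILES = ("README.md", "app.py", "requirements.txt")
# Keys this sketch insists on (an assumption, not HuggingFace's full schema).
REQUIRED_KEYS = ("sdk", "app_file")


def parse_front_matter(text: str) -> dict[str, str]:
    """Parse a flat `key: value` block between two `---` lines.

    Not a real YAML parser: nested or multi-line values are not supported.
    Returns an empty dict if the block is missing or never closed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # opening `---` without a closing one


def find_problems(file_names: set[str], readme_text: str | None) -> list[str]:
    """Return human-readable problems; an empty list means the layout looks OK."""
    problems = [f"missing {name}" for name in REQUIRED_FILES if name not in file_names]
    if readme_text is None:
        return problems
    meta = parse_front_matter(readme_text)
    if not meta:
        problems.append("README.md has no front matter block")
        return problems
    problems += [f"front matter lacks {key!r}" for key in REQUIRED_KEYS if key not in meta]
    # Assumes app_file names a file in the root directory.
    app_file = meta.get("app_file")
    if app_file and app_file not in file_names:
        problems.append(f"app_file {app_file!r} not found")
    return problems


def check_space(root: Path) -> list[str]:
    """Filesystem wrapper around find_problems for a local checkout."""
    names = {p.name for p in root.iterdir() if p.is_file()}
    readme = root / "README.md"
    text = readme.read_text(encoding="utf-8") if readme.is_file() else None
    return find_problems(names, text)
```

From the repo root, `print(check_space(Path(".")))` lists anything missing. Keeping the parsing logic in `find_problems`, separate from the filesystem access in `check_space`, makes it easy to test without creating files; a real YAML library would be the better choice if the metadata grows nested values.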

app.py CHANGED

@@ -18,7 +18,7 @@ def analyze(model_name: str, text: str, top_k=1) -> dict:
     return classifier(text)
 # App title
-st.title("Sentiment Analysis App - Milestone3")
 st.write("This app is to analyze the sentiments behind a text.")
 st.write("You can choose to use my fine-tuned model or pre-trained models.")

     return classifier(text)
 # App title
+st.title("Toxic Tweet Detection and Sentiment Analysis App")
 st.write("This app is to analyze the sentiments behind a text.")
 st.write("You can choose to use my fine-tuned model or pre-trained models.")

test_model.py CHANGED

@@ -6,8 +6,8 @@ from tqdm import tqdm
 # Global var
-TEST_SIZE = 1000
-FINE_TUNED_MODEL = "andyqin18/test-finetuned"
 # Define analyze function
@@ -77,8 +77,8 @@ for comment_idx in tqdm(range(TEST_SIZE), desc="Analyzing..."):
 # Calculate performance
 performance = {}
-performance["label_accuracy"] = total_true/(len(labels) * TEST_SIZE)
-performance["prediction_accuracy"] = total_success/TEST_SIZE
-performance["precision"] = TP / (TP + FP)
-performance["recall"] = TP / (TP + FN)
 print(performance)

 # Global var
+TEST_SIZE = 2000
+FINE_TUNED_MODEL = "andyqin18/finetuned-bert-uncased"
 # Define analyze function
 # Calculate performance
 performance = {}
+performance["label_accuracy"] = total_true/(len(labels) * TEST_SIZE)  # Fraction of individual labels predicted correctly
+performance["prediction_accuracy"] = total_success/TEST_SIZE  # Fraction of samples with all 6 labels predicted correctly
+performance["precision"] = TP / (TP + FP)  # Label precision
+performance["recall"] = TP / (TP + FN)  # Label recall
 print(performance)
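The four ratios in `performance` can be illustrated outside the evaluation loop with the minimal sketch below. It is not the code in test_model.py: it assumes each sample's predictions and ground truth are 0/1 vectors over the 6 labels, and the `Counts` class, its `update` method, and the `_ratio` helper are hypothetical names. The example counts at the bottom are inferred from the README's reported numbers (6 labels, 2000 samples) under the assumption that `total_true` counts true positives plus true negatives; they are not printed anywhere in the source.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


def _ratio(num: int, den: int) -> float:
    # Guard against an empty denominator (e.g. no positive predictions);
    # returning 0.0 here is a choice made for this sketch.
    return num / den if den else 0.0


@dataclass
class Counts:
    """Running tallies over multi-label 0/1 predictions (names are illustrative)."""

    total_true: int = 0     # individual labels predicted correctly (TP + TN)
    total_success: int = 0  # samples with every label predicted correctly
    tp: int = 0             # predicted 1, actually 1
    fp: int = 0             # predicted 1, actually 0
    fn: int = 0             # predicted 0, actually 1

    def update(self, predicted: Sequence[int], actual: Sequence[int]) -> None:
        """Add one sample's predicted and true label vectors."""
        if len(predicted) != len(actual):
            raise ValueError("predicted and actual must have the same length")
        pairs = list(zip(predicted, actual))
        correct = sum(p == a for p, a in pairs)
        self.total_true += correct
        self.total_success += int(correct == len(pairs))
        self.tp += sum(p == 1 and a == 1 for p, a in pairs)
        self.fp += sum(p == 1 and a == 0 for p, a in pairs)
        self.fn += sum(p == 0 and a == 1 for p, a in pairs)


def performance(c: Counts, num_labels: int, test_size: int) -> dict[str, float]:
    """The same four ratios as the `performance` dict in test_model.py."""
    return {
        "label_accuracy": _ratio(c.total_true, num_labels * test_size),
        "prediction_accuracy": _ratio(c.total_success, test_size),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": _ratio(c.tp, c.tp + c.fn),
    }


# Counts inferred from the README's ratios, assuming total_true = TP + TN.
inferred = Counts(total_true=11786, total_success=1839, tp=357, fp=75, fn=139)
print(performance(inferred, num_labels=6, test_size=2000))
```

With these inferred counts, the 214 label errors (12000 - 11786) split into 75 false positives and 139 false negatives, and true negatives fill 11429 of the 12000 label slots. That is why `label_accuracy` sits near 98% while precision and recall are much lower: most labels are 0 and are easy to get right.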