Mohammed Foud committed on
Commit 79d2a14 · 1 Parent(s): 8da7632

first commit

Files changed (11)
  1. .cursorignore +13 -0
  2. .cursorrules +166 -0
  3. .gitattributes +1 -0
  4. 3.md +166 -0
  5. Dockerfile +26 -0
  6. app.py +107 -0
  7. d.sh +4 -0
  8. dataset.csv +3 -0
  9. docker-compose.yml +12 -0
  10. requirements.txt +8 -0
  11. tree.sh +15 -0
.cursorignore ADDED
@@ -0,0 +1,13 @@
+ node_modules
+ trash
+ build
+ etc
+ pnpm-lock.yaml
+ dist
+ etc
+ .gitignore
+ # .cursorignore
+ .vscode
+ .env
+ .env.local
+ dataset.csv
.cursorrules ADDED
@@ -0,0 +1,166 @@
+ ![logo_ironhack_blue](https://user-images.githubusercontent.com/23629340/40541063-a07a0a8a-601a-11e8-91b5-2f13e4e6b441.png)
+
+ # Project NLP | Business Case: Automated Customer Reviews
+
+ ## Project Goal
+
+ This project aims to develop a product review system, powered by NLP models, that aggregates customer feedback from different sources. The key tasks are classifying reviews, clustering product categories, and using generative AI to summarize reviews into recommendation articles.
+
+ ## Problem Statement
+
+ With thousands of reviews available across multiple platforms, analyzing them manually is inefficient. This project automates the process with NLP models that extract insights and give users useful product recommendations.
+
+ ## Main Tasks
+
+ ### 1. Review Classification
+ - **Objective**: Classify customer reviews as **positive**, **negative**, or **neutral** to help the company improve its products and services.
+ - **Task**: Create a model that classifies the **textual content** of reviews into these three categories.
+
+ #### Mapping Star Ratings to Sentiment Classes
+ Since the dataset contains **star ratings (1 to 5)**, map them to three sentiment classes as follows:
+
+ | **Star Rating** | **Sentiment Class** |
+ |-----------------|---------------------|
+ | 1 - 2           | **Negative**        |
+ | 3               | **Neutral**         |
+ | 4 - 5           | **Positive**        |
+
+ This is a simple approach; you are encouraged to experiment with other mappings.
+
+
+ **Model Building**
+
+ To classify customer reviews as **positive, negative, or neutral**, use **pretrained transformer-based models**, which provide strong language representations without training from scratch.
+
+ #### Suggested Models
+ - **`distilbert-base-uncased`** – Lightweight and fast, ideal for limited resources.
+ - **`bert-base-uncased`** – A strong general-purpose model to fine-tune for sentiment analysis.
+ - **`roberta-base`** – More robust to nuanced sentiment variations.
+ - **`nlptown/bert-base-multilingual-uncased-sentiment`** – Multilingual and already predicts 1-5 star ratings, useful for diverse datasets.
+ - **`cardiffnlp/twitter-roberta-base-sentiment`** – Trained on tweets; suited to short, informal texts.
+
+ Explore models on [Hugging Face](https://huggingface.co/models) and experiment with fine-tuning to improve accuracy.
+
+ ### Model Evaluation
+
+ #### Evaluation Metrics
+
+ - Evaluate the model's performance on a separate test dataset using the following metrics:
+   - Accuracy: Percentage of correctly classified instances.
+   - Precision: Of all instances predicted as a given class, the proportion that truly belong to it.
+   - Recall: Of all instances that truly belong to a given class, the proportion predicted as that class.
+   - F1-score: Harmonic mean of precision and recall.
+ - Compute a confusion matrix to analyze the model's performance across the different classes.
+
+ #### Results
+
+ - The model achieved an accuracy of X% on the test dataset.
+ - Precision, recall, and F1-score for each class are as follows:
+   - Class 1: Precision=X%, Recall=X%, F1-score=X%
+   - Class 2: Precision=X%, Recall=X%, F1-score=X%
+   - ...
+ - Confusion matrix, shown both as a table and as a plot.
+
+
+ ### 2. Product Category Clustering
+ - **Objective**: Simplify the dataset by clustering product categories into **4-6 meta-categories**.
+ - **Task**: Create a model that groups all reviews into 4-6 broader categories. Example suggestions:
+   - E-book readers
+   - Batteries
+   - Accessories (keyboards, laptop stands, etc.)
+   - Non-electronics (Nespresso pods, pet carriers, etc.)
+ - **Note**: Analyze the dataset in depth to determine the most appropriate categories.
+
+ ### 3. Review Summarization Using Generative AI
+ - **Objective**: Summarize reviews into articles that recommend the top products in each category.
+ - **Task**: Create a model that generates a short article (like a blog post) for each product category. The output should include:
+   - The **top 3 products** and the key differences between them.
+   - The **top complaints** about each of those products.
+   - The **worst product** in the category and why it should be avoided.
+
+ Consider **pretrained generative models** such as **T5**, **GPT-3**, or **BART** for generating coherent, well-structured summaries. These models are built for summarization and text generation, and can be fine-tuned to produce higher-quality outputs from the insights extracted from reviews.
+ You are encouraged to explore other **Transformer-based models** available on platforms like **Hugging Face**. Fine-tuning any of these pretrained models on your specific dataset could further improve the relevance and quality of the generated summaries.
+
+ ## Datasets
+
+ - **Primary Dataset**: [Amazon Product Reviews](https://www.kaggle.com/datasets/datafiniti/consumer-reviews-of-amazon-products/data)
+ - **Larger Dataset**: [Amazon Reviews Dataset](https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews)
+ - **Additional Datasets**: You are free to use other datasets from sources like Hugging Face, Kaggle, or any other platform.
+
+ <!-- ## Deployment -->
+
+ <!-- - **Hosting**: You are free to host the models on your laptop or any cloud platform.
+ - **Framework**: You can use any framework of your choice (e.g., Gradio, AWS, etc.).
+
+ - **Options**:
+   - List the models on HuggingFace.
+   - Deploy a text file with the final results.
+   - Create a website that displays the final results.
+   - Build a live review aggregator.
+   - Develop a website that generates recommendations by uploading a file with reviews.
+
+ - **Inspiration**: Look at websites like Consumer Reviews, The Verge, or The Wirecutter for ideas. -->
+
+
+ ## Deployment Guidelines
+
+ ### Expectations
+
+ - You are expected to showcase a webpage or web app that supports some simple user interactions (for example, buttons, text boxes, sliders, ...).
+ - All three components (classification, clustering, and text summarization) should be visible on the page or available for interaction in some form.
+ - You are free to host the models on your laptop or on any cloud platform, with any framework (e.g., Gradio, AWS, etc.).
+
+ Some ideas are listed below, but you are not limited to these options. Feel free to build a web app or website that does something different.
+
+ 1. **Create a website for your company's marketing department**, which needs insights into how well its products are received by customers (from reviews) and which competing products exist on the market. For example, users of your webpage could choose a product category and see statistical insights (distribution of ratings, best-rated products, etc.) and a text summary for that category (e.g., which products are best in it).
+ 2. **Build a live review aggregator**: a website like https://www.trustpilot.com/ or https://www.yelp.com/ that organizes reviews strategically for buyers. You could let users add reviews (for example, through a form in which a user writes about a product, selects its cluster category, and gives a rating). Once submitted, a review could be displayed on the page as a "recently added review". Feel free to come up with your own ideas for how your live review aggregator should look and behave.
+ 3. **Develop a website that generates recommendations from a user-uploaded CSV file of reviews**. For example, business owners could upload a dataset of their products and the corresponding reviews. Your website would classify and cluster these reviews, then present insights as short articles listing top products, main product issues, and so on (e.g., one article per product, or one article per cluster).
+ 4. **Develop a website that lets users search for information about a product or product category through a text box**, in which users type what they are looking for or would like to buy. The output could show product recommendations as a text summary, the product's category, and the sentiment distribution for that product.
+
+ ## Deliverables
+
+ 1. **Source Code**:
+    - Well-organized, linted code (use tools like `pylint`).
+    - Notebooks structured with clear headers/sections.
+    - Alternatively, plain Python files with a `main()` function.
+ 2. **README**:
+    - A detailed README explaining how to run the code and reproduce the results.
+ 3. **Final Output**:
+    - Generated blog posts with product recommendations.
+    - A website, text file, or Word document containing the final results.
+ 4. **PPT Presentation**:
+    - A presentation (no more than 15 minutes) tailored to both technical and non-technical audiences.
+ 5. **Deployed Model**:
+    - A deployed web app built with the framework of your choice.
+    - Bonus: Host the app so anyone can query it.
+
+ ## Evaluation Criteria
+
+ | **Task**                                 | **Points** |
+ |------------------------------------------|------------|
+ | Data Preprocessing                       | 15         |
+ | Model for Review Classification          | 20         |
+ | Clustering Model                         | 20         |
+ | Summarization Model                      | 20         |
+ | Deployment of the Model                  | 10         |
+ | PDF Report (Approach, Results, Analysis) | 5          |
+ | PPT Presentation                         | 10         |
+ | **Bonus**: Hosting the App Publicly      | 10         |
+
+ **Passing Score**: 70 points.
+
+ ## Additional Notes
+
+ - **Teamwork**: Work in groups of no more than 3 people. If necessary, one group may have 4 members.
+ - **Presentation**: Tailor your presentation to both technical and non-technical audiences. Refer to the "Create Presentation" guidelines in the Student Portal.
+
+ ## Suggested Workflow
+
+ 1. **Data Collection**: Gather and preprocess the dataset(s).
+ 2. **Model Development**:
+    - Build and evaluate the review classification model.
+    - Develop and test the clustering model.
+    - Create the summarization model using generative AI.
+ 3. **Deployment**: Deploy the models using your chosen framework.
+ 4. **Documentation**: Prepare the README, PDF report, and PPT presentation.
+ 5. **Final Delivery**: Submit all deliverables, including the deployed app and final output.
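The star-rating mapping and the evaluation metrics described in the brief can be sketched in plain Python as below. This is an illustrative, dependency-free version and the function names are hypothetical; in practice scikit-learn's `confusion_matrix` and `classification_report`, which `app.py` already imports, compute the same per-class precision, recall, and F1. Matrix rows are true classes and columns are predicted classes, the same orientation scikit-learn uses.

```python
LABELS = ["Negative", "Neutral", "Positive"]


def rating_to_sentiment(stars):
    """Map a 1-5 star rating to a class: 1-2 Negative, 3 Neutral, 4-5 Positive."""
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating out of range: {stars}")
    if stars <= 2:
        return "Negative"
    if stars == 3:
        return "Neutral"
    return "Positive"


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Count (true, predicted) pairs; rows are true classes, columns predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(y_true, y_pred, labels=LABELS):
    """Return (accuracy, {label: (precision, recall, f1)})."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum: predicted as this class
        actual = sum(matrix[i])                     # row sum: truly this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return correct / total, scores
```

For example, with true labels derived from the ratings `[5, 4, 3, 1, 2, 5]` and predictions `["Positive", "Neutral", "Neutral", "Negative", "Positive", "Positive"]`, four of the six predictions are correct, so accuracy is about 0.67.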
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ dataset.csv filter=lfs diff=lfs merge=lfs -text
3.md ADDED
@@ -0,0 +1,166 @@
+ ![logo_ironhack_blue](https://user-images.githubusercontent.com/23629340/40541063-a07a0a8a-601a-11e8-91b5-2f13e4e6b441.png)
+
+ # Project NLP | Business Case: Automated Customer Reviews
+
+ ## Project Goal
+
+ This project aims to develop a product review system, powered by NLP models, that aggregates customer feedback from different sources. The key tasks are classifying reviews, clustering product categories, and using generative AI to summarize reviews into recommendation articles.
+
+ ## Problem Statement
+
+ With thousands of reviews available across multiple platforms, analyzing them manually is inefficient. This project automates the process with NLP models that extract insights and give users useful product recommendations.
+
+ ## Main Tasks
+
+ ### 1. Review Classification
+ - **Objective**: Classify customer reviews as **positive**, **negative**, or **neutral** to help the company improve its products and services.
+ - **Task**: Create a model that classifies the **textual content** of reviews into these three categories.
+
+ #### Mapping Star Ratings to Sentiment Classes
+ Since the dataset contains **star ratings (1 to 5)**, map them to three sentiment classes as follows:
+
+ | **Star Rating** | **Sentiment Class** |
+ |-----------------|---------------------|
+ | 1 - 2           | **Negative**        |
+ | 3               | **Neutral**         |
+ | 4 - 5           | **Positive**        |
+
+ This is a simple approach; you are encouraged to experiment with other mappings.
+
+
+ **Model Building**
+
+ To classify customer reviews as **positive, negative, or neutral**, use **pretrained transformer-based models**, which provide strong language representations without training from scratch.
+
+ #### Suggested Models
+ - **`distilbert-base-uncased`** – Lightweight and fast, ideal for limited resources.
+ - **`bert-base-uncased`** – A strong general-purpose model to fine-tune for sentiment analysis.
+ - **`roberta-base`** – More robust to nuanced sentiment variations.
+ - **`nlptown/bert-base-multilingual-uncased-sentiment`** – Multilingual and already predicts 1-5 star ratings, useful for diverse datasets.
+ - **`cardiffnlp/twitter-roberta-base-sentiment`** – Trained on tweets; suited to short, informal texts.
+
+ Explore models on [Hugging Face](https://huggingface.co/models) and experiment with fine-tuning to improve accuracy.
+
+ ### Model Evaluation
+
+ #### Evaluation Metrics
+
+ - Evaluate the model's performance on a separate test dataset using the following metrics:
+   - Accuracy: Percentage of correctly classified instances.
+   - Precision: Of all instances predicted as a given class, the proportion that truly belong to it.
+   - Recall: Of all instances that truly belong to a given class, the proportion predicted as that class.
+   - F1-score: Harmonic mean of precision and recall.
+ - Compute a confusion matrix to analyze the model's performance across the different classes.
+
+ #### Results
+
+ - The model achieved an accuracy of X% on the test dataset.
+ - Precision, recall, and F1-score for each class are as follows:
+   - Class 1: Precision=X%, Recall=X%, F1-score=X%
+   - Class 2: Precision=X%, Recall=X%, F1-score=X%
+   - ...
+ - Confusion matrix, shown both as a table and as a plot.
+
+
+ ### 2. Product Category Clustering
+ - **Objective**: Simplify the dataset by clustering product categories into **4-6 meta-categories**.
+ - **Task**: Create a model that groups all reviews into 4-6 broader categories. Example suggestions:
+   - E-book readers
+   - Batteries
+   - Accessories (keyboards, laptop stands, etc.)
+   - Non-electronics (Nespresso pods, pet carriers, etc.)
+ - **Note**: Analyze the dataset in depth to determine the most appropriate categories.
+
+ ### 3. Review Summarization Using Generative AI
+ - **Objective**: Summarize reviews into articles that recommend the top products in each category.
+ - **Task**: Create a model that generates a short article (like a blog post) for each product category. The output should include:
+   - The **top 3 products** and the key differences between them.
+   - The **top complaints** about each of those products.
+   - The **worst product** in the category and why it should be avoided.
+
+ Consider **pretrained generative models** such as **T5**, **GPT-3**, or **BART** for generating coherent, well-structured summaries. These models are built for summarization and text generation, and can be fine-tuned to produce higher-quality outputs from the insights extracted from reviews.
+ You are encouraged to explore other **Transformer-based models** available on platforms like **Hugging Face**. Fine-tuning any of these pretrained models on your specific dataset could further improve the relevance and quality of the generated summaries.
+
+ ## Datasets
+
+ - **Primary Dataset**: [Amazon Product Reviews](https://www.kaggle.com/datasets/datafiniti/consumer-reviews-of-amazon-products/data)
+ - **Larger Dataset**: [Amazon Reviews Dataset](https://cseweb.ucsd.edu/~jmcauley/datasets.html#amazon_reviews)
+ - **Additional Datasets**: You are free to use other datasets from sources like Hugging Face, Kaggle, or any other platform.
+
+ <!-- ## Deployment -->
+
+ <!-- - **Hosting**: You are free to host the models on your laptop or any cloud platform.
+ - **Framework**: You can use any framework of your choice (e.g., Gradio, AWS, etc.).
+
+ - **Options**:
+   - List the models on HuggingFace.
+   - Deploy a text file with the final results.
+   - Create a website that displays the final results.
+   - Build a live review aggregator.
+   - Develop a website that generates recommendations by uploading a file with reviews.
+
+ - **Inspiration**: Look at websites like Consumer Reviews, The Verge, or The Wirecutter for ideas. -->
+
+
+ ## Deployment Guidelines
+
+ ### Expectations
+
+ - You are expected to showcase a webpage or web app that supports some simple user interactions (for example, buttons, text boxes, sliders, ...).
+ - All three components (classification, clustering, and text summarization) should be visible on the page or available for interaction in some form.
+ - You are free to host the models on your laptop or on any cloud platform, with any framework (e.g., Gradio, AWS, etc.).
+
+ Some ideas are listed below, but you are not limited to these options. Feel free to build a web app or website that does something different.
+
+ 1. **Create a website for your company's marketing department**, which needs insights into how well its products are received by customers (from reviews) and which competing products exist on the market. For example, users of your webpage could choose a product category and see statistical insights (distribution of ratings, best-rated products, etc.) and a text summary for that category (e.g., which products are best in it).
+ 2. **Build a live review aggregator**: a website like https://www.trustpilot.com/ or https://www.yelp.com/ that organizes reviews strategically for buyers. You could let users add reviews (for example, through a form in which a user writes about a product, selects its cluster category, and gives a rating). Once submitted, a review could be displayed on the page as a "recently added review". Feel free to come up with your own ideas for how your live review aggregator should look and behave.
+ 3. **Develop a website that generates recommendations from a user-uploaded CSV file of reviews**. For example, business owners could upload a dataset of their products and the corresponding reviews. Your website would classify and cluster these reviews, then present insights as short articles listing top products, main product issues, and so on (e.g., one article per product, or one article per cluster).
+ 4. **Develop a website that lets users search for information about a product or product category through a text box**, in which users type what they are looking for or would like to buy. The output could show product recommendations as a text summary, the product's category, and the sentiment distribution for that product.
+
+ ## Deliverables
+
+ 1. **Source Code**:
+    - Well-organized, linted code (use tools like `pylint`).
+    - Notebooks structured with clear headers/sections.
+    - Alternatively, plain Python files with a `main()` function.
+ 2. **README**:
+    - A detailed README explaining how to run the code and reproduce the results.
+ 3. **Final Output**:
+    - Generated blog posts with product recommendations.
+    - A website, text file, or Word document containing the final results.
+ 4. **PPT Presentation**:
+    - A presentation (no more than 15 minutes) tailored to both technical and non-technical audiences.
+ 5. **Deployed Model**:
+    - A deployed web app built with the framework of your choice.
+    - Bonus: Host the app so anyone can query it.
+
+ ## Evaluation Criteria
+
+ | **Task**                                 | **Points** |
+ |------------------------------------------|------------|
+ | Data Preprocessing                       | 15         |
+ | Model for Review Classification          | 20         |
+ | Clustering Model                         | 20         |
+ | Summarization Model                      | 20         |
+ | Deployment of the Model                  | 10         |
+ | PDF Report (Approach, Results, Analysis) | 5          |
+ | PPT Presentation                         | 10         |
+ | **Bonus**: Hosting the App Publicly      | 10         |
+
+ **Passing Score**: 70 points.
+
+ ## Additional Notes
+
+ - **Teamwork**: Work in groups of no more than 3 people. If necessary, one group may have 4 members.
+ - **Presentation**: Tailor your presentation to both technical and non-technical audiences. Refer to the "Create Presentation" guidelines in the Student Portal.
+
+ ## Suggested Workflow
+
+ 1. **Data Collection**: Gather and preprocess the dataset(s).
+ 2. **Model Development**:
+    - Build and evaluate the review classification model.
+    - Develop and test the clustering model.
+    - Create the summarization model using generative AI.
+ 3. **Deployment**: Deploy the models using your chosen framework.
+ 4. **Documentation**: Prepare the README, PDF report, and PPT presentation.
+ 5. **Final Delivery**: Submit all deliverables, including the deployed app and final output.
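The summarization task asks each generated article to cover the top 3 products, their top complaints, and the worst product in a category. One way to prepare that input is sketched below, under loud assumptions: products are ranked by mean star rating (ties broken by review count), 1-2 star reviews count as complaints (matching the Negative class), and the prompt wording and function names are hypothetical. The resulting string could then be passed to a generative model such as T5 or BART; this is pre-processing only, not the generation step.

```python
from collections import defaultdict
from statistics import mean


def rank_products(ratings, min_reviews=1):
    """Rank products by mean star rating.

    ratings: iterable of (product, stars) pairs.
    Returns (top_3, worst); each entry is (product, mean_stars, n_reviews).
    """
    by_product = defaultdict(list)
    for product, stars in ratings:
        by_product[product].append(stars)
    stats = [
        (product, mean(stars), len(stars))
        for product, stars in by_product.items()
        if len(stars) >= min_reviews
    ]
    # Highest mean first; on ties, prefer the product with more reviews.
    stats.sort(key=lambda s: (s[1], s[2]), reverse=True)
    return stats[:3], (stats[-1] if stats else None)


def build_article_prompt(category, reviews, max_complaints=2):
    """Assemble a prompt from (product, stars, text) reviews (hypothetical format)."""
    top_3, worst = rank_products((product, stars) for product, stars, _ in reviews)
    # Assumption: 1-2 star reviews are complaints, as in the Negative class mapping.
    complaints = defaultdict(list)
    for product, stars, text in reviews:
        if stars <= 2:
            complaints[product].append(text)

    lines = [f"Write a short buying guide for the '{category}' category."]
    for product, avg, n in top_3:
        lines.append(f"Top product: {product} ({avg:.1f} stars on average, {n} reviews)")
        lines += [f"  Complaint: {text}" for text in complaints[product][:max_complaints]]
    if worst is not None:
        product, avg, _ = worst
        lines.append(f"Worst product: {product} ({avg:.1f} stars on average)")
        lines += [f"  Complaint: {text}" for text in complaints[product][:max_complaints]]
    return "\n".join(lines)
```

Raising `min_reviews` keeps a product with a single five-star review from outranking well-reviewed ones, at the cost of hiding new products.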
Dockerfile ADDED
@@ -0,0 +1,26 @@
+ # Use Python 3.9 as base image
+ FROM python:3.9-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Copy requirements first to leverage Docker cache
+ COPY requirements.txt .
+
+ # Install dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy the rest of the application
+ COPY . .
+
+ # Create directory for model
+ RUN mkdir -p /app/final_model
+
+ # Expose port 7860 for Gradio
+ EXPOSE 7860
+
+ # Set environment variables
+ ENV GRADIO_SERVER_NAME=0.0.0.0
+
+ # Command to run the application
+ CMD ["python", "app.py"]
app.py ADDED
@@ -0,0 +1,107 @@
+ import gradio as gr
+ import pandas as pd
+ import numpy as np
+ import matplotlib.pyplot as plt
+ import seaborn as sns
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+ from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
+ import io
+ import base64
+
+ # Load the model and tokenizer
+ model_path = "./final_model"
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ model = AutoModelForSequenceClassification.from_pretrained(model_path)
+
+ def predict_sentiment(text):
+     # Preprocess text
+     text = text.lower()
+
+     # Tokenize
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
+
+     # Get prediction
+     with torch.no_grad():
+         outputs = model(**inputs)
+         logits = outputs.logits
+         probabilities = torch.nn.functional.softmax(logits, dim=-1)
+         predicted_class = torch.argmax(probabilities, dim=-1).item()
+
+     # Map class to sentiment
+     sentiment_map = {0: "Negative", 1: "Neutral", 2: "Positive"}
+     sentiment = sentiment_map[predicted_class]
+
+     # Get probabilities
+     probs = probabilities[0].tolist()
+     prob_dict = {sentiment_map[i]: f"{prob*100:.2f}%" for i, prob in enumerate(probs)}
+
+     return sentiment, prob_dict
+
+ def analyze_reviews(reviews_text):
+     # Split reviews by newline
+     reviews = [r.strip() for r in reviews_text.split('\n') if r.strip()]
+
+     if not reviews:
+         return "Please enter at least one review.", None
+
+     # Process each review
+     results = []
+     for review in reviews:
+         sentiment, probs = predict_sentiment(review)
+         results.append({
+             'Review': review,
+             'Sentiment': sentiment,
+             'Confidence': probs
+         })
+
+     # Create DataFrame for display
+     df = pd.DataFrame(results)
+
+     # Create visualization
+     plt.figure(figsize=(10, 6))
+     sentiment_counts = df['Sentiment'].value_counts()
+     plt.bar(sentiment_counts.index, sentiment_counts.values)
+     plt.title('Sentiment Distribution')
+     plt.xlabel('Sentiment')
+     plt.ylabel('Count')
+
+     # Save plot to bytes
+     buf = io.BytesIO()
+     plt.savefig(buf, format='png')
+     buf.seek(0)
+     plot_base64 = base64.b64encode(buf.read()).decode('utf-8')
+     plt.close()
+
+     return df, f'<img src="data:image/png;base64,{plot_base64}" style="max-width:100%;">'
+
+ # Create Gradio interface
+ with gr.Blocks(title="Amazon Review Sentiment Analysis") as demo:
+     gr.Markdown("# Amazon Review Sentiment Analysis")
+     gr.Markdown("Enter one or more reviews (one per line) to analyze their sentiment.")
+
+     with gr.Row():
+         with gr.Column():
+             reviews_input = gr.Textbox(
+                 label="Enter Reviews",
+                 placeholder="Enter your reviews here (one per line)...",
+                 lines=10
+             )
+             analyze_btn = gr.Button("Analyze Reviews")
+
+         with gr.Column():
+             results_table = gr.Dataframe(
+                 headers=["Review", "Sentiment", "Confidence"],
+                 datatype=["str", "str", "str"],
+                 col_count=(3, "fixed")
+             )
+             plot_output = gr.HTML()
+
+     analyze_btn.click(
+         fn=analyze_reviews,
+         inputs=reviews_input,
+         outputs=[results_table, plot_output]
+     )
+
+ if __name__ == "__main__":
+     demo.launch(share=True)
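`predict_sentiment` in `app.py` hard-codes `{0: "Negative", 1: "Neutral", 2: "Positive"}`, which is only correct if the fine-tuned model in `./final_model` was trained with exactly that label order. Also, the results table declares every column as `"str"`, while `analyze_reviews` stores a dict in the Confidence column. The sketch below shows the same softmax-then-argmax step in plain Python, reading label names from an `id2label` mapping and flattening the confidences into a single string. The helper names are hypothetical, and whether the saved checkpoint carries meaningful labels is an assumption.

```python
import math


def logits_to_prediction(logits, id2label):
    """Convert one row of classifier logits into (label, {label: "xx.xx%"}).

    Same steps as predict_sentiment() in app.py (softmax, then argmax),
    but label names come from id2label instead of a hard-coded dict.
    """
    # Numerically stable softmax: subtract the largest logit before exp().
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    confidence = {id2label[i]: f"{p * 100:.2f}%" for i, p in enumerate(probs)}
    return id2label[best], confidence


def format_confidence(confidence):
    """Flatten the confidence dict into one string for a "str" Dataframe column."""
    return ", ".join(f"{label}: {value}" for label, value in confidence.items())
```

With Transformers this could be called as `logits_to_prediction(outputs.logits[0].tolist(), model.config.id2label)`. If the checkpoint was saved without custom labels, `id2label` contains generic names such as `LABEL_0`, and an explicit mapping like the one in `app.py` is still needed.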
d.sh ADDED
@@ -0,0 +1,4 @@
+ #!/bin/bash
+ git add .
+ git commit -m "first commit"
+ git push -u origin main
dataset.csv ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9718a894d000fa2c116b45f171f7efc5ca2042262a1ab3f593409a938967e3ac
+ size 99558441
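The three lines above are a Git LFS pointer rather than CSV data: because `.gitattributes` routes `dataset.csv` through the LFS filter, the repository stores only this pointer, and the real file (99,558,441 bytes, roughly 99.6 MB) lives in LFS storage until fetched, for example with `git lfs pull`. The sketch below parses such a pointer, assuming the simple one-`key value`-pair-per-line layout shown here; the function name is hypothetical.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # each line is "key value"
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")  # e.g. "sha256:<hex>"
    return {
        "version": fields["version"],
        "oid_algorithm": algorithm,
        "oid": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


DATASET_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9718a894d000fa2c116b45f171f7efc5ca2042262a1ab3f593409a938967e3ac
size 99558441
"""

info = parse_lfs_pointer(DATASET_POINTER)
print(f"{info['oid_algorithm']} object, {info['size'] / 1_000_000:.1f} MB")
```

A loader such as `pandas.read_csv("dataset.csv")` fed this pointer would parse three lines of metadata instead of reviews, so checking for the pointer first is a cheap guard.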
docker-compose.yml ADDED
@@ -0,0 +1,12 @@
+ version: '3.8'
+
+ services:
+   sentiment-analysis:
+     build: .
+     ports:
+       - "7860:7860"
+     volumes:
+       - ./final_model:/app/final_model
+     environment:
+       - GRADIO_SERVER_NAME=0.0.0.0
+     restart: unless-stopped
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ gradio>=3.50.2
+ pandas>=2.0.0
+ numpy>=1.24.0
+ matplotlib>=3.7.0
+ seaborn>=0.12.0
+ torch>=2.0.0
+ transformers>=4.30.0
+ scikit-learn>=1.2.0
tree.sh ADDED
@@ -0,0 +1,15 @@
+ #!/bin/bash
+
+ # Directory to generate the tree for (default is the current directory)
+ directory="${1:-.}"
+
+ # Output file (default is output.txt)
+ output_file="${2:-output.txt}"
+
+ # Exclude directories
+ excluded_dirs="node_modules|trash|build|dist"
+
+ # Run the tree command and save output to file, excluding the specified folders
+ tree --prune -I "$excluded_dirs" "$directory" > "$output_file"
+
+ echo "Directory tree saved to $output_file, excluding specified folders."
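`tree.sh` depends on the external `tree` command, which is likely not installed on minimal images such as `python:3.9-slim`. A rough Python stand-in is sketched below, under the assumption that exact name matching is good enough: `tree -I` takes a wildcard pattern, whereas this sketch only skips entries whose name is exactly one of the listed ones. It also does not reproduce `--prune` (hiding empty directories) or the summary line that `tree` prints at the end.

```python
import os
import sys

# Mirrors excluded_dirs in tree.sh; like `tree -I`, names are matched
# against files and directories alike (here by exact name only).
EXCLUDED = {"node_modules", "trash", "build", "dist"}


def build_tree(root, excluded=EXCLUDED):
    """Return tree-style lines for root, skipping entries whose name is excluded."""
    lines = [root]

    def walk(path, prefix):
        entries = sorted(name for name in os.listdir(path) if name not in excluded)
        for i, name in enumerate(entries):
            last = i == len(entries) - 1
            lines.append(prefix + ("└── " if last else "├── ") + name)
            full = os.path.join(path, name)
            # Descend into real directories only; symlinked dirs are listed, not followed.
            if os.path.isdir(full) and not os.path.islink(full):
                walk(full, prefix + ("    " if last else "│   "))

    walk(root, "")
    return lines


if __name__ == "__main__":
    # Same positional arguments and defaults as tree.sh.
    directory = sys.argv[1] if len(sys.argv) > 1 else "."
    output_file = sys.argv[2] if len(sys.argv) > 2 else "output.txt"
    with open(output_file, "w", encoding="utf-8") as fh:
        fh.write("\n".join(build_tree(directory)) + "\n")
    print(f"Directory tree saved to {output_file}, excluding specified folders.")
```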