metadata
title: Filtering-Google-Reviews-with-NB-and-NLP
app_file: main.py
sdk: gradio
sdk_version: 5.44.1

Filtering-Trustworthy-Google-Reviews-with-Naive-Bayes-Natural-Language-Processing

Project Overview

Inspiration

With so much information shared online, including Google reviews, how do we tell whether a review is legitimate or just noise? Our team wanted to build a system that helps online users distinguish valid reviews from promotional and irrelevant ones.

What it does

Our model analyzes online reviews and sorts them into categories so that users can identify which ones are truly useful to them. It detects genuine reviews, which typically reflect real experiences and opinions about a business. It also identifies advertisements, such as reviews containing promotional links, which are not helpful to online users. It flags misleading reviews written by people without a first-hand visit, which are classified as "rant without visit". Finally, the model reports a probability-based confidence score, so users can see how certain each predicted classification is.

How we built it

We started with a labeled dataset covering different review types: real experiences, ads, irrelevant chatter, and hearsay. After cleaning and preparing the text, we applied NLP techniques like TF-IDF to turn the words into weighted numeric features the model could learn from.
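As a rough illustration of the TF-IDF step, the sketch below computes smoothed TF-IDF weights in plain Python. The tokenizer, the smoothing formula, and the three sample reviews are assumptions made for this example; the project's actual preprocessing may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document (smoothed TF-IDF)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample reviews, not taken from the project's dataset.
reviews = [
    "The food was great and the staff were friendly",
    "Lowest price guaranteed, DM now for orders",
    "The food here was cold",
]
weights = tfidf(reviews)
```

In this toy corpus, "great" appears in one review and "food" in two, so "great" receives the higher weight within the first review.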

To enhance this, we integrated a Bayesian network with the NLP features, which let us capture not only the importance of individual words but also the likelihood of certain features occurring together, such as URLs signaling ads or mentions of the business name indicating relevance. This integration gave our model the ability to handle uncertainty and weigh different clues more effectively before making a decision.
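To make the idea of combining word evidence with a URL signal concrete, here is a minimal sketch. It uses a plain multinomial Naive Bayes classifier (the technique named in the project title) rather than a full Bayesian network, and it is not the project's actual code. The `HAS_URL` marker, the URL pattern, and the four training examples are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed pattern: treat anything starting with http(s):// or www. as a link.
URL_PATTERN = re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)


def extract_features(text):
    """Word tokens plus a hypothetical HAS_URL marker when a link is present."""
    lowered = text.lower()
    has_url = bool(URL_PATTERN.search(lowered))
    tokens = re.findall(r"[a-z']+", URL_PATTERN.sub(" ", lowered))
    if has_url:
        tokens.append("HAS_URL")
    return tokens


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(extract_features(text))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict_proba(self, text):
        """Return {label: probability}; the values sum to 1."""
        tokens = [t for t in extract_features(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # class prior
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            log_scores[label] = score
        # Normalize in log space to avoid underflow.
        top = max(log_scores.values())
        exp = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exp.values())
        return {label: v / z for label, v in exp.items()}


# Toy training data, invented for this example.
train = [
    ("The pasta was delicious and service was quick", "genuine"),
    ("Friendly staff and clean rooms, would visit again", "genuine"),
    ("Huge discounts, shop now at www.example.com", "advertisement"),
    ("Best deals online, visit https://example.com today", "advertisement"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
probs = model.predict_proba("Lowest price, order at www.example.com")
prediction = max(probs, key=probs.get)
```

The returned dictionary doubles as a confidence score: the probability attached to the predicted label indicates how strongly the evidence favors it over the other classes.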

Challenges We Faced

The toughest part was dealing with unbalanced data: some review categories were rare, so the model struggled to learn their patterns. Another challenge was teaching the model to recognize the subtle signals that tell a true customer story apart from hearsay or an ad.
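One common mitigation for class imbalance, not necessarily the one used in this project, is to weight each class inversely to its frequency so that rare categories count for more during training. The sketch below computes such weights; the label counts are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical label distribution, not the project's real dataset.
labels = (
    ["genuine"] * 80
    + ["advertisement"] * 12
    + ["irrelevant"] * 6
    + ["rant without visit"] * 2
)
class_weights = balanced_class_weights(labels)
```

With these made-up counts, the rarest class ("rant without visit") gets the largest weight and the most common class ("genuine") gets the smallest.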

Accomplishments that we're proud of

Creating a system that feels intuitive by mixing NLP with Bayesian reasoning.

Effectively spotting promotional content using URL detection combined with probabilistic rules.

Building a simple interface for users to test and see the classifications in action.

Improving our understanding of both machine learning and the problem of fake or irrelevant reviews.

What we learned

Through this project, we discovered how complex it is to interpret people's reviews, since ideas can be expressed in multiple ways and vary from person to person. We gained insight into how NLP can turn that text into meaningful features that computers can analyze. The project also helped strengthen our skills in model training and in deploying interactive tools that can have a significant impact.

Setup

Step 1: Run pip install -r requirements.txt

Step 2: Run py main.py (or python main.py if the py launcher is not available)

Step 3: Open the local URL printed in the terminal in your browser

Step 4: Have fun!!! 😀😀😀😀

How to reproduce results

Advertisement

Verified Seller ✅ More than 100 brands available. Lowest price guaranteed. DM now for orders! @jimmy
Discover exclusive collections from top designers and local artisans.

Up to 50% off seasonal apparel – Shop Now

Personalized styling consultations – Book Here

Buy 2, get 1 free on select accessories – Grab Deal

From chic dresses to casual wear, LuxeMart ensures your wardrobe is always on-trend. Don’t miss our members-only fashion previews – Join LuxeClub today.

Rant without visit

I heard from my friend’s cousin that the food here makes you sick, so I’ll never go.
From the reviews I read, this place is the worst ever. Won't ever visit it

Irrelevant

I’m learning how to code in Python these days, it’s fun.
Bought my laptop here last week, works perfectly fine so far. Price is reasonable.