metadata

title: FactChecker
emoji: 📚
colorFrom: pink
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: 'FactChecker: Fake News Detector'

FactChecker: Fake News Detection Web Application

FactChecker is a web application that detects fake news using three machine learning models: Logistic Regression, Random Forest, and DistilBERT. The system analyzes text input and predicts whether the content is likely to be real or fake news, providing confidence scores and color-coded results to help users interpret the prediction.

Features

Multiple ML Models: Choose between three different models or use all of them together:
- Logistic Regression (Accuracy: 90.42%, F1 Score: 87.62%)
- Random Forest (Accuracy: 90.83%, F1 Score: 87.52%)
- DistilBERT (Accuracy: 91.00%, F1 Score: 88.45%)
Ensemble Approach: When selecting "All Models," the system combines predictions using a voting mechanism for more robust results
Real-time Analysis: Instantly assess the credibility of news articles or statements
Confidence Scores: View the model's level of certainty in its predictions
Visual Interface: Color-coded results (green for real, red for fake) for intuitive understanding
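
The exact voting mechanism is not described here. As a rough illustration, the sketch below assumes simple majority voting over per-model (label, confidence) pairs. The function name ensemble_vote, the dictionary layout, and the choice to report the mean confidence of the agreeing models are all illustrative assumptions, not the application's actual code.

```python
from collections import Counter
from statistics import mean


def ensemble_vote(predictions):
    """Combine per-model (label, confidence) pairs by majority vote.

    `predictions` maps a model name to a (label, confidence) tuple, where
    label is "real" or "fake" and confidence lies in [0, 1].
    This layout is a hypothetical one chosen for the example.
    """
    if not predictions:
        raise ValueError("at least one model prediction is required")

    votes = Counter(label for label, _ in predictions.values())
    # With three models and two labels there is always a strict majority.
    # With an even number of models, a tie goes to the label seen first.
    winner, count = votes.most_common(1)[0]

    # Report the mean confidence of the models that voted for the winner.
    agreeing = [conf for label, conf in predictions.values() if label == winner]
    return {
        "label": winner,
        "confidence": mean(agreeing),
        "votes": f"{count}/{len(predictions)}",
    }
```

For example, if Logistic Regression and DistilBERT both predict "fake" and Random Forest predicts "real", the combined label under this scheme is "fake" with a vote count of 2/3.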

Technology Stack

Backend

Python 3.11 with Flask 2.0.1
NLTK 3.9.1 for natural language processing
Scikit-learn 1.6.1 for traditional machine learning models
PyTorch 2.6.0 and Transformers 4.49.0 for the DistilBERT model
Gunicorn 20.1.0 for production deployment

Note: Verify that the installed versions match the ones listed above before running the backend.
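
One way to check the backend versions is to compare the installed packages against the list above using the standard library's importlib.metadata. This is a minimal sketch: the helper name check_versions and the EXPECTED mapping are illustrative, and the keys assume the usual PyPI distribution names for each package.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the backend list above; keys are assumed PyPI names.
EXPECTED = {
    "Flask": "2.0.1",
    "nltk": "3.9.1",
    "scikit-learn": "1.6.1",
    "torch": "2.6.0",
    "transformers": "4.49.0",
    "gunicorn": "20.1.0",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected, installed)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        # Local build tags such as "2.6.0+cu124" still count as a match.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions().items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

Running the script inside the project's virtual environment prints one line per package that is missing or differs from the listed version, and prints nothing when everything matches.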

Frontend

React.js for the user interface
Modern JavaScript (ES6+)
CSS for styling

Data Processing

Pandas and NumPy for data manipulation
TF-IDF Vectorization for feature extraction
Regular expressions for text cleaning
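
The specific cleaning rules are not listed here, so the following is only a sketch of what regex-based text cleaning might look like before TF-IDF vectorization. The function name clean_text and the three rules (lowercasing, stripping URLs, dropping non-letter characters) are assumptions chosen for illustration.

```python
import re

# Hypothetical cleaning rules; the application's own rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Lowercase the text, drop URLs and non-letters, collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove links
    text = NON_LETTER_RE.sub(" ", text)  # remove digits and punctuation
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_text("BREAKING: Visit https://example.com NOW!!! 100% true"))
# prints "breaking visit now true"
```

The cleaned strings would then be passed to the saved TF-IDF vectorizer to produce the feature vectors used by the Logistic Regression and Random Forest models.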

Project Structure

FactChecker/
├── build/                    # React build files (compiled frontend)
│   ├── static/
│   │   ├── css/              # Compiled CSS
│   │   └── js/               # Compiled JavaScript
│   ├── asset-manifest.json
│   ├── index.html            # Main HTML file
│   ├── logo.ico
│   ├── logo.png
│   └── manifest.json
├── model_training/           # Model training materials
│   ├── visualizations/       # Generated visualization images
│   └── model_training.ipynb  # Jupyter notebook for model training
├── models/                   # Saved ML models
│   ├── tfidf_vectorizer.pkl  # TF-IDF vectorizer
│   ├── lr_model.pkl          # Logistic Regression model
│   ├── rf_model.pkl          # Random Forest model
│   └── distilbert_model.pt   # DistilBERT model
├── .gitattributes
├── Dockerfile                # Docker configuration
├── README.md
├── app.py                    # Flask application
└── requirements.txt          # Python dependencies

Steps

For Backend:

Clone the repository.
Create a virtual environment and install the dependencies:
pip install -r requirements.txt
Download the NLTK resources:
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet')"
Run the application:
python app.py

For Frontend:

Install dependencies:
npm install
Build the frontend:
npm run build

Model Training

To retrain the models:

Upload the notebook (model_training/model_training.ipynb) to Google Colab.
Download the ISOT dataset files (true.csv and fake.csv) and upload them to Google Drive.
Set the runtime type to GPU for optimal performance:
Go to Runtime → Change runtime type → GPU → Save
Connect to the runtime.
Execute the notebook cells sequentially to retrain the models.
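
For orientation, the sketch below shows one way the two ISOT files could be combined into a single labeled dataset before training. It is not the notebook's code: it uses only the standard library, the function names load_isot and train_test_split are hypothetical, the title and text column names are an assumption about the CSV layout, and the label convention (1 for real, 0 for fake) is chosen for illustration.

```python
import csv
import random


def load_isot(true_path, fake_path, seed=42):
    """Read both ISOT files and return shuffled (text, label) pairs.

    Assumes each CSV has "title" and "text" columns.
    Labels are an illustrative convention: 1 = real, 0 = fake.
    """
    rows = []
    for path, label in ((true_path, 1), (fake_path, 0)):
        with open(path, newline="", encoding="utf-8") as handle:
            for record in csv.DictReader(handle):
                # Join headline and body so the model sees both.
                combined = f"{record['title']} {record['text']}".strip()
                rows.append((combined, label))
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(rows)
    return rows


def train_test_split(rows, test_fraction=0.2):
    """Split the shuffled rows into (train, test) lists."""
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]
```

A call such as load_isot("true.csv", "fake.csv") followed by train_test_split(...) yields the training and held-out sets on which the three models would be fitted and evaluated.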