title: FactChecker
emoji: π
colorFrom: pink
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: 'FactChecker: Fake News Detector'
FactChecker: Fake News Detection Web Application
FactChecker is a web application that detects fake news using various machine learning models. The system analyzes text input and predicts whether the content is likely to be real or fake news, providing confidence scores and visualizations to help users understand the results.
Features
Multiple ML Models: Choose between three different models or use all of them together:
- Logistic Regression (Accuracy: 90.42%, F1 Score: 87.62%)
- Random Forest (Accuracy: 90.83%, F1 Score: 87.52%)
- DistilBERT (Accuracy: 91.00%, F1 Score: 88.45%)
Ensemble Approach: When selecting "All Models," the system combines predictions using a voting mechanism for more robust results
Real-time Analysis: Instantly assess the credibility of news articles or statements
Confidence Scores: View the model's level of certainty in its predictions
Visual Interface: Color-coded results (green for real, red for fake) for intuitive understanding
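The voting mechanism used by "All Models" is not spelled out here. As a rough illustration only, a simple majority vote that reports the mean confidence of the winning side could be sketched as follows. The `ModelPrediction` type, the `majority_vote` function, the label strings, and the sample confidences are all hypothetical and are not taken from app.py.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ModelPrediction:
    """One model's output (hypothetical structure, not from app.py)."""
    model: str
    label: str         # "real" or "fake"
    confidence: float  # probability assigned to `label`, in [0, 1]


def majority_vote(predictions):
    """Return (label, confidence) chosen by simple majority.

    The confidence is the mean confidence of the models that voted for
    the winning label. With three models and two labels no tie can occur.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    votes = Counter(p.label for p in predictions)
    winner, _ = votes.most_common(1)[0]
    supporting = [p.confidence for p in predictions if p.label == winner]
    return winner, sum(supporting) / len(supporting)


# Made-up example values, for illustration only.
predictions = [
    ModelPrediction("Logistic Regression", "fake", 0.80),
    ModelPrediction("Random Forest", "real", 0.60),
    ModelPrediction("DistilBERT", "fake", 0.90),
]
label, confidence = majority_vote(predictions)
print(f"{label} ({confidence:.2f})")  # prints: fake (0.85)
```

Other schemes, such as averaging the predicted probabilities across models (soft voting), are equally plausible; check app.py for the rule actually used.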
Technology Stack
Backend
- Python 3.11 with Flask 2.0.1
- NLTK 3.9.1 for natural language processing
- Scikit-learn 1.6.1 for traditional machine learning models
- PyTorch 2.6.0 and Transformers 4.49.0 for the DistilBERT model
- Gunicorn 20.1.0 for production deployment
Verify these versions before running the backend.
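One way to check the installed packages against the versions listed above is a small script built on the standard-library `importlib.metadata` module. This is a sketch, not part of the project: the `PINNED` table, `installed_version`, and `find_mismatches` are hypothetical names, and the distribution names are assumed to be the usual PyPI ones.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this README. The distribution names are assumptions
# (the usual PyPI names); adjust them to match requirements.txt.
PINNED = {
    "Flask": "2.0.1",
    "nltk": "3.9.1",
    "scikit-learn": "1.6.1",
    "torch": "2.6.0",
    "transformers": "4.49.0",
    "gunicorn": "20.1.0",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        # Drop local build tags such as "+cu124" that some wheels append.
        return version(name).split("+")[0]
    except PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package whose version differs to (expected, found)."""
    mismatches = {}
    for name, expected in pinned.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Running the script inside the virtual environment prints one line per package that is missing or differs from the listed version, and nothing if everything matches.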
Frontend
- React.js for the user interface
- Modern JavaScript (ES6+)
- CSS for styling
Data Processing
- Pandas and NumPy for data manipulation
- TF-IDF Vectorization for feature extraction
- Regular expressions for text cleaning
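The exact cleaning rules live in the training notebook and app.py and are not reproduced here. A minimal sketch of regex-based cleaning of the kind described above, assuming lowercasing, URL removal, and dropping non-letter characters, might look like this. The `clean_text` function and its patterns are illustrative assumptions.

```python
import re

# Hypothetical patterns; the project's real rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Lowercase, then strip URLs, non-letter characters, and extra spaces."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


sample = "BREAKING: Read more at https://example.com/story!!  100% true"
print(clean_text(sample))  # prints: breaking read more at true
```

In a pipeline like the one listed above, the cleaned string would then be passed to the fitted TF-IDF vectorizer to produce the feature vector for the Logistic Regression and Random Forest models.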
Project Structure
FactChecker/
├── build/                      # React build files (compiled frontend)
│   ├── static/
│   │   ├── css/                # Compiled CSS
│   │   └── js/                 # Compiled JavaScript
│   ├── asset-manifest.json
│   ├── index.html              # Main HTML file
│   ├── logo.ico
│   ├── logo.png
│   └── manifest.json
├── model_training/             # Model training materials
│   ├── visualizations/         # Generated visualization images
│   └── model_training.ipynb    # Jupyter notebook for model training
├── models/                     # Saved ML models
│   ├── tfidf_vectorizer.pkl    # TF-IDF vectorizer
│   ├── lr_model.pkl            # Logistic Regression model
│   ├── rf_model.pkl            # Random Forest model
│   └── distilbert_model.pt     # DistilBERT model
├── .gitattributes
├── Dockerfile                  # Docker configuration
├── README.md
├── app.py                      # Flask application
└── requirements.txt            # Python dependencies
Steps
For Backend:
- Clone the repository
- Create a virtual environment and install the dependencies.
pip install -r requirements.txt
- Download NLTK resources:
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet')"
- Run the application
python app.py
For Frontend:
- Install dependencies:
npm install
- Build the frontend:
npm run build
Model Training
To retrain the models:
- Upload the notebook (model_training/model_training.ipynb) to Google Colab.
- Download the ISOT dataset files (true.csv, fake.csv) and upload them to Google Drive.
- Set runtime type to GPU for optimal performance:
Go to Runtime → Change runtime type → GPU → Save
- Activate the runtime.
- Execute the notebook cells sequentially to retrain the models.
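Before training, the two ISOT files have to be combined into one labeled dataset. The notebook's own loading code is not shown here and may well use Pandas; the dependency-free sketch below only illustrates the idea. The `load_labeled_rows` function, the label convention (0 = real, 1 = fake), and the example paths are assumptions.

```python
import csv
import random


def load_labeled_rows(true_path, fake_path, seed=42):
    """Read both CSV files, tag each row with a label, and shuffle.

    Label convention (an assumption): 0 = real, 1 = fake.
    """
    rows = []
    for path, label in ((true_path, 0), (fake_path, 1)):
        with open(path, newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                row["label"] = label
                rows.append(row)
    # A fixed seed keeps the shuffled order reproducible across runs.
    random.Random(seed).shuffle(rows)
    return rows


# Example call (hypothetical paths; adjust to where the files were uploaded):
# rows = load_labeled_rows("true.csv", "fake.csv")
```

Shuffling after concatenation avoids feeding the models all real articles followed by all fake ones when the data is later split into training and test sets.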