---
title: FactChecker
emoji: π
colorFrom: pink
colorTo: red
sdk: docker
pinned: false
license: mit
short_description: 'FactChecker: Fake News Detector'
---
# FactChecker: Fake News Detection Web Application
FactChecker is a web application that detects fake news using several machine learning models.
The system analyzes text input and predicts whether the content is likely to be real or fake news,
providing confidence scores and visualizations to help users understand the results.
## Features
- **Multiple ML Models**: Choose one of three models or use all of them together:
  - Logistic Regression (Accuracy: 90.42%, F1 Score: 87.62%)
  - Random Forest (Accuracy: 90.83%, F1 Score: 87.52%)
  - DistilBERT (Accuracy: 91.00%, F1 Score: 88.45%)
- **Ensemble Approach**: When "All Models" is selected, the system combines the individual predictions using a voting mechanism for more robust results
- **Real-time Analysis**: Instantly assess the credibility of news articles or statements
- **Confidence Scores**: View each model's level of certainty in its prediction
- **Visual Interface**: Color-coded results (green for real, red for fake) for intuitive understanding
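
This README does not spell out how the "All Models" voting mechanism works. As a minimal sketch, assuming a plain majority vote over the per-model labels and a combined confidence equal to the mean confidence of the models that agree with the winner, it could look like the code below. The function name `ensemble_vote` and the input format are hypothetical and are not taken from `app.py`.

```python
from collections import Counter


def ensemble_vote(predictions):
    """Combine per-model predictions by majority vote.

    `predictions` maps a model name to a (label, confidence) pair, where
    label is "real" or "fake" and confidence is in [0, 1]. This input
    format is an assumption made for illustration.
    """
    if not predictions:
        raise ValueError("at least one model prediction is required")

    # Count how many models voted for each label.
    votes = Counter(label for label, _ in predictions.values())
    winner, count = votes.most_common(1)[0]

    # Average the confidence of the models that agree with the winner.
    agreeing = [conf for label, conf in predictions.values() if label == winner]
    return {
        "label": winner,
        "votes": count,
        "confidence": sum(agreeing) / len(agreeing),
    }


result = ensemble_vote({
    "logistic_regression": ("fake", 0.80),
    "random_forest": ("fake", 0.70),
    "distilbert": ("real", 0.90),
})
print(result["label"], result["votes"])  # fake 2
```

With three models and two labels a tie cannot occur. With an even number of models, this sketch would break ties in favor of the label that was seen first.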
## Technology Stack
### Backend
- Python 3.11 with Flask 2.0.1
- NLTK 3.9.1 for natural language processing
- Scikit-learn 1.6.1 for traditional machine learning models
- PyTorch 2.6.0 and Transformers 4.49.0 for the DistilBERT model
- Gunicorn 20.1.0 for production deployment

**Verify that the installed versions match the ones listed above before running the backend.**
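
One way to check the versions is a small script built on the standard library's `importlib.metadata`. The sketch below is not part of the repository. It assumes the PyPI distribution names shown in `EXPECTED` match what `requirements.txt` installs, and it ignores local build suffixes such as `+cu124` that PyTorch wheels often carry.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the Backend section. The keys are PyPI distribution
# names (an assumption; adjust them if requirements.txt differs).
EXPECTED = {
    "Flask": "2.0.1",
    "nltk": "3.9.1",
    "scikit-learn": "1.6.1",
    "torch": "2.6.0",
    "transformers": "4.49.0",
    "gunicorn": "20.1.0",
}


def find_mismatches(expected, get_version=version):
    """Return {package: (expected, installed)} for every package whose
    installed version differs; installed is None if the package is missing."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            # Drop a local build suffix such as "+cu124" before comparing.
            installed = get_version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

If the script prints nothing, every listed package is installed at the expected version.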
### Frontend
- React.js for the user interface
- Modern JavaScript (ES6+)
- CSS for styling
### Data Processing
- Pandas and NumPy for data manipulation
- TF-IDF Vectorization for feature extraction
- Regular expressions for text cleaning
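
The exact cleaning rules live in the training notebook and are not listed here. As an illustrative sketch only, a regex-based cleaner could lowercase the text, drop URLs, remove everything except letters, and collapse whitespace. The function name `clean_text` and the specific patterns are assumptions.

```python
import re

# Patterns are illustrative assumptions, not the project's actual rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
WHITESPACE_RE = re.compile(r"\s+")


def clean_text(text):
    """Normalize raw article text before feature extraction."""
    text = text.lower()
    text = URL_RE.sub(" ", text)          # remove links
    text = NON_LETTER_RE.sub(" ", text)   # remove digits and punctuation
    return WHITESPACE_RE.sub(" ", text).strip()


print(clean_text("BREAKING: Visit https://example.com NOW!!! 100% true"))
# breaking visit now true
```

The cleaned string is what would then be passed to the TF-IDF vectorizer (for example through the `transform` method of a fitted scikit-learn `TfidfVectorizer`) to produce the feature vector for the Logistic Regression and Random Forest models.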
## Project Structure
```
FactChecker/
├── build/                      # React build files (compiled frontend)
│   ├── static/
│   │   ├── css/                # Compiled CSS
│   │   └── js/                 # Compiled JavaScript
│   ├── asset-manifest.json
│   ├── index.html              # Main HTML file
│   ├── logo.ico
│   ├── logo.png
│   └── manifest.json
├── model_training/             # Model training materials
│   ├── visualizations/         # Generated visualization images
│   └── model_training.ipynb    # Jupyter notebook for model training
├── models/                     # Saved ML models
│   ├── tfidf_vectorizer.pkl    # TF-IDF vectorizer
│   ├── lr_model.pkl            # Logistic Regression model
│   ├── rf_model.pkl            # Random Forest model
│   └── distilbert_model.pt     # DistilBERT model
├── .gitattributes
├── Dockerfile                  # Docker configuration
├── README.md
├── app.py                      # Flask application
└── requirements.txt            # Python dependencies
```
## Steps
### For Backend:
1. Clone the repository.
2. Create a virtual environment and install the dependencies:
   ```pip install -r requirements.txt```
3. Download the NLTK resources:
   ```python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('wordnet')"```
4. Run the application:
   ```python app.py```
### For Frontend:
1. Install the dependencies:
   ```npm install```
2. Build the frontend:
   ```npm run build```
### Model Training
To retrain the models:
1. Upload the notebook to Google Colab.
2. Download the ISOT dataset files (true.csv, fake.csv) and upload them to Google Drive.
3. Set the runtime type to GPU for optimal performance:
   ```Go to Runtime → Change runtime type → GPU → Save```
4. Activate the runtime.
5. Execute the notebook cells sequentially to retrain the models.
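
Before training, the two ISOT files have to be merged into one labeled dataset. The sketch below shows one way to do that with only the standard library. It assumes each file has a `text` column and uses a hypothetical label convention of 0 for real and 1 for fake; the notebook's actual column choice and labels may differ. The same step can be done with Pandas, which the Data Processing section lists.

```python
import csv
import random


def load_labeled(true_path, fake_path, text_column="text", seed=42):
    """Read both CSV files and return a shuffled list of (text, label)
    pairs, with label 0 for real news and 1 for fake news.

    The column name and the label convention are assumptions.
    """
    rows = []
    for path, label in ((true_path, 0), (fake_path, 1)):
        with open(path, newline="", encoding="utf-8") as handle:
            for record in csv.DictReader(handle):
                rows.append((record[text_column], label))

    # Shuffle with a fixed seed so the two classes are interleaved
    # in a reproducible order.
    random.Random(seed).shuffle(rows)
    return rows
```

In Colab the paths would point at the mounted Drive folder that holds the uploaded files, for example after calling `drive.mount('/content/drive')` from `google.colab`.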