---
title: AI For All - Fact Checker
emoji: 🔍
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
---

# AI For All - Fact Checker

A fact-checking API built with FastAPI that verifies claims using web search, content analysis, and natural language inference.

## Features

- **Multi-Source Verification**: searches and analyzes claims across multiple web sources
- **ML-Powered Analysis**: uses NLP models for semantic understanding and inference
- **Smart Content Extraction**: web scraping with multiple fallback strategies
- **Verdict Aggregation**: combines evidence from multiple sources into a single assessment
- **Post Generation**: creates shareable social media content from the findings
- **Persistent Storage**: saves results and shares them via unique URLs
- **Web Interface**: HTML interface with real-time updates

## Demo

🚀 **Live Demo**: Deploy your own on Railway

### Try These Example Claims

- "The Earth is flat"
- "Vaccines cause autism"
- "Climate change is a hoax"
- "The Great Wall of China is visible from space"

### How It Works

1. Enter a claim in the web interface at `/`
2. The app searches the web for sources via the Serper API
3. ML models (DeBERTa and sentence-transformers) score each source for relevance and check the claim against its content
4. You get a verdict with supporting evidence and confidence scores
5. Results can be shared via unique URLs at `/r/{share_id}`

## Quick Start

1. **Clone and set up:**

   ```bash
   git clone <repository-url>
   cd ai_for_all
   ```

2. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

3. **Configure environment:**

   ```bash
   # Create .env file with your Serper API key
   echo "SERPER_API_KEY=your_serper_api_key_here" > .env
   ```

4. **Run the server:**

   ```bash
   uvicorn app.main:app --reload
   ```

5. **Test the application:**

   - Visit http://localhost:8000 for the web interface
   - Or call the API directly:

     ```bash
     curl -X POST http://localhost:8000/check \
       -H "Content-Type: application/json" \
       -d '{"claim": "The Earth is round"}'
     ```

## API Endpoints

### Core Endpoints

- `GET /` - Web interface homepage
- `POST /check` - Fact-check a claim (JSON API)
- `POST /ui/check` - Fact-check via web form (HTMX)
- `GET /r/{share_id}` - View a shareable fact-check result

### Example API Usage

**Request:**

```bash
curl -X POST http://localhost:8000/check \
  -H "Content-Type: application/json" \
  -d '{"claim": "The Earth is flat"}'
```

**Response:**

```json
{
  "claim": "The Earth is flat",
  "verdict": "False",
  "confidence": 0.95,
  "sources": [
    {
      "url": "https://example.com/earth-round",
      "title": "Scientific Evidence for Earth's Spherical Shape",
      "snippet": "Multiple lines of evidence confirm...",
      "relevance": 0.92
    }
  ],
  "reasoning": "Based on overwhelming scientific evidence...",
  "post": "🔍 Fact Check: The claim 'The Earth is flat' is FALSE...",
  "share_id": "abc123def456"
}
```
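A client can consume this response with any HTTP library; here is a minimal sketch that parses the example payload above (field names as documented, values illustrative):

```python
import json

# Example response body as returned by POST /check (values illustrative)
raw = """
{
  "claim": "The Earth is flat",
  "verdict": "False",
  "confidence": 0.95,
  "sources": [{"url": "https://example.com/earth-round",
               "title": "Scientific Evidence for Earth's Spherical Shape",
               "snippet": "Multiple lines of evidence confirm...",
               "relevance": 0.92}],
  "reasoning": "Based on overwhelming scientific evidence...",
  "share_id": "abc123def456"
}
"""

result = json.loads(raw)

# Pull out the fields most clients care about
verdict = result["verdict"]          # "False"
confidence = result["confidence"]    # 0.95
top_source = max(result["sources"], key=lambda s: s["relevance"])
share_url = f"/r/{result['share_id']}"  # path of the shareable result page

print(verdict, confidence, top_source["url"], share_url)
```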

## Architecture

### Core Components

1. **Search Module** (`app/search/`): Serper API integration with deduplication
2. **Fetch Module** (`app/fetch/`): multi-strategy content extraction (trafilatura, readability, BeautifulSoup)
3. **NLP Module** (`app/nlp/`): embeddings (sentence-transformers) and NLI (DeBERTa)
4. **Logic Module** (`app/logic/`): pipeline orchestration and post generation
5. **Storage Module** (`app/store/`): SQLite database with JSON blob storage
6. **Web Module** (`app/web/`): Jinja2 templates with HTMX integration
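The fetch module's "multiple fallback strategies" follow a common pattern: try extractors in order of quality and return the first usable result. A library-free sketch of that pattern (the extractor functions here are stand-ins, not the project's actual code):

```python
from typing import Callable, Optional

def extract_with_fallbacks(html: str,
                           strategies: list[Callable[[str], Optional[str]]]) -> Optional[str]:
    """Try each extraction strategy in order; return the first non-empty result."""
    for strategy in strategies:
        try:
            text = strategy(html)
        except Exception:
            continue  # a failing extractor should not abort the whole fetch
        if text and text.strip():
            return text
    return None

# Stand-in strategies; the real module would wrap trafilatura,
# readability-lxml, and BeautifulSoup in this order.
def primary(html):      return None            # e.g. trafilatura found nothing
def secondary(html):    raise ValueError()     # e.g. readability choked on bad markup
def last_resort(html):  return html.strip()    # e.g. crude tag-stripped fallback

print(extract_with_fallbacks("  some article text  ", [primary, secondary, last_resort]))
# some article text
```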

### Technology Stack

- **Backend**: FastAPI with async/await support
- **ML/NLP**: sentence-transformers (all-MiniLM-L6-v2), transformers (DeBERTa-v3-base-mnli)
- **Search**: Serper API for web search
- **Storage**: SQLite with JSON serialization
- **Frontend**: HTMX + Jinja2 templates (no build step required)
- **Web Scraping**: trafilatura, readability-lxml, BeautifulSoup
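Relevance ranking with sentence embeddings reduces to cosine similarity between vectors. A dependency-free sketch with toy 3-dimensional vectors (real all-MiniLM-L6-v2 embeddings are 384-dimensional):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

claim_vec = [0.2, 0.8, 0.1]  # toy embedding of the claim
source_vecs = {
    "relevant_source":   [0.25, 0.75, 0.05],  # points roughly the same way
    "irrelevant_source": [0.9, -0.1, 0.4],    # points elsewhere
}

# Rank sources by similarity to the claim, highest first
ranked = sorted(source_vecs,
                key=lambda k: cosine_similarity(claim_vec, source_vecs[k]),
                reverse=True)
print(ranked)  # ['relevant_source', 'irrelevant_source']
```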

## Deployment

### Railway Deployment (Recommended)

This project is configured for one-click deployment on Railway:

1. **Fork** this repository on GitHub
2. **Connect to Railway**:
   - Go to Railway
   - Click "Deploy from GitHub repo"
   - Select your fork
3. **Set environment variables** in the Railway dashboard:
   - `SERPER_API_KEY`: your Serper API key (get one from serper.dev)
4. **Deploy** - Railway uses the included `Procfile` and `runtime.txt`

The app will be live at your Railway-provided URL (e.g., https://your-app.up.railway.app).

### Hugging Face Spaces Deployment

Deploy on Hugging Face Spaces for free ML model hosting:

1. Create a new Space on Hugging Face
2. Select **Docker** as the Space SDK
3. Upload your repository files, including the `Dockerfile`
4. Set environment variables in the Space settings:
   - `SERPER_API_KEY`: your Serper API key
5. Spaces builds and deploys automatically using the `Dockerfile`

The app will be live at your Space URL: `https://<your-space>.<your-username>.hf.space/`

**API example for Spaces:**

```bash
curl -X POST "https://<your-space>.<your-username>.hf.space/check" \
  -H "Content-Type: application/json" \
  -d '{"claim": "The Earth orbits the Sun."}'
```

### Manual Deployment

For other platforms, the project includes:

- `Procfile`: `web: uvicorn app.main:app --host 0.0.0.0 --port $PORT`
- `runtime.txt`: `python-3.11.9`
- `Dockerfile`: container configuration for Spaces
- `requirements.txt`: all dependencies with pinned versions

### Environment Variables

- `SERPER_API_KEY`: required - get one from serper.dev
- `DATABASE_URL`: optional - SQLite database path (default: `./factcheck.db`)
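In code, these variables are typically read once at startup. A sketch using the standard library (the `load_config` helper is hypothetical; variable names and the default path are from this README):

```python
import os

def load_config(env: dict) -> dict:
    """Read settings from an environment mapping; fail fast on a missing key."""
    api_key = env.get("SERPER_API_KEY")
    if not api_key:
        raise RuntimeError("SERPER_API_KEY is required; get one from serper.dev")
    return {
        "serper_api_key": api_key,
        # Optional variable with the documented default
        "database_url": env.get("DATABASE_URL", "./factcheck.db"),
    }

# In the real app you would pass os.environ; a dummy mapping keeps this runnable
config = load_config({"SERPER_API_KEY": "dummy-key-for-illustration"})
print(config["database_url"])  # ./factcheck.db
```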

## Testing

Run the test suite (18 tests covering all components):

```bash
# Install test dependencies
pip install pytest

# Run all tests
pytest tests/ -v

# Expected output: 18 tests passed
```

Tests cover:

- API endpoints and response formats
- ML pipeline components (search, NLP, logic)
- Database operations and JSON serialization
- Error handling and edge cases

## Technical Implementation

### Pipeline Flow

1. **Claim Input**: user submits a claim via the web UI or API
2. **Web Search**: the Serper API finds relevant sources
3. **Content Extraction**: multi-strategy scraping of source content
4. **Relevance Filtering**: sentence embeddings rank sources by relevance
5. **Fact Verification**: the DeBERTa model performs natural language inference
6. **Verdict Aggregation**: confidence-weighted averaging of per-source verdicts
7. **Post Generation**: a shareable social media post is generated from the verdict
8. **Result Storage**: results are saved to SQLite with unique share IDs
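Step 6, confidence-weighted aggregation, can be sketched in pure Python (the actual weighting scheme in `app/logic/` may differ):

```python
def aggregate_verdicts(per_source: list[tuple[str, float]]) -> tuple[str, float]:
    """Combine per-source (verdict, confidence) pairs into one overall verdict.

    Each source votes for its verdict with weight equal to its confidence;
    the verdict with the highest total weight wins, and the returned
    confidence is that verdict's share of the total weight.
    """
    totals: dict[str, float] = {}
    for verdict, confidence in per_source:
        totals[verdict] = totals.get(verdict, 0.0) + confidence
    best = max(totals, key=totals.get)
    overall = totals[best] / sum(totals.values())
    return best, overall

verdict, confidence = aggregate_verdicts(
    [("False", 0.9), ("False", 0.8), ("True", 0.3)]
)
print(verdict, round(confidence, 2))  # False 0.85
```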

### Key Features

- **Domain Deduplication**: prevents bias from multiple sources on the same domain
- **Confidence Scoring**: ML-based confidence estimation for verdicts
- **Robust Error Handling**: graceful degradation when sources fail to load
- **JSON Serialization**: proper handling of Pydantic models for database storage
- **HTMX Integration**: dynamic UI updates without a JavaScript build step
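Domain deduplication needs nothing beyond the standard library. A sketch that keeps only the first (presumably highest-ranked) result per host:

```python
from urllib.parse import urlparse

def dedupe_by_domain(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each domain, preserving search-rank order."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        # Normalize the host so www.example.com and example.com collapse together
        domain = urlparse(url).netloc.lower().removeprefix("www.")
        if domain not in seen:
            seen.add(domain)
            kept.append(url)
    return kept

results = dedupe_by_domain([
    "https://example.com/a",
    "https://www.example.com/b",   # same domain, dropped
    "https://other.org/c",
])
print(results)  # ['https://example.com/a', 'https://other.org/c']
```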

## Troubleshooting

### Common Issues

**"No search results found"**

- Check that `SERPER_API_KEY` is set correctly
- Verify the claim is in English and well-formed

**"Model loading errors"**

- Ensure you have sufficient disk space (~2 GB for models)
- Models download automatically on first run

**"Database errors"**

- Check write permissions in the app directory
- The SQLite database is created automatically

## Development Notes

**Pydantic v2 compatibility**: the project uses `model_dump(mode='json')` so URL fields serialize properly when saving to the database.
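The mode matters because Pydantic's URL types are not JSON-native; `mode='json'` coerces them to plain strings first. A minimal illustration (this `Source` model is hypothetical, not the project's actual schema):

```python
import json
from pydantic import BaseModel, HttpUrl

class Source(BaseModel):
    url: HttpUrl   # stored as a Url object, not a str
    title: str

source = Source(url="https://example.com/article", title="Example")

# mode='json' converts HttpUrl to a plain string, so json.dumps succeeds;
# a bare model_dump() would leave a Url object that json.dumps rejects.
data = source.model_dump(mode="json")
blob = json.dumps(data)  # safe to store in a SQLite TEXT column
print(blob)
```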

**Model caching**: transformer models are cached locally after the first download, so subsequent runs start much faster.

**Rate limiting**: the Serper API is rate-limited; consider caching search results for production use.
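A minimal in-process cache keyed on the query string can cut repeat Serper calls; a sketch using `functools` (the `cached_search` wrapper is hypothetical, and the real search call is simulated here):

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def cached_search(query: str) -> tuple:
    """Hypothetical wrapper: hit the Serper API only on a cache miss."""
    # In the real app this would issue the HTTP request; here we simulate it.
    return (f"results for: {query}",)

cached_search("is the earth flat")      # miss: would call the API
cached_search("is the earth flat")      # hit: served from memory
print(cached_search.cache_info().hits)  # 1
```

Note that `lru_cache` never expires entries; for production you would also want a TTL so stale search results are refreshed.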

