---
title: Truthfulness Checker
emoji: 📰
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.4.0
app_file: app.py
pinned: false
license: apache-2.0
---

Implementation Steps: Validating Information with Context

Validating the accuracy or degree of truthfulness of a given piece of information requires context—factual and relevant details surrounding the claim. Here’s how we approach this process step-by-step:


Step 1: Retrieving Context from a Knowledge Graph Substitute (FAISS with Semantic Search)

Instead of relying on a traditional Knowledge Graph (KG), we use FAISS (Facebook AI Similarity Search), a faster, scalable, and flexible alternative for semantic search.

Why FAISS is Better than a Traditional KG

  1. Sentence-Level Retrieval: Unlike traditional KGs that often rely on pre-defined entities and relationships, FAISS uses dense embeddings to directly match the semantic meaning of entire sentences.
  2. Scalable and High-Speed Retrieval: FAISS efficiently handles millions of embeddings, making it highly scalable for real-world applications.
  3. Flexibility: It works with unstructured text, removing the need to pre-process information into entities and relations, which is often time-consuming.
  4. Generalization: FAISS enables approximate nearest neighbor (ANN) search, allowing retrieval of contextually related results, even if they are not exact matches.

Dataset Used

We leverage the News Category Dataset (Kaggle Link), which contains news headlines and short descriptions across various categories.

  • Why This Dataset?
    • It covers a wide range of topics, making it useful for general-purpose context building.
    • Headlines and descriptions provide rich semantic embeddings for similarity searches.
    • Categories allow filtering results to relevant topics when required (e.g., "science" or "technology").
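
For illustration, a minimal loading sketch is shown below. The file name (`News_Category_Dataset_v3.json`) and column names (`headline`, `short_description`, `category`) are assumptions based on the public Kaggle release; adjust them to match your local copy.

```python
import pandas as pd

# Load the News Category Dataset (JSON Lines). File and column names are
# assumptions based on the public Kaggle release; adjust to your local copy.
df = pd.read_json("News_Category_Dataset_v3.json", lines=True)

# Combine headline and short description into one passage per row, since both
# carry the semantic content we want to embed.
passages = (df["headline"].fillna("") + ". " + df["short_description"].fillna("")).tolist()
categories = df["category"].tolist()  # kept for optional filtering (e.g., "SCIENCE")
```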

Process:

  1. We use SentenceTransformer (all-MiniLM-L6-v2) to generate embeddings for the query (the input news).
  2. We search against pre-computed embeddings stored in a FAISS index to retrieve the top-K most relevant entries.
  3. These results form the initial context, capturing related information already present in the dataset.
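
A minimal sketch of this retrieval step is shown below, reusing the `passages` list from the loading sketch above. The flat inner-product index and `top_k=5` are illustrative choices, not necessarily what `app.py` uses.

```python
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Pre-compute embeddings for the dataset passages (done once, offline).
passage_embeddings = encoder.encode(
    passages, convert_to_numpy=True, normalize_embeddings=True
).astype(np.float32)

# A flat inner-product index; with normalized vectors this is cosine similarity.
index = faiss.IndexFlatIP(passage_embeddings.shape[1])
index.add(passage_embeddings)

def retrieve_context(query: str, top_k: int = 5) -> list[str]:
    """Return the top-K dataset passages most similar to the query."""
    query_emb = encoder.encode(
        [query], convert_to_numpy=True, normalize_embeddings=True
    ).astype(np.float32)
    _scores, ids = index.search(query_emb, top_k)
    return [passages[i] for i in ids[0]]
```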

Step 2: Online Search for Real-Time Context

To augment the context retrieved from FAISS, we incorporate real-time online search using an API.

Why Online Search is Critical

  • Fresh Information: News and facts evolve, especially in areas like science, technology, or politics. Online search ensures access to the latest updates that may not exist in the static dataset.
  • Diverse Sources: It broadens the scope by pulling information from multiple credible sources, reducing bias and enhancing reliability.
  • Fact-Checking: Search engines often index trusted fact-checking websites that we can incorporate into the context.

Process:

  1. Use an API with a search query derived from the input news.
  2. Retrieve relevant snippets, headlines, or summaries.
  3. Append these results to the context built using FAISS.
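
A hedged sketch of this step is below. The endpoint URL, the `SEARCH_API_KEY` environment variable, and the response fields (`results`, `snippet`) are hypothetical placeholders, since the README does not name the specific search API; substitute the provider actually configured for the app.

```python
import os
import requests

def online_search(query: str, max_results: int = 5) -> list[str]:
    """Fetch fresh snippets for the query from a web search API.

    The endpoint, auth header, and response fields below are placeholders;
    substitute the search provider actually configured for the app.
    """
    resp = requests.get(
        "https://example-search-api.com/search",  # hypothetical endpoint
        params={"q": query, "num": max_results},
        headers={"Authorization": f"Bearer {os.environ['SEARCH_API_KEY']}"},
        timeout=10,
    )
    resp.raise_for_status()
    # Assume each result exposes a short text snippet; adapt to the real schema.
    return [item["snippet"] for item in resp.json().get("results", [])][:max_results]
```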

Step 3: Building Context from Combined Sources

Both FAISS-based retrieval and online search results are combined into a single context string. This provides a comprehensive knowledge base around the input information.

  • Why Combine Both?
    • FAISS offers pre-indexed knowledge—ideal for static facts or concepts.
    • Online search complements it with dynamic and up-to-date insights—perfect for verifying recent developments.

This layered context improves the model’s ability to assess the truthfulness of the given information.
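
Putting the two sources together can be as simple as the sketch below, which reuses the `retrieve_context` and `online_search` helpers from the earlier sketches; the `build_context` name and the newline-joined string are illustrative choices.

```python
def build_context(claim: str) -> str:
    """Combine FAISS retrieval (static) and online search (fresh) into one context string."""
    dataset_hits = retrieve_context(claim, top_k=5)     # Step 1: pre-indexed knowledge
    search_hits = online_search(claim, max_results=5)   # Step 2: real-time snippets
    return "\n".join(dataset_hits + search_hits)
```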


Step 4: Truthfulness Prediction with Zero-Shot Classification Model

We use the facebook/bart-large-mnli model, a zero-shot classification model, for evaluation.

Why BART-Large-MNLI?

  1. Zero-Shot Capability: It can handle claims and hypotheses without needing task-specific training—perfect for this flexible, multi-domain use case.
  2. Contextual Matching: It compares the input claim (news) with the constructed context to assess semantic consistency.
  3. High Accuracy: Pre-trained on natural language inference tasks, making it adept at understanding relationships like entailment and contradiction.
  4. Multi-Label Support: Can evaluate multiple labels simultaneously, ideal for degrees of truthfulness.

Process:

  1. Pass the news (the claim) together with the constructed context to the model, which judges whether the context supports the claim.
  2. Compute a truthfulness score between 0 and 1, where:
    • 0: Completely false.
    • 1: Completely true.
  3. Generate explanations based on the score and suggest actions (e.g., further verification if uncertain).
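
A minimal sketch of this step using the Hugging Face zero-shot-classification pipeline is shown below. Packing the context and claim into a single sequence and using the `["true", "false"]` label set are assumptions about the prompt formulation, not necessarily the exact one in `app.py`.

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def truthfulness_score(claim: str, context: str) -> float:
    """Score how well the context supports the claim, on a 0-1 scale."""
    # One plausible formulation: pack context and claim into a single sequence
    # and let the model decide between "true" and "false" labels.
    sequence = f"Context: {context}\nClaim: {claim}"
    result = classifier(sequence, candidate_labels=["true", "false"])
    scores = dict(zip(result["labels"], result["scores"]))
    return scores["true"]
```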

End-to-End Example

Input News:
"Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."

Context Built:

  • FAISS Search: Finds prior research on quantum time reversal and entanglement theories.
  • Online Search: Retrieves recent articles discussing quantum breakthroughs and expert views.

Model Evaluation:

  • The model compares the news with the combined context and outputs:
    Score: 0.72 (Likely True).

Result Explanation:

News: "Scientists Demonstrate 'Negative Time' In Groundbreaking Quantum Experiment."
Truthfulness Score: 0.72 (Likely true)
Analysis: You can reasonably trust this information, but further verification is always recommended for critical decisions.
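
The verdict wording above suggests that scores are bucketed into ranges. One possible mapping is sketched below; the thresholds are purely illustrative assumptions, not the app's actual cutoffs.

```python
def explain(score: float) -> str:
    """Map a truthfulness score to a verdict and suggested action.

    The thresholds below are illustrative assumptions, not the app's exact cutoffs.
    """
    if score >= 0.75:
        return f"Truthfulness Score: {score:.2f} (Very likely true)"
    if score >= 0.55:
        return (f"Truthfulness Score: {score:.2f} (Likely true) - further "
                "verification is recommended for critical decisions.")
    if score >= 0.45:
        return f"Truthfulness Score: {score:.2f} (Uncertain) - please verify with trusted sources."
    return f"Truthfulness Score: {score:.2f} (Likely false)"
```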

Why This Approach Works

  1. Balanced Context: Combines static knowledge (KG substitute) and dynamic knowledge (real-time search).
  2. Model Flexibility: Zero-shot model adapts to diverse topics without retraining.
  3. Scalable and Cost-Effective: Uses pre-trained models, FAISS indexing, and simple APIs for implementation.
  4. Interpretability: Outputs include confidence scores and explanations for transparency.

This modular approach ensures that the truthfulness assessment is scalable, explainable, and adaptable to new domains.