---
title: Fact Checking rocks!
emoji: 🎸
colorFrom: purple
colorTo: blue
sdk: streamlit
sdk_version: 1.19.0
app_file: Rock_fact_checker.py
pinned: true
models: [sentence-transformers/msmarco-distilbert-base-tas-b, microsoft/deberta-v2-xlarge-mnli, google/flan-t5-large]
tags: [fact-checking, rock, natural language inference, dense retrieval, large language models, haystack, neural search]
license: apache-2.0
---

# Fact Checking 🎸 Rocks!   [![Generic badge](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/fact-checking-rocks) [![Generic badge](https://img.shields.io/github/stars/anakin87/fact-checking-rocks?label=Github&style=social)](https://github.com/anakin87/fact-checking-rocks)

## *Fact checking baseline combining dense retrieval and textual entailment*

- [Fact Checking 🎸 Rocks!](#fact-checking--rocks---)
  - [*Fact checking baseline combining dense retrieval and textual entailment*](#fact-checking-baseline-combining-dense-retrieval-and-textual-entailment)
    - [Idea](#idea)
    - [Presentation](#presentation)
    - [System description](#system-description)
      - [Indexing pipeline](#indexing-pipeline)
      - [Search pipeline](#search-pipeline)
      - [Explain using a LLM](#explain-using-a-llm)
    - [Limits and possible improvements](#limits-and-possible-improvements)
    - [Repository structure](#repository-structure)
    - [Installation](#installation)
      - [Entailment Checker node](#entailment-checker-node)
      - [Fact Checking 🎸 Rocks!](#fact-checking--rocks)

### Idea
💡 This project aims to show that a *naive and simple baseline* for fact checking can be built by combining dense retrieval and a textual entailment task.
In a nutshell, the flow is as follows:
* the user enters a factual statement
* the relevant passages are retrieved from the knowledge base using dense retrieval
* the system computes the text entailment between each relevant passage and the statement, using a Natural Language Inference model
* the entailment scores are aggregated to produce a summary score.

### Presentation

- [🍿 Video presentation @ Berlin Buzzwords 2023](https://www.youtube.com/watch?v=4L8Iw9CZNbU)
- [🧑‍🏫 Slides](./presentation/fact_checking_rocks.pdf)

### System description
🪄 This project is strongly based on [🔎 Haystack](https://github.com/deepset-ai/haystack), an open source NLP framework that enables seamless use of Transformer models and LLMs to interact with your data. The main components of our system are an indexing pipeline and a search pipeline.

#### Indexing pipeline
* [Crawling](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/get_wikipedia_data.ipynb): crawl data from Wikipedia, starting from the page [List of mainstream rock performers](https://en.wikipedia.org/wiki/List_of_mainstream_rock_performers) and using the [`wikipedia` Python wrapper](https://github.com/goldsmith/Wikipedia)
* [Indexing](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/indexing.ipynb)
  * split the downloaded documents into chunks of 2 sentences
  * discard chunks with fewer than 10 words, as they are not very informative
  * instantiate a [FAISS](https://github.com/facebookresearch/faiss) Document store and store the passages in it
  * create embeddings for the passages, using a Sentence Transformers model, and save them in FAISS. The retrieval task involves [*asymmetric semantic search*](https://www.sbert.net/examples/applications/semantic-search/README.html#symmetric-vs-asymmetric-semantic-search) (statements to be verified are usually shorter than the relevant passages), therefore I chose the model `msmarco-distilbert-base-tas-b`
  * save the FAISS index.
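
For reference, here is a condensed sketch of these indexing steps using Haystack 1.x APIs (variable names and paths are illustrative; the actual code lives in the indexing notebook linked above):

```python
# Condensed indexing sketch (Haystack 1.x); paths and variable names are illustrative
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever, PreProcessor

# split the crawled Wikipedia documents into 2-sentence chunks
preprocessor = PreProcessor(
    split_by="sentence", split_length=2, split_respect_sentence_boundary=False
)
chunks = preprocessor.process(wikipedia_docs)  # `wikipedia_docs`: output of the crawling notebook

# discard chunks with fewer than 10 words
chunks = [c for c in chunks if len(c.content.split()) >= 10]

# store the passages in FAISS and embed them with the asymmetric retrieval model
document_store = FAISSDocumentStore(embedding_dim=768, similarity="dot_product")
document_store.write_documents(chunks)
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/msmarco-distilbert-base-tas-b",
)
document_store.update_embeddings(retriever)
document_store.save("data/faiss_document_store.faiss")
```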

#### Search pipeline

* the user enters a factual statement
* compute the embedding of the user statement using the same Sentence Transformer used for indexing (`msmarco-distilbert-base-tas-b`)
* retrieve the K most relevant text passages stored in FAISS (along with their relevance scores)
* the following steps are performed using the [`EntailmentChecker`, a custom Haystack node](https://github.com/anakin87/haystack-entailment-checker)
* **textual entailment task**: compute the entailment between each text passage (premise) and the user statement (hypothesis), using a Natural Language Inference model (`microsoft/deberta-v2-xlarge-mnli`). For every text passage, this yields 3 scores (summing to 1): entailment, contradiction and neutral.
* aggregate the entailment scores: compute their weighted average, using the relevance scores as weights (as sketched after this list). **Now it is possible to tell whether the knowledge base confirms, is neutral towards, or disproves the user statement.**
* *empirical consideration: if the first N passages (N<K) already show strong evidence of entailment/contradiction (partial aggregate scores > 0.5), it is better not to consider the (K-N) less relevant passages.*
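
To make the aggregation step concrete, here is a minimal, self-contained sketch of the weighted average (illustrative only, not the actual `EntailmentChecker` implementation; `passages` stands for the (text, relevance score) pairs returned by the retriever):

```python
# Minimal sketch of the score aggregation; not the actual EntailmentChecker code
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "microsoft/deberta-v2-xlarge-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def entailment_scores(premise: str, hypothesis: str) -> dict:
    """Return {'entailment': ..., 'contradiction': ..., 'neutral': ...} for one passage."""
    inputs = tokenizer(premise, hypothesis, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits[0], dim=-1)
    # map label ids to names as declared by the model config
    return {model.config.id2label[i].lower(): p.item() for i, p in enumerate(probs)}

def aggregate(passages: list[tuple[str, float]], statement: str) -> dict:
    """Weighted average of the entailment scores, weighted by retrieval relevance."""
    totals = {"entailment": 0.0, "contradiction": 0.0, "neutral": 0.0}
    for text, relevance in passages:
        scores = entailment_scores(text, statement)
        for label in totals:
            totals[label] += relevance * scores[label]
    weight_sum = sum(relevance for _, relevance in passages)
    return {label: value / weight_sum for label, value in totals.items()}
```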

#### Explain using a LLM
* if there is entailment or contradiction, prompt `google/flan-t5-large`, asking why the relevant textual passages entail/contradict the user statement.
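
A minimal sketch of this step (the prompt wording is illustrative; the prompt actually used by the app may differ):

```python
# Illustrative explanation step; the actual prompt used by the app may differ
from transformers import pipeline

generator = pipeline("text2text-generation", model="google/flan-t5-large")

prompt = (
    "Premise: Queen released the album A Night at the Opera in 1975.\n"
    "Hypothesis: Queen released A Night at the Opera in the 1970s.\n"
    "Explain why the premise entails the hypothesis."
)
print(generator(prompt, max_length=100)[0]["generated_text"])
```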

### Limits and possible improvements
✨ As mentioned, the current approach to fact checking is simple and naive. Some **structural limits of this approach**:
  * there is **no statement detection**: the statement to be verified is chosen by the user, whereas in real-world applications this step is often necessary.
  * **Wikipedia is taken as a source of truth**. Unfortunately, Wikipedia does not contain universal knowledge and there is no real guarantee that it is a source of truth. There are certainly very interesting approaches that view a snapshot of the entire web as an uncurated source of knowledge (see [Facebook Research SPHERE](https://arxiv.org/abs/2112.09924)).
  * Several papers, as well as our experiments, show that **dense retrieval** is generally effective at retrieving textual passages for evaluating the user statement. However, there may be cases in which the most useful passages for fact checking do not emerge from simple semantic similarity with the statement to be verified.
  * **no systematic evaluation** was performed, only manual experiments.

While keeping this simple approach, some **improvements** could be made:
* For reasons of simplicity and infrastructural limitations, the retrieval uses only a very small portion of the Wikipedia data (the artist pages from the [List of mainstream rock performers](https://en.wikipedia.org/wiki/List_of_mainstream_rock_performers)). With so little data available, the knowledge base often remains neutral even with respect to statements about rock albums/songs. Certainly, fact checking **quality could improve by expanding the knowledge base**, possibly extending it to the entire Wikipedia.
* Both the retriever model and the Natural Language Inference model are general-purpose models and have not been fine-tuned for our domain. They would undoubtedly **perform better if fine-tuned on the rock music domain**. In particular, the retriever model could be adapted with little effort, using [Generative Pseudo Labelling](https://haystack.deepset.ai/guides/gpl), as sketched below.
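
A hedged sketch of such an adaptation, based on the Haystack GPL guide (assuming Haystack 1.x APIs and the `retriever`/`document_store` from the indexing pipeline above):

```python
# Domain adaptation with Generative Pseudo Labelling (sketch based on the Haystack GPL guide)
from haystack.nodes import PseudoLabelGenerator, QuestionGenerator

question_generator = QuestionGenerator()  # generates synthetic queries from the passages
pseudo_labeler = PseudoLabelGenerator(question_generator, retriever)  # mines and scores (query, passage) pairs

output, _ = pseudo_labeler.run(documents=document_store.get_all_documents())
retriever.train(output["gpl_labels"])  # fine-tune the EmbeddingRetriever on the pseudo labels
```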

### Repository structure
* [Rock_fact_checker.py](Rock_fact_checker.py) and [pages folder](./pages/): multi-page Streamlit web app
* [app_utils folder](./app_utils/): Python modules used in the web app
* [notebooks folder](./notebooks/): Jupyter/Colab notebooks to get Wikipedia data and index the text passages (using Haystack)
* [data folder](./data/): all necessary data, including original Wikipedia data, FAISS Index and prepared random statements

### Installation
💻
#### Entailment Checker node
If you want to build a similar system using the [`EntailmentChecker`](https://github.com/anakin87/haystack-entailment-checker), I strongly suggest taking a look at [the node repository](https://github.com/anakin87/haystack-entailment-checker). It can be easily installed with:
```bash
pip install haystack-entailment-checker
```
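
A minimal usage sketch follows; the import path and parameters are assumptions based on the node repository, so check its README for the authoritative example:

```python
# Assumed usage, based on the node repository's README; verify there before relying on it
from haystack import Document
from haystack_entailment_checker import EntailmentChecker

checker = EntailmentChecker(model_name_or_path="microsoft/deberta-v2-xlarge-mnli")
docs = [Document(content="Freddie Mercury was the lead singer of Queen.")]
results, _ = checker.run(query="Freddie Mercury was a singer.", documents=docs)
print(results)
```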

#### Fact Checking 🎸 Rocks!
To install this project locally, follow these steps:
* `git clone https://github.com/anakin87/fact-checking-rocks`
* `cd fact-checking-rocks`
* `pip install -r requirements.txt`

To run the web app, simply type: `streamlit run Rock_fact_checker.py`