anakin87 committed
Commit 55e565f · 1 Parent(s): 16cd190

extended readm

Files changed (3)
  1. README.md +41 -11
  2. data/statements.txt +12 -2
  3. requirements.txt +2 -1
README.md CHANGED
@@ -14,16 +14,24 @@ license: apache-2.0

  ## *Fact checking baseline combining dense retrieval and textual entailment*

- ### Idea 💡
- This project aims to show that a *naive and simple baseline* for fact checking can be built by combining dense retrieval and a textual entailment task (based on Natural Language Inference models).
+ - [Idea 💡](#idea)
+ - [System description 🪄](#system-description)
+ - [Indexing pipeline](#indexing-pipeline)
+ - [Search pipeline](#search-pipeline)
+ - [Limits and possible improvements ✨](#limits-and-possible-improvements)
+ - [Repository structure 📁](#repository-structure)
+ - [Installation 💻](#installation)
+
+ ### Idea
+ 💡 This project aims to show that a *naive and simple baseline* for fact checking can be built by combining dense retrieval and a textual entailment task.
  In a nutshell, the flow is as follows:
- * the users enters a factual statement
+ * the user enters a factual statement
  * the relevant passages are retrieved from the knowledge base using dense retrieval
  * the system computes the text entailment between each relevant passage and the statement, using a Natural Language Inference model
  * the entailment scores are aggregated to produce a summary score.

- ### System description 🪄
- This project is strongly based on [🔎 Haystack](https://github.com/deepset-ai/haystack), an open source NLP framework to realize search system. The main components of our system are an indexing pipeline and a search pipeline.
+ ### System description
+ 🪄 This project is strongly based on [🔎 Haystack](https://github.com/deepset-ai/haystack), an open source NLP framework to realize search system. The main components of our system are an indexing pipeline and a search pipeline.


  #### Indexing pipeline
@@ -32,17 +40,39 @@ This project is strongly based on [🔎 Haystack](https://github.com/deepset-ai/
  * preprocess the downloaded documents into chunks consisting of 2 sentences
  * chunks with less than 10 words are discarded, because not very informative
  * instantiate a [FAISS](https://github.com/facebookresearch/faiss) Document store and store the passages on it
- * create embeddings for the passages, using a Sentence Transformer model and save them in FAISS. The retrieval task will involve [*asymmetric semantic search*](https://www.sbert.net/examples/applications/semantic-search/README.html#symmetric-vs-asymmetric-semantic-search) (statements to be verified are usually shorter than inherent passages), therefore I choose the model `msmarco-distilbert-base-tas-b`.
- * save FAISS index
+ * create embeddings for the passages, using a Sentence Transformer model and save them in FAISS. The retrieval task will involve [*asymmetric semantic search*](https://www.sbert.net/examples/applications/semantic-search/README.html#symmetric-vs-asymmetric-semantic-search) (statements to be verified are usually shorter than inherent passages), therefore I choose the model `msmarco-distilbert-base-tas-b`
+ * save FAISS index.

  #### Search pipeline

  * the user enters a factual statement
- * compute the embedding of the user statement using the same Sentence Transformer (`msmarco-distilbert-base-tas-b`)
+ * compute the embedding of the user statement using the same Sentence Transformer used for indexing (`msmarco-distilbert-base-tas-b`)
  * retrieve the K most relevant text passages stored in FAISS (along with their relevance scores)
- * **text entailment task**: compute the text entailment between each text passage (premise) and the user statement (hypotesis), using a Natural Language Inference model (`microsoft/deberta-v2-xlarge-mnli`). For every text passage, we have 3 scores (summing to 1): entailment, contradiction, neutral. *(For this task, I developed a custom Haystack node: `EntailmentChecker`)*
+ * **text entailment task**: compute the text entailment between each text passage (premise) and the user statement (hypotesis), using a Natural Language Inference model (`microsoft/deberta-v2-xlarge-mnli`). For every text passage, we have 3 scores (summing to 1): entailment, contradiction and neutral. *(For this task, I developed a custom Haystack node: `EntailmentChecker`)*
  * aggregate the text entailment scores: compute the weighted average of them, where the weight is the relevance score. **Now it is possible to tell if the knowledge base confirms, is neutral or disproves the user statement.**
- * *empirical consideration: if in the first N documents (N<K), there is a strong evidence of entailment/contradiction (partial aggregate scores > 0.5), it is better not to consider less relevant documents*
+ * *empirical consideration: if in the first N passages (N<K), there is strong evidence of entailment/contradiction (partial aggregate scores > 0.5), it is better not to consider (K-N) less relevant documents.*
+
+ ### Limits and possible improvements
+ ✨ As mentioned, the current approach to fact checking is simple and naive. Some **structural limits of this approach**:
+ * there is **no statement detection**. In fact, the statement to be verified is chosen by the user. In real-world applications, this step is often necessary.
+ * **Wikipedia is taken as a source of truth**. Unfortunately, Wikipedia does not contain universal knowledge and there is no real guarantee that it is a source of truth. There are certainly very interesting approaches that view a snapshot of the entire web as an uncurated source of knowledge (see [Facebook Research SPHERE](https://arxiv.org/abs/2112.09924)).
+ * Although several articles and even our experiments show a generic efficacy of **dense retrieval** in recovering the textual passages for the evaluation of the user statement, there could undoubtedly exist cases in which the most useful textual passages for fact checking do not emerge from the simple semantic similarity with the statement to be verified.
+ * **no organic evaluation** was performed, but only manual experiments.
+
+ While keeping this simple approach, some **improvements** could be made:
+ * For reasons of simplicity and infrastructural limitations, the retrieval uses only a very small portion of the Wikipedia data (artists page from the [List of mainstream rock performers](https://en.wikipedia.org/wiki/List_of_mainstream_rock_performers)). With these few data available, in many cases the knowledge base remains neutral even with respect to statements about rock albums/songs. Certainly, fact checking **quality could improve by expanding the knowledge base** and possibly extending it to the entire Wikipedia.
+ * Both the retriever model and the Natural Language Inference model are general purpose models and have not been fine-tuned for our domain. Undoubtedly they can **show better performance if fine-tuned in the rock music domain**. Particularly, the retriever model might be adapted with low effort, using [Generative Pseudo Labelling](https://haystack.deepset.ai/guides/gpl).
+
+ ### Repository structure
+ * [Rock_fact_checker.py](Rock_fact_checker.py) and [pages folder](./pages/): multi-page Streamlit web app
+ * [app_utils folder](./app_utils/): python modules used in the web app
+ * [notebooks folder](./notebooks/): Jupyter/Colab notebooks to get Wikipedia data and index the text passages (using Haystack)
+ * [data folder](./data/): all necessary data, including original Wikipedia data, FAISS Index and prepared random statements

- ### Limits and possible improvements ✨
+ ### Installation
+ 💻 To install this project locally, follow these steps:
+ * `git clone https://github.com/anakin87/fact-checking-rocks`
+ * `cd fact-checking-rocks`
+ * `pip install -r requirements.txt`

+ To run the web app, simply type: `streamlit run Rock_fact_checker.py`
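The chunking rule listed under the README's indexing pipeline (2-sentence chunks, with chunks of fewer than 10 words discarded) can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's code: the function name `chunk_document`, its parameters, and the naive regex sentence splitter are all hypothetical, and the project itself builds its pipelines on Haystack.

```python
import re
from typing import List


def chunk_document(
    text: str, sentences_per_chunk: int = 2, min_words: int = 10
) -> List[str]:
    """Split `text` into fixed-size sentence chunks and drop very short ones."""
    # Naive split after '.', '!' or '?' followed by whitespace (an assumption;
    # a real sentence tokenizer handles abbreviations and quotes better).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks = []
    for start in range(0, len(sentences), sentences_per_chunk):
        chunk = " ".join(sentences[start:start + sentences_per_chunk])
        # Keep only chunks with at least `min_words` whitespace-separated words.
        if len(chunk.split()) >= min_words:
            chunks.append(chunk)
    return chunks
```

Calling `chunk_document(page_text)` on the text of one downloaded page would return the passages that are then embedded and stored in the document store.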
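The aggregation step of the README's search pipeline (a relevance-weighted average of the per-passage entailment scores, with the empirical early stop at 0.5) can be sketched as follows. This is a hedged illustration, not the actual `EntailmentChecker` node: the function name `aggregate_entailment`, the input layout, and the choice to check the partial aggregate after every passage are assumptions made for the example.

```python
from typing import Dict, List, Tuple

LABELS = ("entailment", "contradiction", "neutral")


def aggregate_entailment(
    passages: List[Tuple[float, Dict[str, float]]],
    threshold: float = 0.5,
) -> Tuple[Dict[str, float], int]:
    """Relevance-weighted average of per-passage NLI scores.

    `passages` must be sorted by decreasing relevance; each item is
    (relevance_score, {"entailment": ..., "contradiction": ..., "neutral": ...}).
    Returns the aggregate scores and the number of passages actually used.
    """
    weighted_sum = {label: 0.0 for label in LABELS}
    aggregate = {label: 0.0 for label in LABELS}
    total_weight = 0.0
    used = 0

    for relevance, scores in passages:
        if relevance <= 0:
            continue  # a non-positive weight cannot contribute to the average
        total_weight += relevance
        for label in LABELS:
            weighted_sum[label] += relevance * scores[label]
        used += 1
        aggregate = {label: weighted_sum[label] / total_weight for label in LABELS}

        # Early stop (assumed to be checked after each passage): strong partial
        # evidence of entailment or contradiction makes the remaining, less
        # relevant passages unnecessary.
        if max(aggregate["entailment"], aggregate["contradiction"]) > threshold:
            break

    return aggregate, used
```

The label with the highest aggregate score then indicates whether the knowledge base confirms, is neutral towards, or disproves the statement.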
data/statements.txt CHANGED
@@ -18,7 +18,7 @@ The White Stripes were a trio
  The White Stripes were composed by Jack White and Meg White
  Scorpions is a German trap band
  Sepultura is a heavy metal band
- System of a down is a Italian band
+ System of a down is an Italian band
  The Cure is a pop band
  Mick Jagger loves pasta
  Ozzy Osbourne was part of the Black Sabbath
@@ -46,4 +46,14 @@ Cannibal Corpse is a pop punk band
  Slipknot wear masks
  Toto have sold many records
  The verve were a British band
- Psychokiller is a hit by Talking Heads
+ Psychokiller is a hit by Talking Heads
+ Charles Manson had been involved with Beach Boys
+ Toxicity is a song by System of a down
+ Tracy Chapman released her debut album in 1991
+ Zombie is a 1994 song by the Cranberries
+ Stratocaster is a famous guitar
+ Some Slayer songs involve murder
+ Incubus formation includes a DJ
+ Blur is a nu metal band
+ Keith Emerson committed suicide
+ Bono Vox loves pets
requirements.txt CHANGED
@@ -1,2 +1,3 @@
  farm-haystack[faiss]==1.7.1
- plotly==5.10.0
+ plotly==5.10.0
+ streamlit==1.12.0