extract entailment_checker
- README.md (+14 -3)
- app_utils/backend_utils.py (+1 -1)
- app_utils/entailment_checker.py (+0 -126)
- requirements.txt (+2 -1)
README.md
CHANGED
@@ -27,6 +27,8 @@ license: apache-2.0
 - [Limits and possible improvements](#limits-and-possible-improvements)
 - [Repository structure](#repository-structure)
 - [Installation](#installation)
+  - [Entailment Checker node](#entailment-checker-node)
+  - [Fact Checking 🎸 Rocks!](#fact-checking--rocks)
 
 ### Idea
 💡 This project aims to show that a *naive and simple baseline* for fact checking can be built by combining dense retrieval and a textual entailment task.
@@ -42,7 +44,7 @@ In a nutshell, the flow is as follows:
 - [🧑‍🏫 Slides](./presentation/fact_checking_rocks.pdf)
 
 ### System description
-💪 This project is strongly based on [🔎 Haystack](https://github.com/deepset-ai/haystack), an open source NLP framework to
+💪 This project is strongly based on [🔎 Haystack](https://github.com/deepset-ai/haystack), an open source NLP framework that enables seamless use of Transformer models and LLMs to interact with your data. The main components of our system are an indexing pipeline and a search pipeline.
 
 #### Indexing pipeline
 * [Crawling](https://github.com/anakin87/fact-checking-rocks/blob/321ba7893bbe79582f8c052493acfda497c5b785/notebooks/get_wikipedia_data.ipynb): Crawl data from Wikipedia, starting from the page [List of mainstream rock performers](https://en.wikipedia.org/wiki/List_of_mainstream_rock_performers) and using the [python wrapper](https://github.com/goldsmith/Wikipedia)
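If you want to reproduce the indexing side, here is a minimal sketch of a Haystack (v1) setup consistent with what the README describes (FAISS storage plus the `msmarco-distilbert-base-tas-b` Sentence Transformer); the store configuration and the toy passage are assumptions for illustration, not the project's exact indexing code:

```python
# Minimal indexing sketch (assumed setup, not the project's exact code):
# store passages in FAISS and embed them with the Sentence Transformer named in the README.
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever
from haystack.schema import Document

document_store = FAISSDocumentStore(embedding_dim=768, similarity="dot_product")
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/msmarco-distilbert-base-tas-b",
)

# write a toy passage, then compute and store its embedding in the FAISS index
document_store.write_documents([Document(content="Queen are a British rock band formed in London in 1970.")])
document_store.update_embeddings(retriever)
```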
@@ -58,7 +60,8 @@ In a nutshell, the flow is as follows:
 * the user enters a factual statement
 * compute the embedding of the user statement using the same Sentence Transformer used for indexing (`msmarco-distilbert-base-tas-b`)
 * retrieve the K most relevant text passages stored in FAISS (along with their relevance scores)
-*
+* the following steps are performed using the [`EntailmentChecker`, a custom Haystack node](https://github.com/anakin87/haystack-entailment-checker)
+* **text entailment task**: compute the text entailment between each text passage (premise) and the user statement (hypothesis), using a Natural Language Inference model (`microsoft/deberta-v2-xlarge-mnli`). For every text passage, we have 3 scores (summing to 1): entailment, contradiction and neutral.
 * aggregate the text entailment scores: compute the weighted average of them, where the weight is the relevance score. **Now it is possible to tell if the knowledge base confirms, is neutral or disproves the user statement.**
 * *empirical consideration: if in the first N passages (N<K), there is strong evidence of entailment/contradiction (partial aggregate scores > 0.5), it is better not to consider (K-N) less relevant documents.*
 
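The aggregation and early stopping described in the last two bullets boil down to a few lines. The sketch below is a simplified illustration of that behavior (function name and data layout are invented for the example), not the node's actual code:

```python
# Sketch of relevance-weighted aggregation of entailment scores with early stopping.
# passages: list of dicts with "relevance" (retrieval score) and "scores"
# ({"entailment": ..., "contradiction": ..., "neutral": ...}), sorted by decreasing relevance.
def aggregate_entailment(passages, threshold=0.5):
    total_relevance = 0.0
    agg = {"entailment": 0.0, "contradiction": 0.0, "neutral": 0.0}
    for p in passages:
        total_relevance += p["relevance"]
        for label in agg:
            agg[label] += p["scores"][label] * p["relevance"]
        # empirical early stop: strong partial evidence of entailment/contradiction
        # makes the remaining, less relevant passages unnecessary
        if max(agg["entailment"], agg["contradiction"]) / total_relevance > threshold:
            break
    return {label: value / total_relevance for label, value in agg.items()}
```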
@@ -83,7 +86,15 @@ While keeping this simple approach, some **improvements** could be made:
 * [data folder](./data/): all necessary data, including original Wikipedia data, FAISS Index and prepared random statements
 
 ### Installation
-💻
+💻
+#### Entailment Checker node
+If you want to build a similar system using the [`EntailmentChecker`](https://github.com/anakin87/haystack-entailment-checker), I strongly suggest taking a look at [the node repository](https://github.com/anakin87/haystack-entailment-checker). It can be easily installed with
+```bash
+pip install haystack-entailment-checker
+```
+
+#### Fact Checking 🎸 Rocks!
+To install this project locally, follow these steps:
 * `git clone https://github.com/anakin87/fact-checking-rocks`
 * `cd fact-checking-rocks`
 * `pip install -r requirements.txt`
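Once the package is installed, the extracted node can be run directly on retrieved documents. A hedged usage sketch, assuming the package keeps the same interface as the deleted in-repo `app_utils/entailment_checker.py` (see the node repository for the authoritative API); the example document and query are invented:

```python
from haystack.schema import Document
from haystack_entailment_checker import EntailmentChecker

# interface assumed identical to the deleted in-repo node: run() expects scored documents
checker = EntailmentChecker(
    model_name_or_path="microsoft/deberta-v2-xlarge-mnli",
    use_gpu=False,
    entailment_contradiction_threshold=0.5,
)

docs = [Document(content="Freddie Mercury was the lead singer of Queen.", score=0.9)]
result, _ = checker.run(query="Freddie Mercury was a singer.", documents=docs)
print(result["aggregate_entailment_info"])  # {"contradiction": ..., "neutral": ..., "entailment": ...}
```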
app_utils/backend_utils.py
CHANGED
@@ -7,7 +7,7 @@ from haystack.nodes import EmbeddingRetriever, PromptNode
 from haystack.pipelines import Pipeline
 import streamlit as st
 
-from app_utils.entailment_checker import EntailmentChecker
+from haystack_entailment_checker import EntailmentChecker
 from app_utils.config import (
     STATEMENTS_PATH,
     INDEX_DIR,
app_utils/entailment_checker.py
DELETED
@@ -1,126 +0,0 @@
|
|
1 |
-
from typing import List, Optional
|
2 |
-
|
3 |
-
from transformers import AutoModelForSequenceClassification, AutoTokenizer, AutoConfig
|
4 |
-
import torch
|
5 |
-
from haystack.nodes.base import BaseComponent
|
6 |
-
from haystack.modeling.utils import initialize_device_settings
|
7 |
-
from haystack.schema import Document
|
8 |
-
|
9 |
-
|
10 |
-
class EntailmentChecker(BaseComponent):
|
11 |
-
"""
|
12 |
-
This node checks the entailment between every document content and the query.
|
13 |
-
It enrichs the documents metadata with entailment informations.
|
14 |
-
It also returns aggregate entailment information.
|
15 |
-
"""
|
16 |
-
|
17 |
-
outgoing_edges = 1
|
18 |
-
|
19 |
-
def __init__(
|
20 |
-
self,
|
21 |
-
model_name_or_path: str = "roberta-large-mnli",
|
22 |
-
model_version: Optional[str] = None,
|
23 |
-
tokenizer: Optional[str] = None,
|
24 |
-
use_gpu: bool = True,
|
25 |
-
batch_size: int = 16,
|
26 |
-
entailment_contradiction_threshold: float = 0.5,
|
27 |
-
):
|
28 |
-
"""
|
29 |
-
Load a Natural Language Inference model from Transformers.
|
30 |
-
|
31 |
-
:param model_name_or_path: Directory of a saved model or the name of a public model.
|
32 |
-
See https://huggingface.co/models for full list of available models.
|
33 |
-
:param model_version: The version of model to use from the HuggingFace model hub. Can be tag name, branch name, or commit hash.
|
34 |
-
:param tokenizer: Name of the tokenizer (usually the same as model)
|
35 |
-
:param use_gpu: Whether to use GPU (if available).
|
36 |
-
:param batch_size: Number of Documents to be processed at a time.
|
37 |
-
:param entailment_contradiction_threshold: if in the first N documents there is a strong evidence of entailment/contradiction
|
38 |
-
(aggregate entailment or contradiction are greater than the threshold), the less relevant documents are not taken into account
|
39 |
-
"""
|
40 |
-
super().__init__()
|
41 |
-
|
42 |
-
self.devices, _ = initialize_device_settings(use_cuda=use_gpu, multi_gpu=False)
|
43 |
-
|
44 |
-
tokenizer = tokenizer or model_name_or_path
|
45 |
-
self.tokenizer = AutoTokenizer.from_pretrained(tokenizer)
|
46 |
-
self.model = AutoModelForSequenceClassification.from_pretrained(
|
47 |
-
pretrained_model_name_or_path=model_name_or_path, revision=model_version
|
48 |
-
)
|
49 |
-
self.batch_size = batch_size
|
50 |
-
self.entailment_contradiction_threshold = entailment_contradiction_threshold
|
51 |
-
self.model.to(str(self.devices[0]))
|
52 |
-
|
53 |
-
id2label = AutoConfig.from_pretrained(model_name_or_path).id2label
|
54 |
-
self.labels = [id2label[k].lower() for k in sorted(id2label)]
|
55 |
-
if "entailment" not in self.labels:
|
56 |
-
raise ValueError(
|
57 |
-
"The model config must contain entailment value in the id2label dict."
|
58 |
-
)
|
59 |
-
|
60 |
-
def run(self, query: str, documents: List[Document]):
|
61 |
-
|
62 |
-
scores, agg_con, agg_neu, agg_ent = 0, 0, 0, 0
|
63 |
-
premise_batch = [doc.content for doc in documents]
|
64 |
-
hypotesis_batch = [query] * len(documents)
|
65 |
-
entailment_info_batch = self.get_entailment_batch(premise_batch=premise_batch, hypotesis_batch=hypotesis_batch)
|
66 |
-
for i, (doc, entailment_info) in enumerate(zip(documents, entailment_info_batch)):
|
67 |
-
doc.meta["entailment_info"] = entailment_info
|
68 |
-
|
69 |
-
scores += doc.score
|
70 |
-
con, neu, ent = (
|
71 |
-
entailment_info["contradiction"],
|
72 |
-
entailment_info["neutral"],
|
73 |
-
entailment_info["entailment"],
|
74 |
-
)
|
75 |
-
agg_con += con * doc.score
|
76 |
-
agg_neu += neu * doc.score
|
77 |
-
agg_ent += ent * doc.score
|
78 |
-
|
79 |
-
# if in the first documents there is a strong evidence of entailment/contradiction,
|
80 |
-
# there is no need to consider less relevant documents
|
81 |
-
if max(agg_con, agg_ent) / scores > self.entailment_contradiction_threshold:
|
82 |
-
break
|
83 |
-
|
84 |
-
aggregate_entailment_info = {
|
85 |
-
"contradiction": round(agg_con / scores, 2),
|
86 |
-
"neutral": round(agg_neu / scores, 2),
|
87 |
-
"entailment": round(agg_ent / scores, 2),
|
88 |
-
}
|
89 |
-
|
90 |
-
entailment_checker_result = {
|
91 |
-
"documents": documents[: i + 1],
|
92 |
-
"aggregate_entailment_info": aggregate_entailment_info,
|
93 |
-
}
|
94 |
-
|
95 |
-
return entailment_checker_result, "output_1"
|
96 |
-
|
97 |
-
def run_batch(self, queries: List[str], documents: List[Document]):
|
98 |
-
entailment_checker_result_batch = []
|
99 |
-
entailment_info_batch = self.get_entailment_batch(premise_batch=documents, hypotesis_batch=queries)
|
100 |
-
for doc, entailment_info in zip(documents, entailment_info_batch):
|
101 |
-
doc.meta["entailment_info"] = entailment_info
|
102 |
-
aggregate_entailment_info = {
|
103 |
-
"contradiction": round(entailment_info["contradiction"] / doc.score),
|
104 |
-
"neutral": round(entailment_info["neutral"] / doc.score),
|
105 |
-
"entailment": round(entailment_info["entailment"] / doc.score),
|
106 |
-
}
|
107 |
-
entailment_checker_result_batch.append({
|
108 |
-
"documents": [doc],
|
109 |
-
"aggregate_entailment_info": aggregate_entailment_info,
|
110 |
-
})
|
111 |
-
return entailment_checker_result_batch, "output_1"
|
112 |
-
|
113 |
-
|
114 |
-
def get_entailment_dict(self, probs):
|
115 |
-
entailment_dict = {k.lower(): v for k, v in zip(self.labels, probs)}
|
116 |
-
return entailment_dict
|
117 |
-
|
118 |
-
def get_entailment_batch(self, premise_batch: List[str], hypotesis_batch: List[str]):
|
119 |
-
formatted_texts = [f"{premise}{self.tokenizer.sep_token}{hypotesis}" for premise, hypotesis in zip(premise_batch, hypotesis_batch)]
|
120 |
-
with torch.inference_mode():
|
121 |
-
inputs = self.tokenizer(formatted_texts, return_tensors="pt", padding=True, truncation=True).to(self.devices[0])
|
122 |
-
out = self.model(**inputs)
|
123 |
-
logits = out.logits
|
124 |
-
probs_batch = (torch.nn.functional.softmax(logits, dim=-1).detach().cpu().numpy() )
|
125 |
-
return [self.get_entailment_dict(probs) for probs in probs_batch]
|
126 |
-
|
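For context, the core NLI call that this deleted node wrapped (and that the extracted package now provides) is a plain `transformers` sequence-classification pass over `premise<sep>hypothesis` pairs. A standalone sketch mirroring `get_entailment_batch`, with toy sentences invented for the example:

```python
# Standalone illustration of the NLI scoring the deleted node performed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # the node's default NLI model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

premise = "Freddie Mercury was the lead singer of Queen."
hypothesis = "Freddie Mercury was a singer."

# format as "premise<sep>hypothesis", as in get_entailment_batch
inputs = tokenizer(f"{premise}{tokenizer.sep_token}{hypothesis}", return_tensors="pt", truncation=True)
with torch.inference_mode():
    probs = torch.nn.functional.softmax(model(**inputs).logits, dim=-1)[0]

# map probabilities to the model's labels (contradiction / neutral / entailment)
print({model.config.id2label[i].lower(): round(p.item(), 2) for i, p in enumerate(probs)})
```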
requirements.txt
CHANGED
@@ -1,4 +1,5 @@
-farm-haystack[faiss]==1.
+farm-haystack[faiss,inference]==1.18.1
+haystack-entailment-checker
 plotly==5.14.1
 
 # commented to not interfere with streamlit SDK in HF spaces