Spaces: anakin87/fact-checking-rocks

anakin87 committed on Aug 28, 2022

Commit 7a393a7 (1 parent: a147158)

improve readme

Files changed (1)

README.md +2 -2

@@ -56,11 +56,11 @@ In a nutshell, the flow is as follows:
  ✨ As mentioned, the current approach to fact checking is simple and naive. Some **structural limits of this approach**:
   * there is **no statement detection**. In fact, the statement to be verified is chosen by the user. In real-world applications, this step is often necessary.
   * **Wikipedia is taken as a source of truth**. Unfortunately, Wikipedia does not contain universal knowledge and there is no real guarantee that it is a source of truth. There are certainly very interesting approaches that view a snapshot of the entire web as an uncurated source of knowledge (see [Facebook Research SPHERE](https://arxiv.org/abs/2112.09924)).
-  * Although several articles and even our experiments show a generic efficacy of **dense retrieval** in recovering the textual passages for the evaluation of the user statement, there could undoubtedly exist cases in which the most useful textual passages for fact checking do not emerge from the simple semantic similarity with the statement to be verified.
+  * Several papers and even our experiments show a general effectiveness of **dense retrieval** in retrieving textual passages for evaluating the user statement. However, there may be cases in which the most useful textual passages for fact checking do not emerge from the simple semantic similarity with the statement to be verified.
   * **no organic evaluation** was performed, but only manual experiments.
 While keeping this simple approach, some **improvements** could be made:
-* For reasons of simplicity and infrastructural limitations, the retrieval uses only a very small portion of the Wikipedia data (artists page from the [List of mainstream rock performers](https://en.wikipedia.org/wiki/List_of_mainstream_rock_performers)). With these few data available, in many cases the knowledge base remains neutral even with respect to statements about rock albums/songs. Certainly, fact checking **quality could improve by expanding the knowledge base** and possibly extending it to the entire Wikipedia.
+* For reasons of simplicity and infrastructural limitations, the retrieval uses only a very small portion of the Wikipedia data (artists pages from the [List of mainstream rock performers](https://en.wikipedia.org/wiki/List_of_mainstream_rock_performers)). With these few data available, in many cases the knowledge base remains neutral even with respect to statements about rock albums/songs. Certainly, fact checking **quality could improve by expanding the knowledge base** and possibly extending it to the entire Wikipedia.
 * Both the retriever model and the Natural Language Inference model are general purpose models and have not been fine-tuned for our domain. Undoubtedly they can **show better performance if fine-tuned in the rock music domain**. Particularly, the retriever model might be adapted with low effort, using [Generative Pseudo Labelling](https://haystack.deepset.ai/guides/gpl).
 ### Repository structure