Stefano Fiorucci commited on
Commit
b89fac0
β€’
1 Parent(s): 522dfc6

big improvement in documentation

Browse files
README.md CHANGED
@@ -11,11 +11,19 @@ license: Apache-2.0
11
  ---
12
 
13
  # Who killed Laura Palmer?   [![Generic badge](https://img.shields.io/badge/πŸ€—-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer) [![Generic badge](https://img.shields.io/github/stars/anakin87/who-killed-laura-palmer?label=Github&style=social)](https://github.com/anakin87/who-killed-laura-palmer)
 
 
 
 
14
 
15
  ## πŸ—»πŸ—» Twin Peaks Question Answering system
16
 
17
  WKLP is a simple Question Answering system, based on data crawled from [Twin Peaks Wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki). It is built using [πŸ” Haystack](https://github.com/deepset-ai/haystack), an awesome open-source framework for building search systems that work intelligently over large document collections.
18
 
 
 
 
 
19
  ---
20
 
21
  ## Project architecture 🧱
@@ -35,6 +43,8 @@ WKLP is a simple Question Answering system, based on data crawled from [Twin Pea
35
  - How to build a nice [Streamlit](https://github.com/streamlit/streamlit) web app to show your QA system
36
  - How to optimize the web app to πŸš€ deploy in [πŸ€— Spaces](https://huggingface.co/spaces)
37
 
 
 
38
  ## Repository structure πŸ“
39
  - [app.py](./app.py): Streamlit web app
40
  - [app_utils folder](./app_utils/): python modules used in the web app
@@ -46,7 +56,7 @@ Within each folder, you can find more in-depth explanations.
46
 
47
  ## Possible improvements ✨
48
  - The reader model (`deepset/roberta-base-squad2`) is a good compromise between speed and accuracy, running on CPU. There are certainly better (and more computationally expensive) models, as you can read in the [Haystack documentation](https://haystack.deepset.ai/pipeline_nodes/reader).
49
- - You can also think about preparing a Twin Peaks QA dataset and fine-tune the reader model to get better accuracy, as explained in [Haystack tutorial](https://haystack.deepset.ai/tutorials/fine-tuning-a-model).
50
  - ...
51
 
52
 
11
  ---
12
 
13
  # Who killed Laura Palmer?   [![Generic badge](https://img.shields.io/badge/πŸ€—-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer) [![Generic badge](https://img.shields.io/github/stars/anakin87/who-killed-laura-palmer?label=Github&style=social)](https://github.com/anakin87/who-killed-laura-palmer)
14
+ [<img src="./data/readme_images/spaces_logo.png" style="display: block;margin-left: auto;
15
+ margin-right: auto; max-width: 70%;}">](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer)
16
+
17
+
18
 
19
  ## πŸ—»πŸ—» Twin Peaks Question Answering system
20
 
21
  WKLP is a simple Question Answering system, based on data crawled from [Twin Peaks Wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki). It is built using [πŸ” Haystack](https://github.com/deepset-ai/haystack), an awesome open-source framework for building search systems that work intelligently over large document collections.
22
 
23
+ - [Project architecture 🧱](#project-architecture-)
24
+ - [What can I learn from this project? πŸ“š](#what-can-i-learn-from-this-project-)
25
+ - [Repository structure πŸ“](#repository-structure-)
26
+ - [Possible improvements ✨](#possible-improvements-)
27
  ---
28
 
29
  ## Project architecture 🧱
43
  - How to build a nice [Streamlit](https://github.com/streamlit/streamlit) web app to show your QA system
44
  - How to optimize the web app to πŸš€ deploy in [πŸ€— Spaces](https://huggingface.co/spaces)
45
 
46
+ ![Web app preview](./data/readme_images/webapp.png)
47
+
48
  ## Repository structure πŸ“
49
  - [app.py](./app.py): Streamlit web app
50
  - [app_utils folder](./app_utils/): python modules used in the web app
56
 
57
  ## Possible improvements ✨
58
  - The reader model (`deepset/roberta-base-squad2`) is a good compromise between speed and accuracy, running on CPU. There are certainly better (and more computationally expensive) models, as you can read in the [Haystack documentation](https://haystack.deepset.ai/pipeline_nodes/reader).
59
+ - You can also think about preparing a Twin Peaks QA dataset and fine-tune the reader model to get better accuracy, as explained in this [Haystack tutorial](https://haystack.deepset.ai/tutorials/fine-tuning-a-model).
60
  - ...
61
 
62
 
app_utils/README.md ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
1
+ # App utils 🧰
2
+ Python modules used in the [web app](../app.py).
3
+
4
+ - [backend_utils.py](./backend_utils.py): backend functions to load the pipeline, answer a question and load random questions; *appropriate Streamlit caching*.
5
+
6
+ - [frontend_utils.py](./frontend_utils.py): functions to manage the Streamlit web app appearance.
7
+
8
+ - βš™οΈ [config.py](./config.py): configurations, including score thresholds to accept answers and Hugging Face model names
data/README.md ADDED
@@ -0,0 +1,10 @@
 
 
 
 
 
 
 
 
 
 
1
+ # Data πŸ“’πŸ“„πŸ“„
2
+ All necessary data.
3
+
4
+ - [input_docs](./input_docs/): JSON documents downloaded from [Twin Peaks wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki) by the [crawler](../crawler/). Input for our Question Answering system.
5
+
6
+ - [questions](./questions/): automatically generated questions (in [Question generation notebook](../notebooks/question_generation.ipynb)) and manually selected questions (used in the web app).
7
+
8
+ - [index](./index/): files related to FAISS index created in [Indexing and pipeline creation notebook](../notebooks/indexing_and_pipeline_creation.ipynb). The index is used in the web app.
9
+
10
+ - [readme_images](./readme_images/): images used in documentation.
data/readme_images/spaces_logo.png ADDED
Binary file (69.8 kB). View file
data/readme_images/webapp.png ADDED
Binary file (322 kB). View file
notebooks/README.md ADDED
@@ -0,0 +1,30 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # πŸ““ Notebooks
2
+ Jupyter/Colab notebooks to create the Search pipeline and generate questions, using [ πŸ” Haystack](https://github.com/deepset-ai/haystack).
3
+
4
+ ## [Indexing and pipeline creation](./indexing_and_pipeline_creation.ipynb)
5
+
6
+ This notebook is inspired by ["Build Your First QA System" tutorial](https://haystack.deepset.ai/tutorials/first-qa-system), from Haystack documentation.
7
+
8
+ Here we use a collection of articles about Twin Peaks to answer a variety of questions about that awesome TV series!
9
+
10
+ The following steps are performed:
11
+ - load and preprocess data
12
+ - create (FAISS) document store and write documents
13
+ - initialize retriever and generate document embeddings
14
+ - initialize reader
15
+ - compose and try Question Answering pipeline
16
+ - save and export (FAISS) index
17
+
18
+ ## [Question generation](./question_generation.ipynb)
19
+
20
+ This notebook is inspired by [Question Generation tutorial](https://haystack.deepset.ai/tutorials/question-generation), from Haystack documentation.
21
+
22
+ Here we use a collection of articles about Twin Peaks to generate a variety of questions about that awesome TV series!
23
+
24
+ The following steps are performed:
25
+
26
+ - load data
27
+ - create document store and write documents
28
+ - generate questions and save them
29
+
30
+