Spaces:
Sleeping
Sleeping
Stefano Fiorucci
commited on
Commit
β’
a8158b1
1
Parent(s):
261cff9
big improvement in documentation
Browse files- README.md +11 -1
- app_utils/README.md +8 -0
- data/README.md +10 -0
- data/readme_images/spaces_logo.png +0 -0
- data/readme_images/webapp.png +0 -0
- notebooks/README.md +30 -0
README.md
CHANGED
@@ -11,11 +11,19 @@ license: Apache-2.0
|
|
11 |
---
|
12 |
|
13 |
# Who killed Laura Palmer? [![Generic badge](https://img.shields.io/badge/π€-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer) [![Generic badge](https://img.shields.io/github/stars/anakin87/who-killed-laura-palmer?label=Github&style=social)](https://github.com/anakin87/who-killed-laura-palmer)
|
|
|
|
|
|
|
|
|
14 |
|
15 |
## π»π» Twin Peaks Question Answering system
|
16 |
|
17 |
WKLP is a simple Question Answering system, based on data crawled from [Twin Peaks Wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki). It is built using [π Haystack](https://github.com/deepset-ai/haystack), an awesome open-source framework for building search systems that work intelligently over large document collections.
|
18 |
|
|
|
|
|
|
|
|
|
19 |
---
|
20 |
|
21 |
## Project architecture π§±
|
@@ -35,6 +43,8 @@ WKLP is a simple Question Answering system, based on data crawled from [Twin Pea
|
|
35 |
- How to build a nice [Streamlit](https://github.com/streamlit/streamlit) web app to show your QA system
|
36 |
- How to optimize the web app to π deploy in [π€ Spaces](https://huggingface.co/spaces)
|
37 |
|
|
|
|
|
38 |
## Repository structure π
|
39 |
- [app.py](./app.py): Streamlit web app
|
40 |
- [app_utils folder](./app_utils/): python modules used in the web app
|
@@ -46,7 +56,7 @@ Within each folder, you can find more in-depth explanations.
|
|
46 |
|
47 |
## Possible improvements β¨
|
48 |
- The reader model (`deepset/roberta-base-squad2`) is a good compromise between speed and accuracy, running on CPU. There are certainly better (and more computationally expensive) models, as you can read in the [Haystack documentation](https://haystack.deepset.ai/pipeline_nodes/reader).
|
49 |
-
- You can also think about preparing a Twin Peaks QA dataset and fine-tune the reader model to get better accuracy, as explained in [Haystack tutorial](https://haystack.deepset.ai/tutorials/fine-tuning-a-model).
|
50 |
- ...
|
51 |
|
52 |
|
|
|
11 |
---
|
12 |
|
13 |
# Who killed Laura Palmer? [![Generic badge](https://img.shields.io/badge/π€-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer) [![Generic badge](https://img.shields.io/github/stars/anakin87/who-killed-laura-palmer?label=Github&style=social)](https://github.com/anakin87/who-killed-laura-palmer)
|
14 |
+
[<img src="./data/readme_images/spaces_logo.png" style="display: block;margin-left: auto;
|
15 |
+
margin-right: auto; max-width: 70%;}">](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer)
|
16 |
+
|
17 |
+
|
18 |
|
19 |
## π»π» Twin Peaks Question Answering system
|
20 |
|
21 |
WKLP is a simple Question Answering system, based on data crawled from [Twin Peaks Wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki). It is built using [π Haystack](https://github.com/deepset-ai/haystack), an awesome open-source framework for building search systems that work intelligently over large document collections.
|
22 |
|
23 |
+
- [Project architecture π§±](#project-architecture-)
|
24 |
+
- [What can I learn from this project? π](#what-can-i-learn-from-this-project-)
|
25 |
+
- [Repository structure π](#repository-structure-)
|
26 |
+
- [Possible improvements β¨](#possible-improvements-)
|
27 |
---
|
28 |
|
29 |
## Project architecture π§±
|
|
|
43 |
- How to build a nice [Streamlit](https://github.com/streamlit/streamlit) web app to show your QA system
|
44 |
- How to optimize the web app to π deploy in [π€ Spaces](https://huggingface.co/spaces)
|
45 |
|
46 |
+
![Web app preview](./data/readme_images/webapp.png)
|
47 |
+
|
48 |
## Repository structure π
|
49 |
- [app.py](./app.py): Streamlit web app
|
50 |
- [app_utils folder](./app_utils/): python modules used in the web app
|
|
|
56 |
|
57 |
## Possible improvements β¨
|
58 |
- The reader model (`deepset/roberta-base-squad2`) is a good compromise between speed and accuracy, running on CPU. There are certainly better (and more computationally expensive) models, as you can read in the [Haystack documentation](https://haystack.deepset.ai/pipeline_nodes/reader).
|
59 |
+
- You can also think about preparing a Twin Peaks QA dataset and fine-tune the reader model to get better accuracy, as explained in this [Haystack tutorial](https://haystack.deepset.ai/tutorials/fine-tuning-a-model).
|
60 |
- ...
|
61 |
|
62 |
|
app_utils/README.md
ADDED
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# App utils π§°
|
2 |
+
Python modules used in the [web app](../app.py).
|
3 |
+
|
4 |
+
- [backend_utils.py](./backend_utils.py): backend functions to load the pipeline, answer a question and load random questions; *appropriate Streamlit caching*.
|
5 |
+
|
6 |
+
- [frontend_utils.py](./frontend_utils.py): functions to manage the Streamlit web app appearance.
|
7 |
+
|
8 |
+
- βοΈ [config.py](./config.py): configurations, including score thresholds to accept answers and Hugging Face model names
|
data/README.md
ADDED
@@ -0,0 +1,10 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# Data πππ
|
2 |
+
All necessary data.
|
3 |
+
|
4 |
+
- [input_docs](./input_docs/): JSON documents downloaded from [Twin Peaks wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki) by the [crawler](../crawler/). Input for our Question Answering system.
|
5 |
+
|
6 |
+
- [questions](./questions/): automatically generated questions (in [Question generation notebook](../notebooks/question_generation.ipynb)) and manually selected questions (used in the web app).
|
7 |
+
|
8 |
+
- [index](./index/): files related to FAISS index created in [Indexing and pipeline creation notebook](../notebooks/indexing_and_pipeline_creation.ipynb). The index is used in the web app.
|
9 |
+
|
10 |
+
- [readme_images](./readme_images/): images used in documentation.
|
data/readme_images/spaces_logo.png
ADDED
data/readme_images/webapp.png
ADDED
notebooks/README.md
ADDED
@@ -0,0 +1,30 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
# π Notebooks
|
2 |
+
Jupyter/Colab notebooks to create the Search pipeline and generate questions, using [ π Haystack](https://github.com/deepset-ai/haystack).
|
3 |
+
|
4 |
+
## [Indexing and pipeline creation](./indexing_and_pipeline_creation.ipynb)
|
5 |
+
|
6 |
+
This notebook is inspired by ["Build Your First QA System" tutorial](https://haystack.deepset.ai/tutorials/first-qa-system), from Haystack documentation.
|
7 |
+
|
8 |
+
Here we use a collection of articles about Twin Peaks to answer a variety of questions about that awesome TV series!
|
9 |
+
|
10 |
+
The following steps are performed:
|
11 |
+
- load and preprocess data
|
12 |
+
- create (FAISS) document store and write documents
|
13 |
+
- initialize retriever and generate document embeddings
|
14 |
+
- initialize reader
|
15 |
+
- compose and try Question Answering pipeline
|
16 |
+
- save and export (FAISS) index
|
17 |
+
|
18 |
+
## [Question generation](./question_generation.ipynb)
|
19 |
+
|
20 |
+
This notebook is inspired by [Question Generation tutorial](https://haystack.deepset.ai/tutorials/question-generation), from Haystack documentation.
|
21 |
+
|
22 |
+
Here we use a collection of articles about Twin Peaks to generate a variety of questions about that awesome TV series!
|
23 |
+
|
24 |
+
The following steps are performed:
|
25 |
+
|
26 |
+
- load data
|
27 |
+
- create document store and write documents
|
28 |
+
- generate questions and save them
|
29 |
+
|
30 |
+
|