Stefano Fiorucci committed on
Commit 261cff9 • 1 Parent(s): 1bde059

Improved README

Files changed (1)
  1. README.md +21 -0
README.md CHANGED
@@ -11,6 +11,7 @@ license: Apache-2.0
 ---
 
 # Who killed Laura Palmer?   [![Generic badge](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer) [![Generic badge](https://img.shields.io/github/stars/anakin87/who-killed-laura-palmer?label=Github&style=social)](https://github.com/anakin87/who-killed-laura-palmer)
+
 ## 🗻🗻 Twin Peaks Question Answering system
 
 WKLP is a simple Question Answering system, based on data crawled from [Twin Peaks Wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki). It is built using [🔍 Haystack](https://github.com/deepset-ai/haystack), an awesome open-source framework for building search systems that work intelligently over large document collections.
@@ -29,3 +30,23 @@ WKLP is a simple Question Answering system, based on data crawled from [Twin Pea
 ---
 
 ## What can I learn from this project? 📚
+- How to quickly ⌚ build a modern Question Answering system using [🔍 Haystack](https://github.com/deepset-ai/haystack)
+- How to generate questions based on your documents
+- How to build a nice [Streamlit](https://github.com/streamlit/streamlit) web app to show your QA system
+- How to optimize the web app to 🚀 deploy in [🤗 Spaces](https://huggingface.co/spaces)
+
+## Repository structure 📁
+- [app.py](./app.py): Streamlit web app
+- [app_utils folder](./app_utils/): python modules used in the web app
+- [crawler folder](./crawler/): Twin Peaks crawler, developed with Scrapy and fandom-py
+- [notebooks folder](./notebooks/): Jupyter/Colab notebooks to create the Search pipeline and generate questions (using Haystack)
+- [data folder](./data/): all necessary data
+
+Within each folder, you can find more in-depth explanations.
+
+## Possible improvements ✨
+- The reader model (`deepset/roberta-base-squad2`) is a good compromise between speed and accuracy, running on CPU. There are certainly better (and more computationally expensive) models, as you can read in the [Haystack documentation](https://haystack.deepset.ai/pipeline_nodes/reader).
+- You can also think about preparing a Twin Peaks QA dataset and fine-tune the reader model to get better accuracy, as explained in [Haystack tutorial](https://haystack.deepset.ai/tutorials/fine-tuning-a-model).
+- ...
+
+
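
Note on the "Possible improvements" entry added in this commit: the reader it names could be loaded roughly as shown below. This is a minimal sketch and not code from the repository. It assumes Haystack 1.x, where `FARMReader` is importable from `haystack.nodes`. The helper name `load_reader` and the sample document text are hypothetical and exist only for illustration.

```python
# Minimal sketch (assumes Haystack 1.x is installed, e.g. `pip install farm-haystack`).
# `load_reader` is a hypothetical helper, not part of the repository.

READER_MODEL = "deepset/roberta-base-squad2"  # reader model named in the README


def load_reader(model_name: str = READER_MODEL, use_gpu: bool = False):
    """Return an extractive reader; the defaults mirror the CPU setup the README describes."""
    # Imported lazily so this module can be imported even when Haystack is absent.
    from haystack.nodes import FARMReader

    return FARMReader(model_name_or_path=model_name, use_gpu=use_gpu)


if __name__ == "__main__":
    from haystack import Document

    reader = load_reader()  # downloads the model on first use
    # Sample document text written for this example only.
    docs = [Document(content="Laura Palmer was a high school student in Twin Peaks.")]
    prediction = reader.predict(query="Who was Laura Palmer?", documents=docs, top_k=1)
    print(prediction["answers"][0].answer)
```

Trying one of the larger models the README alludes to would then only mean passing a different `model_name`, at the cost of slower inference on CPU.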