Stefano Fiorucci
title: Who killed Laura Palmer?
emoji: 🗻🗻
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.2.0
pinned: false
license: apache-2.0
# Who killed Laura Palmer?
<img src="./data/readme_images/spaces_logo.png" align="center" style="display: block; margin-left: auto; margin-right: auto; max-width: 70%;">
## 🗻🗻 Twin Peaks Question Answering system
WKLP is a simple Question Answering system based on data crawled from Twin Peaks Wiki. It is built using 🔍 Haystack, an awesome open-source framework for building search systems that work intelligently over large document collections.
- [Project architecture 🧱](#project-architecture-)
- [What can I learn from this project? 📚](#what-can-i-learn-from-this-project-)
- [Repository structure 📁](#repository-structure-)
- [Installation 💻](#installation-)
- [Possible improvements ✨](#possible-improvements-)
## Project architecture 🧱
[![Project architecture](./data/readme_images/project_architecture.png)](#)
* Crawler: implemented using Scrapy and fandom-py
* Question Answering pipelines: created with Haystack
* Web app: developed with Streamlit
* Free hosting: Hugging Face Spaces
## What can I learn from this project? 📚
- How to quickly ⌚ build a modern Question Answering system using 🔍 Haystack
- How to generate questions based on your documents
- How to build a nice Streamlit web app to show your QA system
- How to optimize the web app to 🚀 deploy in 🤗 Spaces

![Web app preview](./data/readme_images/webapp.png)
## Repository structure 📁
- [](./ Streamlit web app
- [app_utils folder](./app_utils/): Python modules used in the web app
- [crawler folder](./crawler/): Twin Peaks crawler, developed with Scrapy and fandom-py
- [notebooks folder](./notebooks/): Jupyter/Colab notebooks to create the Search pipeline and generate questions (using Haystack)
- [data folder](./data/): all necessary data
- [presentations](./presentations/): Video presentation and slides (PyCon Italy 2022)
Within each folder, you can find more in-depth explanations.
## Installation 💻
To install this project locally, follow these steps:
- `git clone`
- `cd who-killed-laura-palmer`
- `pip install -r requirements.txt`
To run the web app, simply type: `streamlit run`
## Possible improvements ✨
### Project structure
- The project is optimized to be deployed in Hugging Face Spaces and consists of an all-in-one Streamlit web app. In more structured production environments, I suggest dividing the software into three parts:
  - Haystack backend API (as explained in the official documentation)
  - Document store service
  - Streamlit web app
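
To make the three-part split more concrete, here is a minimal sketch of how the web app could talk to a separate backend over HTTP instead of loading the pipeline itself. Everything specific in it is an assumption for illustration: the address `http://localhost:8000/query`, the `query`/`params` payload shape (modeled on a Haystack-style REST endpoint), the node names `Retriever` and `Reader`, and the helper names `build_query_request` and `ask`. Check the backend you actually deploy before relying on any of them.

```python
import json
from urllib import request

# Assumption: the backend runs locally and exposes a Haystack-style `/query` endpoint.
API_URL = "http://localhost:8000/query"  # hypothetical address


def build_query_request(question, top_k_retriever=10, top_k_reader=3, api_url=API_URL):
    """Build (but do not send) a POST request carrying the question as JSON."""
    payload = {
        "query": question,
        # Assumed node names; they must match the pipeline defined in the backend.
        "params": {
            "Retriever": {"top_k": top_k_retriever},
            "Reader": {"top_k": top_k_reader},
        },
    }
    return request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question):
    """Send the question to the backend and return the decoded JSON response."""
    with request.urlopen(build_query_request(question), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no running backend, so it is easy to inspect.
req = build_query_request("Who killed Laura Palmer?")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With this split, the Streamlit app would only call something like `ask(...)` and render the answers, while the models and the document store live in their own services.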
### Reader
- The reader model (`deepset/roberta-base-squad2`) is a good compromise between speed and accuracy, running on CPU. There are certainly better (and more computationally expensive) models, as you can read in the Haystack documentation.
- You can also think about preparing a Twin Peaks QA dataset and fine-tuning the reader model to get better accuracy, as explained in this Haystack tutorial.
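
As a starting point for such a dataset, the sketch below builds one annotated example in SQuAD-style JSON, the format commonly used for extractive QA fine-tuning. It assumes your fine-tuning tool accepts SQuAD-style files. The helper `make_squad_example`, the id `tp-0001`, and the sample passage are illustrative only and are not taken from this repository's data.

```python
import json


def make_squad_example(title, context, question, answer, example_id):
    """Return one SQuAD-style entry; the answer must appear verbatim in the context."""
    start = context.find(answer)
    if start == -1:
        raise ValueError("answer text not found in context")
    return {
        "title": title,
        "paragraphs": [
            {
                "context": context,
                "qas": [
                    {
                        "id": example_id,
                        "question": question,
                        # answer_start is the character offset of the answer in the context
                        "answers": [{"text": answer, "answer_start": start}],
                        "is_impossible": False,
                    }
                ],
            }
        ],
    }


# Illustrative passage, written for this example (not crawled data).
context = (
    "Dale Cooper is an FBI special agent sent to Twin Peaks "
    "to investigate the murder of Laura Palmer."
)
example = make_squad_example(
    title="Dale Cooper",
    context=context,
    question="Who investigates the murder of Laura Palmer?",
    answer="Dale Cooper",
    example_id="tp-0001",  # hypothetical id
)
dataset = {"version": "v2.0", "data": [example]}
print(json.dumps(dataset, indent=2))
```

Computing `answer_start` from the context, rather than typing it by hand, avoids the off-by-one offsets that tend to creep into manually annotated files.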