---
title: Who killed Laura Palmer?
emoji: 🗻🗻
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.2.0
app_file: app.py
pinned: false
license: apache-2.0
---
# Who killed Laura Palmer? [![Generic badge](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer) [![Generic badge](https://img.shields.io/github/stars/anakin87/who-killed-laura-palmer?label=Github&style=social)](https://github.com/anakin87/who-killed-laura-palmer)

[<img src="./data/readme_images/spaces_logo.png" align="center" style="display: block; margin-left: auto; margin-right: auto; max-width: 70%;">](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer)
## 🗻🗻 Twin Peaks Question Answering system

WKLP is a simple Question Answering system, based on data crawled from [Twin Peaks Wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki). It is built using [🔍 Haystack](https://github.com/deepset-ai/haystack), an awesome open-source framework for building search systems that work intelligently over large document collections.
- [Project architecture 🧱](#project-architecture-)
- [What can I learn from this project? 📚](#what-can-i-learn-from-this-project-)
- [Repository structure 📁](#repository-structure-)
- [Possible improvements ✨](#possible-improvements-)
---
## Project architecture 🧱

[![Project architecture](./data/readme_images/project_architecture.png)](#)

* Crawler: implemented using [Scrapy](https://github.com/scrapy/scrapy) and [fandom-py](https://github.com/NikolajDanger/fandom-py)
* Question Answering pipelines: created with [Haystack](https://github.com/deepset-ai/haystack) (a minimal sketch follows this list)
* Web app: developed with [Streamlit](https://github.com/streamlit/streamlit)
* Free hosting: [Hugging Face Spaces](https://huggingface.co/spaces)
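
The core of the system is an extractive QA pipeline (retriever + reader). The snippet below is a minimal sketch of how such a pipeline can be wired together with Haystack 1.x; the in-memory document store, TF-IDF retriever, and toy documents are illustrative assumptions, not the exact setup used in this repo (see the notebooks for that).

```python
# Minimal extractive QA pipeline sketch with Haystack 1.x (farm-haystack).
# Document store, retriever, and documents are illustrative choices.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import TfidfRetriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

# Index a couple of toy documents (in the real project, these come from the crawled wiki pages)
document_store = InMemoryDocumentStore()
document_store.write_documents([
    {"content": "Laura Palmer is the homecoming queen of the town of Twin Peaks.",
     "meta": {"name": "Laura Palmer"}},
    {"content": "Dale Cooper is an FBI special agent sent to Twin Peaks.",
     "meta": {"name": "Dale Cooper"}},
])

retriever = TfidfRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)
pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)

result = pipeline.run(
    query="Who is Laura Palmer?",
    params={"Retriever": {"top_k": 5}, "Reader": {"top_k": 3}},
)
for answer in result["answers"]:
    print(answer.answer, answer.score)
```
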
---
## What can I learn from this project? 📚

- How to quickly ⚡ build a modern Question Answering system using [🔍 Haystack](https://github.com/deepset-ai/haystack)
- How to generate questions based on your documents (sketched below)
- How to build a nice [Streamlit](https://github.com/streamlit/streamlit) web app to show your QA system
- How to optimize the web app to 🚀 deploy in [🤗 Spaces](https://huggingface.co/spaces) (see the caching sketch after the preview image)
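
Question generation can be done with Haystack's `QuestionGenerator` node, following the pattern of the official question generation tutorial. The sketch below is a hedged example; the document text is made up.

```python
# Generate candidate questions from a document with Haystack 1.x.
# The document text is a toy example.
from haystack import Document
from haystack.nodes import QuestionGenerator
from haystack.pipelines import QuestionGenerationPipeline

question_generator = QuestionGenerator()
pipeline = QuestionGenerationPipeline(question_generator)

doc = Document(content="Laura Palmer is the homecoming queen of the town of Twin Peaks.")
result = pipeline.run(documents=[doc])
print(result["generated_questions"])
```
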
[![Web app preview](./data/readme_images/webapp.png)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer)
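
One practical optimization for Spaces (limited CPU and RAM) is loading the heavy QA pipeline only once and reusing it across Streamlit reruns. Below is a minimal sketch using `st.cache`, the caching API available in Streamlit 1.2; the toy document and the `load_pipeline` helper are illustrative, not the exact code in `app.py`.

```python
# Sketch: cache the QA pipeline so it is built once, not on every Streamlit rerun.
import streamlit as st
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import FARMReader, TfidfRetriever
from haystack.pipelines import ExtractiveQAPipeline

@st.cache(allow_output_mutation=True)  # st.cache is the caching decorator in Streamlit 1.2
def load_pipeline():
    document_store = InMemoryDocumentStore()
    # In the real app, the crawled wiki pages are indexed here; this is a placeholder document.
    document_store.write_documents(
        [{"content": "Twin Peaks is a town in Washington State.", "meta": {"name": "Twin Peaks"}}]
    )
    retriever = TfidfRetriever(document_store=document_store)
    reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)
    return ExtractiveQAPipeline(reader=reader, retriever=retriever)

pipeline = load_pipeline()

query = st.text_input("Ask a question about Twin Peaks")
if query:
    prediction = pipeline.run(query=query, params={"Retriever": {"top_k": 5}, "Reader": {"top_k": 3}})
    for answer in prediction["answers"]:
        st.write(f"**{answer.answer}** (score: {answer.score:.2f})")
```
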
## Repository structure 📁

- [app.py](./app.py): Streamlit web app
- [app_utils folder](./app_utils/): Python modules used in the web app
- [crawler folder](./crawler/): Twin Peaks crawler, developed with Scrapy and fandom-py
- [notebooks folder](./notebooks/): Jupyter/Colab notebooks to create the search pipeline and generate questions (using Haystack)
- [data folder](./data/): all necessary data

Within each folder, you can find more in-depth explanations.
## Possible improvements ✨

### Project structure

- The project is optimized for deployment in Hugging Face Spaces and consists of an all-in-one Streamlit web app. In a more structured production environment, I suggest splitting the software into three parts:
  - Haystack backend API (as explained in [the official documentation](https://haystack.deepset.ai/components/rest-api))
  - Document store service
  - Streamlit web app
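
With that split, the frontend no longer imports Haystack at all: it just calls the backend over HTTP. Here is a hedged sketch assuming the default `/query` endpoint exposed by Haystack's REST API on `localhost:8000`; the helper name and parameters are illustrative.

```python
# Sketch of a thin frontend calling a separately deployed Haystack REST API.
# Assumes the default /query endpoint of Haystack's rest_api package on localhost:8000.
import requests

API_URL = "http://localhost:8000/query"

def ask(question, top_k_retriever=5, top_k_reader=3):
    payload = {
        "query": question,
        "params": {"Retriever": {"top_k": top_k_retriever}, "Reader": {"top_k": top_k_reader}},
    }
    response = requests.post(API_URL, json=payload, timeout=60)
    response.raise_for_status()
    return [answer["answer"] for answer in response.json()["answers"]]

print(ask("Who killed Laura Palmer?"))
```
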
### Reader

- The reader model (`deepset/roberta-base-squad2`) is a good compromise between speed and accuracy when running on CPU. There are certainly better (and more computationally expensive) models, as you can read in the [Haystack documentation](https://haystack.deepset.ai/pipeline_nodes/reader). Swapping in another model only requires changing the model name (see the sketch below).
- You can also think about preparing a Twin Peaks QA dataset and fine-tuning the reader model to get better accuracy, as explained in this [Haystack tutorial](https://haystack.deepset.ai/tutorials/fine-tuning-a-model).
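
Both ideas come down to a few lines with `FARMReader`. The sketch below swaps in a larger SQuAD model and fine-tunes it on a SQuAD-format annotation file; the paths and file names are placeholders, and the model choice is just one example of a heavier reader.

```python
# Sketch: use a larger reader model and fine-tune it on custom annotations (Haystack 1.x).
# Paths and file names are placeholders.
from haystack.nodes import FARMReader

# A bigger (slower, usually more accurate) reader than roberta-base-squad2
reader = FARMReader(model_name_or_path="deepset/roberta-large-squad2", use_gpu=True)

# Fine-tune on a Twin Peaks QA dataset annotated in SQuAD format
reader.train(
    data_dir="data/annotations",          # folder containing the SQuAD-format JSON
    train_filename="twin_peaks_qa.json",  # placeholder file name
    use_gpu=True,
    n_epochs=2,
    save_dir="models/twin_peaks_reader",
)
```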