---
title: Who killed Laura Palmer?
emoji: 🗻🗻
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.2.0
app_file: app.py
pinned: false
license: apache-2.0
---
# Who killed Laura Palmer? [![Generic badge](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer) [![Generic badge](https://img.shields.io/github/stars/anakin87/who-killed-laura-palmer?label=Github&style=social)](https://github.com/anakin87/who-killed-laura-palmer)
[<img src="./data/readme_images/spaces_logo.png" align="center" style="display: block; margin-left: auto; margin-right: auto; max-width: 70%;">](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer)
## 🗻🗻 Twin Peaks Question Answering system
WKLP is a simple Question Answering system based on data crawled from the [Twin Peaks Wiki](https://twinpeaks.fandom.com/wiki/Twin_Peaks_Wiki). It is built with [🔍 Haystack](https://github.com/deepset-ai/haystack), an awesome open-source framework for building search systems that work intelligently over large document collections.
- [Project architecture 🧱](#project-architecture-)
- [What can I learn from this project? 📚](#what-can-i-learn-from-this-project-)
- [Repository structure 📁](#repository-structure-)
- [Installation 💻](#installation-)
- [Possible improvements ✨](#possible-improvements-)
---
## Project architecture 🧱
[![Project architecture](./data/readme_images/project_architecture.png)](#)
* Crawler: implemented using [Scrapy](https://github.com/scrapy/scrapy) and [fandom-py](https://github.com/NikolajDanger/fandom-py)
* Question Answering pipelines: created with [Haystack](https://github.com/deepset-ai/haystack) (a minimal sketch follows this list)
* Web app: developed with [Streamlit](https://github.com/streamlit/streamlit)
* Free hosting: [Hugging Face Spaces](https://huggingface.co/spaces)
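
To give a concrete idea of how the Question Answering part fits together, here is a minimal, self-contained sketch in the style of Haystack 1.x. The toy document, the TF-IDF retriever, and the `top_k` values are illustrative assumptions, not the project's real setup (which is built in the notebooks and `app_utils`); `deepset/roberta-base-squad2` is the reader mentioned later in this README.

```python
# Minimal illustrative extractive QA pipeline (Haystack 1.x style).
# The toy document and parameters are assumptions, not the project's real setup.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import TfidfRetriever, FARMReader
from haystack.pipelines import ExtractiveQAPipeline

# 1. Index some documents (the real project indexes the crawled wiki pages).
document_store = InMemoryDocumentStore()
document_store.write_documents([
    {"content": "Laura Palmer is found dead, wrapped in plastic, "
                "on the shore outside the town of Twin Peaks.",
     "meta": {"name": "Laura Palmer"}},
])

# 2. The retriever narrows the search space; the reader extracts answer spans.
retriever = TfidfRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)

# 3. Wire them into an extractive QA pipeline and ask a question.
pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever)
prediction = pipeline.run(
    query="Who killed Laura Palmer?",
    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 3}},
)
for answer in prediction["answers"]:
    print(answer.answer, answer.score)
```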
---
## What can I learn from this project? 📚
- How to quickly build a modern Question Answering system using [🔍 Haystack](https://github.com/deepset-ai/haystack)
- How to generate questions based on your documents
- How to build a nice [Streamlit](https://github.com/streamlit/streamlit) web app to show your QA system
- How to optimize the web app to 🚀 deploy it in [🤗 Spaces](https://huggingface.co/spaces) (see the caching sketch below the preview)
[![Web app preview](./data/readme_images/webapp.png)](https://huggingface.co/spaces/anakin87/who-killed-laura-palmer)
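
As a rough sketch of the Streamlit pattern that makes deployment on a free CPU tier workable, the heavy pipeline can be built once and cached across reruns. The wiring below mirrors the pipeline sketch above; `st.cache` is the caching idiom at Streamlit 1.2.0 (the version pinned in the Space config). This is not the actual `app.py`.

```python
# Illustrative Streamlit app: cache the heavy Haystack objects so they are built
# only once, even though Streamlit reruns the whole script on every interaction.
# This is a sketch, not the project's app.py.
import streamlit as st


@st.cache(allow_output_mutation=True)  # st.cache is the caching idiom at Streamlit 1.2.0
def load_pipeline():
    from haystack.document_stores import InMemoryDocumentStore
    from haystack.nodes import FARMReader, TfidfRetriever
    from haystack.pipelines import ExtractiveQAPipeline

    document_store = InMemoryDocumentStore()
    document_store.write_documents(
        [{"content": "Laura Palmer is found dead, wrapped in plastic."}]
    )
    retriever = TfidfRetriever(document_store=document_store)
    reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)
    return ExtractiveQAPipeline(reader=reader, retriever=retriever)


st.title("Who killed Laura Palmer?")
pipeline = load_pipeline()

question = st.text_input("Ask a question about Twin Peaks")
if question:
    with st.spinner("Searching..."):
        prediction = pipeline.run(query=question, params={"Reader": {"top_k": 3}})
    for answer in prediction["answers"]:
        st.markdown(f"**{answer.answer}** (score: {answer.score:.2f})")
```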
## Repository structure 📁
- [app.py](./app.py): Streamlit web app
- [app_utils folder](./app_utils/): Python modules used in the web app
- [crawler folder](./crawler/): Twin Peaks crawler, developed with Scrapy and fandom-py
- [notebooks folder](./notebooks/): Jupyter/Colab notebooks that create the search pipeline and generate questions with Haystack (a question-generation sketch follows below)
- [data folder](./data/): all necessary data
Within each folder, you can find more in-depth explanations.
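
As a rough idea of what the question-generation notebook does, Haystack 1.x ships a `QuestionGenerator` node (a T5-based model such as `valhalla/t5-base-e2e-qg`) that turns passages into candidate questions. The snippet below is a hedged sketch; the exact call signature may vary between Haystack releases, and the sample text is illustrative.

```python
# Hedged sketch: generating questions from a passage with Haystack 1.x's
# QuestionGenerator node. Check the API of your installed Haystack version.
from haystack.nodes import QuestionGenerator

question_generator = QuestionGenerator()  # downloads a T5-based question-generation model

text = (
    "Laura Palmer is a fictional character in the Twin Peaks franchise. "
    "She is found dead, wrapped in plastic, on the shore outside the town of Twin Peaks."
)

for question in question_generator.generate(text):
    print(question)
```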
## Installation 💻
To install this project locally, follow these steps:
- `git clone https://github.com/anakin87/who-killed-laura-palmer`
- `cd who-killed-laura-palmer`
- `pip install -r requirements.txt`
To run the web app, simply type: `streamlit run app.py`
## Possible improvements ✨
### Project structure
- The project is optimized to be deployed in Hugging Face Spaces and consists of an all-in-one Streamlit web app. In more structured production environments, I suggest dividing the software into three parts:
  - Haystack backend API (as explained in [the official documentation](https://haystack.deepset.ai/components/rest-api))
  - Document store service
  - Streamlit web app (see the sketch below for how the front end could call the backend API)
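
With such a split, the Streamlit front end only needs an HTTP call to the backend. The sketch below assumes a locally running Haystack backend at port 8000 and the default `/query` route and response shape of Haystack 1.x's REST API; the URL and formatting are illustrative.

```python
# Sketch of a thin Streamlit front end that delegates search to a separately
# deployed Haystack REST API. URL, port, and payload shape are assumptions
# based on Haystack 1.x's /query endpoint.
import requests
import streamlit as st

API_URL = "http://localhost:8000/query"  # assumed address of the Haystack backend

st.title("Who killed Laura Palmer?")
question = st.text_input("Ask a question about Twin Peaks")

if question:
    response = requests.post(
        API_URL,
        json={"query": question, "params": {"Retriever": {"top_k": 10}, "Reader": {"top_k": 3}}},
        timeout=30,
    )
    response.raise_for_status()
    for answer in response.json().get("answers", []):
        st.write(f"{answer.get('answer')} (score: {answer.get('score')})")
```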
### Reader
- The reader model (`deepset/roberta-base-squad2`) is a good compromise between speed and accuracy when running on CPU. There are certainly better (and more computationally expensive) models, as you can read in the [Haystack documentation](https://haystack.deepset.ai/pipeline_nodes/reader).
- You could also prepare a Twin Peaks QA dataset and fine-tune the reader model to get better accuracy, as explained in this [Haystack tutorial](https://haystack.deepset.ai/tutorials/fine-tuning-a-model) (a sketch follows below).
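
A hedged sketch of what that fine-tuning step could look like with `FARMReader.train` in Haystack 1.x; the file names and paths below are placeholders for a SQuAD-format Twin Peaks dataset you would have to create yourself.

```python
# Sketch: fine-tune the reader on a custom SQuAD-format dataset.
# Paths and file names are placeholders, not files shipped with this repo.
from haystack.nodes import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)
reader.train(
    data_dir="data/twin_peaks_qa",   # placeholder folder with SQuAD-format JSON
    train_filename="train.json",     # placeholder annotated Twin Peaks QA pairs
    n_epochs=2,
    save_dir="models/roberta-base-squad2-twin-peaks",
)
```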