Stefano Fiorucci

added installation section to README

5f2135e over 2 years ago

Twin Peaks crawler

This crawler downloads texts and metadata from the Twin Peaks Fandom Wiki and writes its output as JSON. It is built on a combination of Scrapy and fandom-py.

Several wiki pages are discarded because they are unrelated to the Twin Peaks plot and would add noise to the Question Answering index.
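The idea of discarding off-topic pages can be illustrated with a minimal sketch. The exclusion rules below (`EXCLUDED_PREFIXES`, `EXCLUDED_KEYWORDS`) and the function names are hypothetical and chosen only for illustration; this README does not list the criteria the crawler actually applies.

```python
from typing import Iterable, List, Sequence

# Hypothetical exclusion rules: the crawler's real criteria are not documented here.
EXCLUDED_PREFIXES: Sequence[str] = ("Category:", "User:", "Template:")
EXCLUDED_KEYWORDS: Sequence[str] = ("(actor)", "soundtrack")


def is_plot_related(title: str) -> bool:
    """Return False for titles that match any of the illustrative exclusion rules."""
    if any(title.startswith(prefix) for prefix in EXCLUDED_PREFIXES):
        return False
    lowered = title.lower()
    return not any(keyword.lower() in lowered for keyword in EXCLUDED_KEYWORDS)


def filter_titles(titles: Iterable[str]) -> List[str]:
    """Keep only the titles considered relevant, preserving their order."""
    return [title for title in titles if is_plot_related(title)]
```

Filtering on titles before downloading a page avoids fetching content that would be thrown away anyway, but the same check could just as well run after download.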

Installation

copy this folder (if needed, see Stack Overflow)
pip install -r requirements.txt

Usage

(if needed, activate the virtual environment)
cd tpcrawler
scrapy crawl tpcrawler
you can find the downloaded pages in the data subfolder
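Once the crawl has finished, the output can be read back for inspection. The sketch below assumes that the data subfolder contains one JSON document per file with a `.json` extension; the file layout and the `load_pages` helper are assumptions, not something this README specifies. It makes no assumption about the fields inside each document.

```python
import json
from pathlib import Path
from typing import Any, List


def load_pages(data_dir: str = "data") -> List[Any]:
    """Parse every *.json file in data_dir and return the results, sorted by file name."""
    pages = []
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            pages.append(json.load(handle))
    return pages
```

Calling `load_pages("data")` from the tpcrawler folder would then return a list of parsed documents, which is a quick way to check how many pages were downloaded.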