Spaces:
Sleeping
Sleeping
Twin Peaks crawler
This crawler download texts and metadata from Twin Peaks Fandom Wiki. The output format is JSON. The crawler is based on the combination of Scrapy and fandom-py.
Several wiki pages are discarded, since they are not related to Twin Peaks plot and create noise in the Question Answering index.
Installation
- copy this folder (if needed, see stackoverflow)
pip install -r requirements.txt
Usage
- (if needed, activate the virtual environment)
cd tpcrawler
scrapy crawl tpcrawler
- you can find the downloaded pages in
data
subfolder