metadata
title: Raccoon
emoji: 🦝
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.10.0
python_version: 3.9
app_file: app.py
pinned: false
license: mit

Raccoon

Installation

It is recommended to use a virtual environment created with venv.

The fol

  • If using Apple Silicon, install Rust with curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh and CMake with brew install cmake
  • Create the virtual environment: python3 -m venv .venv
  • Activate the virtual environment: source .venv/bin/activate
    • To deactivate the virtual environment, run deactivate within the virtual environment.
  • Install the required packages: .venv/bin/pip install -r requirements.txt
  • Install the project itself in editable mode: .venv/bin/pip install -e .
  • Create a custom search engine in Google.
  • Create an API key for the custom search engine.
  • Add the API key and the custom search engine ID to .streamlit/secrets.toml:
google_search_api_key = "api-key"
google_search_engine_id = "search-engine-id"
  • To start the interface: streamlit run app.py
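
As a rough illustration of how the two secrets above are typically used, the sketch below builds a request URL for the Google Custom Search JSON API. It assumes the app talks to that API directly, which this README does not confirm; build_search_url is a hypothetical helper, not a function from this repository. In a Streamlit app the two values would usually be read with st.secrets["google_search_api_key"] and st.secrets["google_search_engine_id"].

```python
from urllib.parse import urlencode

# Public endpoint of the Google Custom Search JSON API.
CUSTOM_SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key: str, engine_id: str, query: str, num: int = 10) -> str:
    """Return the request URL for one search query.

    Hypothetical helper for illustration only. The API accepts between
    1 and 10 results per request, so other values are rejected early.
    """
    if not 1 <= num <= 10:
        raise ValueError("num must be between 1 and 10")
    # key = API key, cx = search engine ID, q = search terms.
    params = {"key": api_key, "cx": engine_id, "q": query, "num": num}
    return f"{CUSTOM_SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder values matching the secrets.toml example above.
    print(build_search_url("api-key", "search-engine-id", "health care", num=5))
```

The helper only builds the URL, so it can be tried without network access or valid credentials.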

Todo

  • Improve fetched content.
    • Fix issue of duplicate content extracted by beautifulsoup.
    • Exclude code from content.
    • Find sentences that contain the search keywords.
    • Find sentences that contain the search keywords, taking into account different spellings (e.g. health care vs. healthcare).
    • Get some content from every search result.
    • Divs with text and tags: extract the text from the tags, then decompose the tags. Keep the order of the content and avoid duplicates.
  • Summarization currently requires truncation. Find a solution where truncation is not needed.
  • Support German content with language switcher.
  • Improve queries to include more keywords (expand abbreviations and define context).
  • Control the number of results from the UI.
  • Control summary length via settings: https://docs.streamlit.io/library/advanced-features/session-state
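
For the keyword-matching items above, one possible approach is sketched below. It is not the project's implementation: keyword_pattern and find_sentences are hypothetical names, and the sentence splitter is deliberately naive. The idea is to ignore spaces and hyphens inside a keyword, so that health care, healthcare and health-care all match the same pattern.

```python
import re


def keyword_pattern(keyword: str) -> re.Pattern:
    """Compile a case-insensitive pattern for a keyword that tolerates
    a single space or hyphen between any two of its characters."""
    chars = [re.escape(c) for c in keyword if not c.isspace() and c != "-"]
    if not chars:
        raise ValueError("keyword must contain at least one character")
    # Optional separator between characters; word boundaries at both ends
    # so that "care" does not match inside "scare".
    body = r"[\s-]?".join(chars)
    return re.compile(rf"\b{body}\b", re.IGNORECASE)


def find_sentences(text: str, keywords: list[str]) -> list[str]:
    """Return the sentences of text that contain at least one keyword."""
    patterns = [keyword_pattern(k) for k in keywords]
    # Naive split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if any(p.search(s) for p in patterns)]


if __name__ == "__main__":
    sample = (
        "Healthcare costs are rising. The weather is nice. "
        "Many people lack health care. Health-care reform is debated!"
    )
    for sentence in find_sentences(sample, ["healthcare"]):
        print(sentence)
```

The tolerant pattern can produce false positives (for example, "the rapist" would match the keyword therapist), so a real implementation may prefer an explicit list of spelling variants.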