---
title: Raccoon
emoji: 🦝
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.10.0
python_version: 3.9
app_file: app.py
pinned: false
license: mit
---

# Raccoon

## Installation

It is recommended to use a virtual environment created with [`venv`](https://docs.python.org/3/library/venv.html). The following steps set up the project:

- If using Apple Silicon, install Rust with `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh` and CMake with `brew install cmake`.
- Create the virtual environment: `python3 -m venv .venv`
- Activate the virtual environment: `source .venv/bin/activate`
- To deactivate the virtual environment, run `deactivate` inside it.
- Install the required packages: `.venv/bin/pip install -r requirements.txt`
- Install the project in editable mode: `.venv/bin/pip install -e .`
- [Create a custom search engine in Google](https://programmablesearchengine.google.com/controlpanel/all).
- Create an API key for the custom search engine.
- Add the API key and the search engine ID to `.streamlit/secrets.toml`.

  ```toml
  google_search_api_key = "api-key"
  google_search_engine_id = "search-engine-id"
  ```

- To start the interface: `streamlit run app.py`

### Todo

- [ ] Improve fetched content.
- [x] Fix issue of duplicate content extracted by BeautifulSoup.
- [x] Exclude code from content.
- [x] Find sentences that contain the search keywords.
- [ ] Find sentences that contain the search keywords, taking into account different spellings ("health care" vs. "healthcare").
- [ ] Get some content from every search result.
- [ ] Divs with text and tags: extract text from tags and then decompose the tags. Keep the order of content and avoid duplicates.
- [ ] Summarization currently requires truncation. Find a solution that does not need it.
- [ ] Support German content with a language switcher.
- [ ] Improve queries to include more keywords (expand abbreviations and define context).
- [ ] Control the number of results from the UI.
- [ ] Control summary length via settings: https://docs.streamlit.io/library/advanced-features/session-state
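
One possible approach to the open Todo item about spelling variants ("health care" vs. "healthcare") is to compare keywords and sentences after removing spaces, hyphens, and punctuation. The sketch below is only an illustration under that assumption and is not the project's current implementation; the function names `split_sentences`, `squash`, and `find_sentences` are hypothetical, and the sentence splitter is deliberately naive.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def squash(text: str) -> str:
    # Lowercase and drop whitespace, hyphens, punctuation, and underscores,
    # so "Health care", "health-care", and "healthcare" all become "healthcare".
    # \W is Unicode-aware, so letters such as German umlauts are kept.
    return re.sub(r"[\W_]+", "", text.lower())


def find_sentences(text: str, keyword: str) -> list[str]:
    # Return the sentences whose squashed form contains the squashed keyword.
    needle = squash(keyword)
    if not needle:
        return []
    return [s for s in split_sentences(text) if needle in squash(s)]


if __name__ == "__main__":
    sample = (
        "Health care costs rose. The weather was mild. "
        "Good healthcare matters! Health-care reform passed."
    )
    for sentence in find_sentences(sample, "healthcare"):
        print(sentence)
```

Because word boundaries are discarded, this can also match across adjacent words that happen to spell the keyword, so it trades some precision for recall. A stricter variant would need a list of known spelling variants per keyword.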