metadata
title: Foodkit Knowledge Base
emoji: 🐢
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.29.0
app_file: app.py
pinned: false
license: unknown
Make sure the OPENAI_API_KEY environment variable is set with your API key.
Configure the list of URLs to index in urls.txt.
Fetch, parse and transform the source documents to markdown using ./bin/fetch.sh.
Index the documents using ./bin/index.sh.
Start the project with python3 app.py and navigate to http://127.0.0.1:7860.
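The first two setup steps can be checked before running the fetch and index scripts. The following is a minimal sketch, not part of the project: the helper names `check_api_key` and `load_urls` are hypothetical, and it assumes urls.txt holds one URL per line, with blank lines and lines starting with `#` ignored.

```python
import os
from pathlib import Path


def check_api_key(env=None):
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


def load_urls(path="urls.txt"):
    """Read URLs from a text file.

    Assumed format (not confirmed by the project): one URL per line,
    blank lines and '#' comment lines are skipped.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


if __name__ == "__main__":
    if not check_api_key():
        raise SystemExit("OPENAI_API_KEY is not set")
    print(f"{len(load_urls())} URLs configured")
```

Running this from the project root before ./bin/fetch.sh would surface a missing key or an empty URL list early, rather than partway through indexing.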
Indexing
With the current list of URLs, reindexing takes around 3 minutes.
Token usage:
OPENAI_API_KEY=<redacted> ./bin/index.sh
INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 61173 tokens
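To track usage across reindex runs, the totals can be pulled out of the log output shown above. This is a rough sketch that assumes the log lines keep the exact wording shown here; `parse_token_usage` and `estimate_cost` are hypothetical helpers, and the per-1k-token rate is a value you supply yourself, since this README does not state any pricing.

```python
import re

# Matches lines such as:
#   "... Total embedding token usage: 61173 tokens"
USAGE_RE = re.compile(r"Total (LLM|embedding) token usage: (\d+) tokens")


def parse_token_usage(log_text):
    """Sum token counts per kind ('LLM', 'embedding') over all matching lines."""
    totals = {}
    for kind, count in USAGE_RE.findall(log_text):
        totals[kind] = totals.get(kind, 0) + int(count)
    return totals


def estimate_cost(tokens, usd_per_1k_tokens):
    """Cost in USD for a token count at a caller-supplied rate (an assumption)."""
    return tokens / 1000 * usd_per_1k_tokens


if __name__ == "__main__":
    log = (
        "INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens\n"
        "INFO:root:> [build_index_from_documents] Total embedding token usage: 61173 tokens\n"
    )
    print(parse_token_usage(log))
```

Pass the embedding total and your provider's current rate to `estimate_cost` to get a per-run figure.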
Initial thoughts
- Limited by the quality/accuracy of the source materials. If the source docs are wrong, answers will also be wrong.
- Content which is heavily image/photo reliant is hard to index effectively.
- Documents are indexed in isolation so metadata (such as navigation hierarchy, filename, etc.) is not available.