
metadata
title: Foodkit Knowledge Base
emoji: 🐢
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 3.29.0
app_file: app.py
pinned: false
license: unknown

Make sure the OPENAI_API_KEY environment variable is set to your API key.
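A minimal sketch of failing fast when the key is missing, so a misconfigured shell is caught before any indexing work starts. The helper name `require_api_key` is hypothetical and not part of this project.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key, or raise a clear error if it is missing or blank.

    `env` is any mapping of variable names to values; it defaults to the
    process environment and can be swapped for a plain dict when testing.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the scripts")
    return key
```

Calling `require_api_key()` at the top of a script would stop early with a readable message instead of failing on the first API request.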

Configure the list of URLs to index in urls.txt.
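The format of urls.txt is not documented here. The sketch below assumes one URL per line, with blank lines and lines starting with `#` ignored; both the format and the function name `parse_url_list` are assumptions for illustration, not what ./bin/fetch.sh actually does.

```python
from urllib.parse import urlparse


def parse_url_list(text):
    """Parse an assumed urls.txt format into an ordered, de-duplicated list."""
    urls = []
    seen = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines (assumed convention).
        if not line or line.startswith("#"):
            continue
        parts = urlparse(line)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an http(s) URL: {line!r}")
        # Keep the first occurrence of each URL, preserving file order.
        if line not in seen:
            seen.add(line)
            urls.append(line)
    return urls
```

Validating the list up front means a typo in urls.txt surfaces before the fetch step spends time on the valid entries.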

Fetch, parse and transform the source documents to markdown using ./bin/fetch.sh.

Index the documents using ./bin/index.sh.

Start the project with python3 app.py and navigate to http://127.0.0.1:7860 (Gradio serves over plain HTTP by default).

Indexing

With the current list of URLs, reindexing takes around 3 minutes.

Token usage:

OPENAI_API_KEY=<redacted> ./bin/index.sh
INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 61173 tokens
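To track usage across runs, the token counts can be pulled out of the log output above. This is a rough sketch that matches only the two line shapes shown in this README; the function names are hypothetical, and the price passed to `estimated_cost` is a placeholder you would need to look up, not a real rate.

```python
import re

# Matches the "Total ... token usage" lines shown in the log above.
TOKEN_LINE = re.compile(r"Total (LLM|embedding) token usage: (\d+) tokens")


def token_usage(log_text):
    """Sum the reported token counts per kind ('llm', 'embedding')."""
    usage = {}
    for kind, count in TOKEN_LINE.findall(log_text):
        key = kind.lower()
        usage[key] = usage.get(key, 0) + int(count)
    return usage


def estimated_cost(tokens, usd_per_1k_tokens):
    """Convert a token count to a cost, given a caller-supplied price per 1,000 tokens."""
    return tokens / 1000 * usd_per_1k_tokens


log = """\
INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 61173 tokens
"""
print(token_usage(log))  # {'llm': 0, 'embedding': 61173}
```

Since indexing uses only embedding tokens here, the embedding count is the one to multiply by the current embedding price.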

Initial thoughts

  • Limited by the quality/accuracy of the source materials. If the source docs are wrong, answers will also be wrong.
  • Content which is heavily image/photo reliant is hard to index effectively.
  • Documents are indexed in isolation so metadata (such as navigation hierarchy, filename, etc.) is not available.
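One possible way to soften the last limitation is to prepend a short metadata header to each document's text before it is indexed, so the source URL and navigation path travel with the content. This is only a sketch of the idea: `with_metadata_header` and its parameters are hypothetical, and nothing here reflects how ./bin/index.sh currently works.

```python
def with_metadata_header(text, source_url, breadcrumbs=()):
    """Return `text` prefixed with a plain-text header describing its origin.

    `breadcrumbs` is an optional sequence of navigation labels, outermost first.
    """
    header = [f"Source: {source_url}"]
    if breadcrumbs:
        header.append("Section: " + " > ".join(breadcrumbs))
    # A blank line separates the header from the document body.
    return "\n".join(header) + "\n\n" + text
```

The trade-off is a few extra embedding tokens per document in exchange for answers that can cite where a passage came from.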