metadata
title: Datahub Qa Bot
emoji: 👍
colorFrom: gray
colorTo: purple
sdk: streamlit
sdk_version: 1.17.0
app_file: app.py
pinned: false
license: mit

DataHub documentation bot

Uses OpenAI, LangChain and Streamlit to turn the DataHub documentation into a question-answering bot (DataHub QA Bot) hosted on a Hugging Face Space.
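A documentation QA bot built with LangChain typically answers a question by retrieving the documentation chunks most similar to it and passing them to the LLM as context. The sketch below illustrates that retrieval-augmented flow using only the standard library. It is not the code in app.py: the bag-of-words `embed` function is a hypothetical stand-in for OpenAI embeddings, and `build_prompt` uses an illustrative prompt format.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an OpenAI embedding: a bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble an illustrative prompt that grounds the LLM in the retrieved docs."""
    joined = "\n---\n".join(context)
    return (
        "Answer the question using only the DataHub documentation below.\n\n"
        f"{joined}\n\nQuestion: {question}\nAnswer:"
    )


if __name__ == "__main__":
    docs = [
        "DataHub ingestion uses recipes written in YAML.",
        "The DataHub frontend is a React application.",
        "Metadata ingestion can run on a schedule.",
    ]
    question = "How do I write an ingestion recipe in YAML?"
    top = retrieve(question, docs, k=1)
    print(build_prompt(question, top))
```

In the real bot, the embeddings come from OpenAI, the vectors are stored in Chroma, and the assembled prompt is sent to an OpenAI model; only the retrieve-then-prompt shape is meant to carry over.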

How to run locally

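  1. Clone the repo and change into its directory.
  2. Create and activate a virtual environment, install the dependencies, and start the app:

```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py
```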

How to train your own model

  1. Delete the db folder.
  2. Copy the docs folder from the DataHub repository to ./docs.
  3. Set os.environ["OPENAI_API_KEY"] in train.py to your OpenAI API key.
  4. Run python3 train.py.
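Judging by the db folder and the chromadb log output, train.py embeds the copied documentation into a Chroma store. Its actual loaders and text splitter are not shown here. The sketch below assumes markdown files under ./docs and illustrates only the loading and fixed-size, overlapping chunking step in plain Python. `load_markdown`, `chunk_text` and the `chunk_size`/`overlap` defaults are hypothetical; embedding and persistence would be handled by LangChain and Chroma.

```python
from pathlib import Path


def load_markdown(docs_dir: str = "./docs") -> dict[str, str]:
    """Read every .md file under docs_dir (recursively) into {path: text}."""
    return {
        str(path): path.read_text(encoding="utf-8")
        for path in sorted(Path(docs_dir).rglob("*.md"))
    }


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    docs = load_markdown()
    chunks = [c for text in docs.values() for c in chunk_text(text)]
    print(f"{len(docs)} files -> {len(chunks)} chunks")
```

The overlap keeps a sentence that straddles a chunk boundary intact in at least one chunk, at the cost of embedding some text twice.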

Training takes about 15 seconds and costs around $0.20 in OpenAI API usage.

Once the index is in place, chromadb reports what it loaded, for example:

```text
chromadb.db.duckdb: loaded in 236 embeddings
chromadb.db.duckdb: loaded in 1 collections
```