---
title: Datahub Qa Bot
emoji: 👁
colorFrom: gray
colorTo: purple
sdk: streamlit
sdk_version: 1.17.0
app_file: app.py
pinned: false
license: mit
---

# DataHub documentation bot

This project uses [OpenAI](https://platform.openai.com/docs/introduction), [Langchain](https://python.langchain.com/en/latest/index.html) and [streamlit](https://docs.streamlit.io/) to build a question-answering bot over the DataHub documentation. The bot is hosted on a [huggingface space](https://huggingface.co/spaces/abdvl/datahub_qa_bot?logs=build).

## How to run locally

1. Clone the repo
2. Run:

```
source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py
```

## How to train your own model

1. Delete the db folder
2. Copy the docs folder from the [DataHub docs folder](https://github.com/datahub-project/datahub/tree/master/docs) to `./docs`
3. Set `os.environ["OPENAI_API_KEY"]` in `train.py` to your OpenAI API key
4. Run `python3 train.py`

Training takes about 15 seconds and costs around $0.20.

```
chromadb.db.duckdb: loaded in 236 embeddings
chromadb.db.duckdb: loaded in 1 collections
```
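
The prerequisites from the training steps can be checked with a small script before running `train.py`. The following is a minimal sketch and not part of the repo: `preflight` is a hypothetical helper, and it assumes that the vector store lives in `./db`, that the copied docs are Markdown files under `./docs`, and that the key is supplied through the `OPENAI_API_KEY` environment variable.

```python
import os
from pathlib import Path


def preflight(root=".", env=None):
    """Return a list of problems that would block a fresh training run.

    Hypothetical helper: the folder names `db` and `docs` are taken from
    the training steps, and the `*.md` pattern is an assumption.
    """
    env = os.environ if env is None else env
    root = Path(root)
    problems = []

    # Step 1: the old vector store should be gone before retraining.
    if (root / "db").exists():
        problems.append("db folder still exists; delete it before retraining")

    # Step 2: the DataHub docs should have been copied into ./docs.
    docs = root / "docs"
    if not docs.is_dir() or not any(docs.rglob("*.md")):
        problems.append("no Markdown files found under ./docs")

    # Step 3: an OpenAI API key must be available.
    if not env.get("OPENAI_API_KEY", "").strip():
        problems.append("OPENAI_API_KEY is not set")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

Reading the key from the environment instead of editing it into `train.py` keeps the secret out of version control. Whether `train.py` picks it up that way depends on how the script is written.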