Fangrui Liu

ChatData 🔍 📖

We are constantly improving LangChain's self-query retriever. Some of these features have not been merged yet.

Yet another chat-with-documents app, but one that supports queries over millions of files with MyScale and LangChain.

News 🔥

  • 🔧 Our contribution to LangChain that helps self-query retrievers filter with more types and functions
  • 🌟 We just opened a FREE pod hosting data for ArXiv papers. Anyone can try their own SQL with vector search! Feel the power when SQL meets vector search! See how to access the pod here.
  • 📚 We have collected 1.67 million papers from ArXiv! We are collecting more and we need your advice!
  • More coming...

Quickstart

  1. Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate
  2. Install dependencies

This app currently uses MyScale's fork of LangChain, which contains improved prompts for the LIKE and CONTAIN comparators in the MyScale self-query retriever.

python3 -m pip install -r requirements.txt
  3. Run the app!
# copy the example secrets file, then fill in your OpenAI key in .streamlit/secrets.toml
cp .streamlit/secrets.example.toml .streamlit/secrets.toml
# start the app
python3 -m streamlit run app.py

Quick Navigator 🧭

Introduction

ChatData brings millions of papers into your knowledge base. We imported 1.67 million papers (continuously updated) along with their metadata, which contains the following fields:

  1. metadata.authors: the paper's authors, as a list of strings
  2. metadata.abstract: the paper's abstract, used as the ranking criterion (with InstructXL)
  3. metadata.titles: the paper's title
  4. metadata.categories: the paper's categories, as a list of strings like ["cs.CV"]
  5. metadata.pubdate: the paper's date of publication, as an ISO 8601 formatted string
  6. metadata.primary_category: the paper's primary category, as a string defined by ArXiv
  7. metadata.comment: additional comments on the paper

For the overall table schema, please refer to the table creation section in docs/self-query.md.
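
To make the field list concrete, here is a rough sketch of one metadata record modeled as a Python dataclass. The class name `PaperMetadata`, the helper methods, and all sample values are made up for illustration; the exact column types are assumptions, and the authoritative schema is the one in docs/self-query.md.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class PaperMetadata:
    """Hypothetical in-memory view of the metadata fields listed above."""

    authors: List[str]     # list of author names
    abstract: str          # text used as the ranking criterion
    titles: str            # the paper's title (field name as listed above)
    categories: List[str]  # e.g. ["cs.CV"]
    pubdate: str           # ISO 8601 formatted string
    primary_category: str  # category string defined by ArXiv
    comment: str           # free-form additional comment

    def published_at(self) -> datetime:
        """Parse the ISO 8601 publication date into a datetime."""
        return datetime.fromisoformat(self.pubdate)

    def in_category(self, category: str) -> bool:
        """Check whether the paper is listed under the given category."""
        return category in self.categories


# Placeholder values only; this is not a real paper.
example = PaperMetadata(
    authors=["Ada Example", "Grace Sample"],
    abstract="A placeholder abstract.",
    titles="A Placeholder Title",
    categories=["cs.CV", "cs.LG"],
    pubdate="2023-05-01T00:00:00",
    primary_category="cs.CV",
    comment="10 pages",
)

print(example.published_at().year)   # 2023
print(example.in_category("cs.CV"))  # True
```

Filters over fields such as `pubdate` and `categories` are the kind of conditions a self-query retriever can derive from a natural-language question, which is why the list and date types matter.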

How to run 🏃

python3 -m pip install -r requirements.txt
python3 -m streamlit run app.py

How to build? 🧱

See docs/self-query.md

Special Thanks 👏 (Ordered Alphabetically)