ChatData
We are constantly improving LangChain's self-query retriever. Some of the features have not been merged yet.
Yet another chat-with-documents app, but one that supports querying millions of files with MyScale and LangChain.
News
- Our contribution to LangChain that helps self-query retrievers filter with more types and functions
- We just opened a FREE pod hosting data for arXiv papers. Anyone can try their own SQL with vector search! Feel the power when SQL meets vector search! See how to access the pod here.
- We collected 1.67 million papers on arXiv! We are collecting more and we need your advice!
- More coming...
Quickstart
- Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate
- Install dependencies
This app is currently using MyScale's fork of LangChain. It contains improved prompts for the comparators LIKE and CONTAIN in the MyScale self-query retriever.
python3 -m pip install -r requirements.txt
- Run the app!
# copy the example secrets file, then fill in your OpenAI key in .streamlit/secrets.toml
cp .streamlit/secrets.example.toml .streamlit/secrets.toml
# start the app
python3 -m streamlit run app.py
Quick Navigator
How did LangChain and MyScale convert natural language to structured filters?
Where can I get the arXiv data?
Or use the MyScale database as a service directly... for FREE
import clickhouse_connect

client = clickhouse_connect.get_client(
    host='msc-1decbcc9.us-east-1.aws.staging.myscale.cloud',
    port=443,
    username='chatdata',
    password='myscale_rocks'
)
Or put these settings in .streamlit/secrets.toml:
MYSCALE_HOST = "msc-1decbcc9.us-east-1.aws.staging.myscale.cloud"
MYSCALE_PORT = 443
MYSCALE_USER = "chatdata"
MYSCALE_PASSWORD = "myscale_rocks"
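As a minimal sketch, the settings above could be read from a TOML file and turned into keyword arguments for clickhouse_connect.get_client. The helper names client_kwargs_from_secrets and list_tables are hypothetical, the TOML parsing assumes Python 3.11+ (or the third-party tomli backport), and the app itself may load its secrets differently, for example through Streamlit.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: third-party backport is installed


def client_kwargs_from_secrets(toml_text: str) -> dict:
    """Map the MYSCALE_* settings to clickhouse_connect.get_client keyword arguments."""
    secrets = tomllib.loads(toml_text)
    return {
        "host": secrets["MYSCALE_HOST"],
        "port": int(secrets["MYSCALE_PORT"]),
        "username": secrets["MYSCALE_USER"],
        "password": secrets["MYSCALE_PASSWORD"],
    }


def list_tables(toml_path: str = ".streamlit/secrets.toml") -> list:
    """Connect to the pod and list the visible tables (needs network access)."""
    import clickhouse_connect  # imported lazily so the parsing helper works without it

    with open(toml_path, "r", encoding="utf-8") as f:
        kwargs = client_kwargs_from_secrets(f.read())
    client = clickhouse_connect.get_client(**kwargs)
    # SHOW TABLES avoids assuming any particular table name.
    return [row[0] for row in client.query("SHOW TABLES").result_rows]
```

Calling list_tables() is one way to discover what the pod exposes before writing your own SQL, without guessing at table names.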
Introduction
ChatData brings millions of papers into your knowledge base. We imported 1.67 million papers (continuously updated) with the following metadata:
- metadata.authors: the paper's authors, as a list of strings
- metadata.abstract: the paper's abstract, used as the ranking criterion (with InstructorXL)
- metadata.titles: the paper's title
- metadata.categories: the paper's categories, as a list of strings like ["cs.CV"]
- metadata.pubdate: the paper's publication date, as an ISO 8601 formatted string
- metadata.primary_category: the paper's primary category, as a string defined by arXiv
- metadata.comment: additional comments on the paper
For the overall table schema, please refer to the table creation section in docs/self-query.md.
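To make the metadata fields concrete, here is an illustrative record and a tiny in-memory filter, sketched in plain Python. Every value in sample_paper is made up, and contain, like, and published_after are hypothetical helpers that only approximate the intent of comparators such as CONTAIN and LIKE. In the app, filtering is expected to happen inside MyScale rather than in Python code like this.

```python
from datetime import datetime

# Illustrative record only: all values below are invented for this sketch.
sample_paper = {
    "metadata.authors": ["A. Author", "B. Writer"],
    "metadata.abstract": "We study an example problem in image segmentation.",
    "metadata.titles": "An Example Paper on Image Segmentation",
    "metadata.categories": ["cs.CV", "cs.LG"],
    "metadata.pubdate": "2023-05-01T00:00:00",
    "metadata.primary_category": "cs.CV",
    "metadata.comment": "10 pages, 3 figures",
}


def contain(values: list, item: str) -> bool:
    """List membership, approximating CONTAIN on list-of-strings fields."""
    return item in values


def like(text: str, pattern: str) -> bool:
    """Case-insensitive substring match, approximating LIKE on string fields."""
    return pattern.lower() in text.lower()


def published_after(record: dict, iso_date: str) -> bool:
    """Compare ISO 8601 publication dates as datetimes."""
    pubdate = datetime.fromisoformat(record["metadata.pubdate"])
    return pubdate > datetime.fromisoformat(iso_date)


# "Computer vision papers about segmentation published after 2023-01-01"
matches = (
    contain(sample_paper["metadata.categories"], "cs.CV")
    and like(sample_paper["metadata.titles"], "segmentation")
    and published_after(sample_paper, "2023-01-01")
)
```

A natural-language question like the one in the final comment is the kind of request a self-query retriever would turn into a structured filter over these fields.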
How to run
python3 -m pip install -r requirements.txt
python3 -m streamlit run app.py
How to build?
Special Thanks (Ordered Alphabetically)
- ArXiv API for its open-access interoperability with preprint papers.
- InstructorXL for its promptable embeddings that improve retrieval performance.
- LangChain for its easy-to-use and composable API designs and prompts.
- The Alexandria Index for providing an arXiv data index to the public.