colbert-acl / README.md
license: apache-2.0

Use ColBERT as a search engine for the ACL Anthology: parse any BibTeX file and store the parsed entries in a MySQL service.

Setup

Setup ColBERT

git clone https://huggingface.co/davidheineman/colbert-acl

# install dependencies
# torch==1.13.1 required (conda install -y -n [env] python=3.10)
pip install -r requirements.txt
brew install mysql

(Optional) Parse & Index the Anthology

Feel free to skip this step, since the parsed and indexed anthology is already included in this repo.

# get up-to-date abstracts in bibtex
curl -O https://aclanthology.org/anthology+abstracts.bib.gz
gunzip anthology+abstracts.bib.gz
mv anthology+abstracts.bib anthology.bib

# parse .bib -> .json
python parse.py

# index with ColBERT
# (note: indexing sometimes fails silently if the C++ extensions are missing)
python index.py
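
For a sense of what the `.bib` -> `.json` step involves, here is a minimal sketch of a BibTeX-to-JSON converter using only the standard library. It is not the repo's `parse.py`: the function names (`read_value`, `parse_bibtex`, `bib_to_json`) and the output keys `type` and `key` are assumptions made for illustration. It handles brace-delimited, quoted, and bare values, but not `#` string concatenation or `@string` macros.

```python
import json
import re

# Start of an entry, e.g. "@inproceedings{some-key,"
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Start of a field, e.g. "title = "
FIELD_START = re.compile(r"(\w+)\s*=\s*")
# A bare value such as a number or a month macro
BARE_VALUE = re.compile(r"[^,}\s]+")


def read_value(text, pos):
    """Read one field value starting at pos; return (value, next_pos)."""
    if pos >= len(text):
        raise ValueError("unexpected end of input")
    if text[pos] == "{":
        # Brace-delimited value: track nesting depth to find the closing brace.
        depth, start = 0, pos + 1
        while pos < len(text):
            if text[pos] == "{":
                depth += 1
            elif text[pos] == "}":
                depth -= 1
                if depth == 0:
                    return text[start:pos], pos + 1
            pos += 1
        raise ValueError("unbalanced braces")
    if text[pos] == '"':
        # Quoted value: assumes no double quote inside the value itself.
        end = text.index('"', pos + 1)
        return text[pos + 1:end], end + 1
    match = BARE_VALUE.match(text, pos)
    if match is None:
        raise ValueError("missing field value")
    return match.group(0), match.end()


def parse_bibtex(text):
    """Parse BibTeX source into a list of dicts, one per entry."""
    entries, pos = [], 0
    while (start := ENTRY_START.search(text, pos)) is not None:
        entry = {"type": start.group(1).lower(), "key": start.group(2)}
        pos = start.end()
        while True:
            # Skip whitespace and commas between fields.
            while pos < len(text) and text[pos] in " \t\r\n,":
                pos += 1
            field = FIELD_START.match(text, pos)
            if field is None:
                break  # closing brace (or malformed input) ends the entry
            value, pos = read_value(text, field.end())
            # Collapse line breaks and repeated spaces; inner braces are kept.
            entry[field.group(1).lower()] = " ".join(value.split())
        entries.append(entry)
    return entries


def bib_to_json(text):
    """Serialize the parsed entries as a JSON string."""
    return json.dumps(parse_bibtex(text), indent=2, ensure_ascii=False)
```

To convert a file, read `anthology.bib` as text, pass it to `bib_to_json`, and write the result wherever the indexing step expects it.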

Search with ColBERT

# start the Flask server
python server.py

# or start a production API endpoint
gunicorn -w 4 -b 0.0.0.0:8893 server:app

Then, to test, visit:

http://localhost:8893/api/search?query=Information retrieval with BERT

or for an interface:

http://localhost:8893
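
The search endpoint can also be queried from a script. The sketch below builds the same URL as above with the query percent-encoded (spaces become `+`) and returns the raw response body. The helper names `build_search_url` and `search` are hypothetical, and the response format is not documented here, so the body is returned as text rather than parsed.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_search_url(query, host="localhost", port=8893):
    """Return the /api/search URL with the query safely encoded."""
    return f"http://{host}:{port}/api/search?{urlencode({'query': query})}"


def search(query, host="localhost", port=8893, timeout=30):
    """Send the query to a running server and return the response body as text."""
    with urlopen(build_search_url(query, host, port), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, `print(search("Information retrieval with BERT"))` should show whatever the endpoint returns for that query.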

Deploy as a Docker App

docker-compose build --no-cache
docker-compose up --build

Example notebooks

To see an example of search, visit: https://colab.research.google.com/drive/1-b90_8YSAK17KQ6C7nqKRYbCWEXQ9FGs