---
license: apache-2.0
---

Use ColBERT as a search engine for the [ACL Anthology](https://aclanthology.org/). (It can parse any BibTeX file and store the result in a MySQL service.)

# Setup

## Setup ColBERT

```sh
git clone https://huggingface.co/davidheineman/colbert-acl

# install dependencies
# torch==1.13.1 required (conda install -y -n [env] python=3.10)
pip install -r requirements.txt

# install MySQL via Homebrew
brew install mysql
```

### (Optional) Parse & Index the Anthology

Feel free to skip this step, since the parsed and indexed anthology is already included in this repo.

```sh
# get up-to-date abstracts in bibtex
curl -O https://aclanthology.org/anthology+abstracts.bib.gz
gunzip anthology+abstracts.bib.gz
mv anthology+abstracts.bib anthology.bib

# parse .bib -> .json
python parse.py

# index with ColBERT
# (note: this sometimes fails silently if the C++ extensions do not exist)
python index.py
```

### Search with ColBERT

```sh
# start flask server
python server.py

# or start a production API endpoint
gunicorn -w 4 -b 0.0.0.0:8893 server:app
```

Then, to test, visit:

```
http://localhost:8893/api/search?query=Information retrieval with BERT
```

or for an interface:

```
http://localhost:8893
```

### Deploy as a Docker App

```sh
docker-compose build --no-cache
docker-compose up --build
```

## Example notebooks

To see an example of search, visit:
[colab.research.google.com/drive/1-b90_8YSAK17KQ6C7nqKRYbCWEXQ9FGs](https://colab.research.google.com/drive/1-b90_8YSAK17KQ6C7nqKRYbCWEXQ9FGs?usp=sharing)
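
The `/api/search` endpoint shown under "Search with ColBERT" can also be called from a script. The sketch below is a minimal example using only the Python standard library. The helper names `build_search_url` and `search` are hypothetical and not part of this repo, and the code assumes the server is running on `localhost:8893` and returns a JSON body, which this README does not specify.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_search_url(query, host="localhost", port=8893):
    """Build a URL for the /api/search endpoint with the query percent-encoded.

    host and port default to the values used in the gunicorn command above.
    """
    # quote_via=quote encodes spaces as %20 rather than '+'
    params = urlencode({"query": query}, quote_via=quote)
    return f"http://{host}:{port}/api/search?{params}"


def search(query, host="localhost", port=8893, timeout=30):
    """Send the query to a running server and decode the response.

    Assumption: the endpoint responds with JSON; adjust the decoding
    if the server returns something else.
    """
    with urlopen(build_search_url(query, host, port), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `search("Information retrieval with BERT")` would return the decoded response. Encoding the query explicitly avoids relying on the browser to escape spaces in the URL.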