davidheineman committed
Commit • f9ad19d
1 Parent(s): 7f8aaec

improve readme

- README.md +14 -1
- knn_db_access.py +1 -2
- openai_embed.py +1 -2
README.md
CHANGED
@@ -2,6 +2,12 @@
 license: apache-2.0
 ---
 
+This uses ColBERT as an information retrieval interface for the [ACL Anthology](https://aclanthology.org/). It uses a MySQL backend for storing paper data and a simple Flask front-end.
+
+We have two methods for retrieving passage candidates: (i) ColBERT alone, which may not scale well to extremely large datastores, and (ii) OpenAI embeddings, which select the top-k passages for ColBERT to perform the expensive re-ranking. For OpenAI, you must have an API key and a MongoDB key for storing the vector entries.
+
+# Setup
+
 ## Setup ColBERT
 First, clone this repo and create a conda environment and install the dependencies:
 ```sh
@@ -10,7 +16,7 @@ git clone https://huggingface.co/davidheineman/colbert-acl
 pip install bibtexparser colbert-ir[torch,faiss-gpu]
 ```
 
-## Setup server
+## Setup MySQL server
 Install pip dependencies
 ```sh
 pip install mysql-connector-python flask openai pymongo[srv]
@@ -26,6 +32,13 @@ Run the database setup to copy the ACL entries:
 python init_db.py
 ```
 
+## Setup MongoDB server
+First, make sure you have an OpenAI and a MongoDB API key:
+```sh
+echo [OPEN_AI_KEY] > .openai-secret
+echo [MONGO_DB_KEY] > .mongodb-secret
+```
+
 ### (Optional) Step 1: Parse the Anthology
 
 Feel free to skip steps 1 and 2, since the parsed/indexed anthology is contained in this repo.
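The two-stage retrieval the README describes (a cheap embedding pass to preselect top-k candidates, then an expensive ColBERT re-ranking over only those candidates) can be sketched with toy vectors. The function names and numbers here are illustrative, not the repo's actual API:

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def preselect_top_k(query_vec, passage_vecs, k):
    # Cheap first stage: rank all passages by embedding similarity
    # and keep only the k best for the expensive re-ranker.
    ranked = sorted(range(len(passage_vecs)),
                    key=lambda i: cosine(query_vec, passage_vecs[i]),
                    reverse=True)
    return ranked[:k]

# Toy example: three passage embeddings, one query embedding.
passages = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [1.0, 0.1]
candidates = preselect_top_k(query, passages, k=2)
print(candidates)  # → [2, 0]: the two passages most similar to the query
```

Only these `k` candidates would then be passed to ColBERT for late-interaction scoring, which keeps the expensive step bounded regardless of datastore size.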
knn_db_access.py
CHANGED
@@ -7,8 +7,7 @@ OPENAI = QueryEmbedder()
 
 USER = "test"
 SERVER = "dbbackend.c9tcfpp"
-with open('.mongodb-secret', 'r') as f:
-    PASS = f.read()
+with open('.mongodb-secret', 'r') as f: PASS = f.read()
 
 
 class MongoDBAccess:
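The `USER`/`PASS`/`SERVER` values above are presumably assembled into a MongoDB connection URI elsewhere in the class. A minimal sketch of that assembly (the hostname here is a hypothetical stand-in, and credentials are percent-escaped with `urllib.parse.quote_plus`, as the pymongo documentation recommends):

```python
from urllib.parse import quote_plus

def mongo_uri(user, password, server):
    # Percent-escape credentials so characters like '@' or ':' in the
    # password do not break URI parsing.
    return f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}@{server}/"

# Hypothetical values; the real password comes from .mongodb-secret.
uri = mongo_uri("test", "p@ss:word", "example.mongodb.net")
print(uri)  # → mongodb+srv://test:p%40ss%3Aword@example.mongodb.net/
```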
openai_embed.py
CHANGED
@@ -1,8 +1,7 @@
 from openai import OpenAI
 
 
-with open('.openai-secret', 'r') as f:
-    OPENAI_API_KEY = f.read()
+with open('.openai-secret', 'r') as f: OPENAI_API_KEY = f.read()
 
 
 class QueryEmbedder:
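One caveat with both secret-reading snippets: since the README writes the secrets with `echo`, `f.read()` returns the key with a trailing newline, which some clients reject. A hedged variant that strips surrounding whitespace (a common safeguard, not what this commit does):

```python
import os
import tempfile

def read_secret(path):
    # Read an API key from a file, stripping the trailing newline
    # that `echo` appends when the secret is written.
    with open(path, "r") as f:
        return f.read().strip()

# Simulate a secret file written with `echo KEY > file`.
with tempfile.NamedTemporaryFile("w", suffix="-secret", delete=False) as tmp:
    tmp.write("sk-example-key\n")  # hypothetical key value
key = read_secret(tmp.name)
os.unlink(tmp.name)
print(repr(key))  # → 'sk-example-key', no trailing newline
```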