davidheineman committed on
Commit
f9ad19d
1 Parent(s): 7f8aaec

improve readme

Files changed (3)
  1. README.md +14 -1
  2. knn_db_access.py +1 -2
  3. openai_embed.py +1 -2
README.md CHANGED
@@ -2,6 +2,12 @@
 license: apache-2.0
 ---
 
+This uses ColBERT as an information retrieval interface for the [ACL Anthology](https://aclanthology.org/). It uses a MySQL backend for storing paper data and a simple Flask front-end.
+
+We have two methods for retrieving passage candidates: (i) using ColBERT directly, which may not scale well to extremely large datastores, and (ii) using OpenAI embeddings to select the top-k passages, which ColBERT then re-ranks (the expensive step). For OpenAI, you must have an API key and a MongoDB key for storing the vector entries.
+
+# Setup
+
 ## Setup ColBERT
 First, clone this repo and create a conda environment and install the dependencies:
 ```sh
@@ -10,7 +16,7 @@ git clone https://huggingface.co/davidheineman/colbert-acl
 pip install bibtexparser colbert-ir[torch,faiss-gpu]
 ```
 
-## Setup server
+## Setup MySQL server
 Install pip dependencies
 ```sh
 pip install mysql-connector-python flask openai pymongo[srv]
@@ -26,6 +32,13 @@ Run the database setup to copy the ACL entries:
 python init_db.py
 ```
 
+## Setup MongoDB server
+First, make sure you have an OpenAI API key and a MongoDB key:
+```sh
+echo [OPEN_AI_KEY] > .openai-secret
+echo [MONGO_DB_KEY] > .mongodb-secret
+```
+
 ### (Optional) Step 1: Parse the Anthology
 
 Feel free to skip steps 1 and 2, since the parsed/indexed anthology is contained in this repo.
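
The README text added in this commit describes a two-stage retrieval: an embedding search selects the top-k candidates, and ColBERT then re-ranks them. A toy sketch of that flow is below. It is not code from this repository: the function names, the plain-list vectors, and the pure-Python cosine similarity are all assumptions for illustration, and the second stage uses a simplified MaxSim-style late-interaction score as a stand-in for the real ColBERT model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_candidates(query_vec, passage_vecs, k):
    """Stage 1 (cheap): rank passages by single-vector similarity and keep the best k ids."""
    ranked = sorted(
        passage_vecs,
        key=lambda pid: cosine(query_vec, passage_vecs[pid]),
        reverse=True,
    )
    return ranked[:k]


def maxsim_score(query_tokens, passage_tokens):
    """Stage 2 score: for each query token, take its best-matching passage token, then sum."""
    return sum(max(cosine(q, p) for p in passage_tokens) for q in query_tokens)


def retrieve(query_vec, query_tokens, passage_vecs, passage_tokens, k=2):
    """Select k candidates with stage 1, then re-rank only those with the stage 2 score."""
    candidates = top_k_candidates(query_vec, passage_vecs, k)
    return sorted(
        candidates,
        key=lambda pid: maxsim_score(query_tokens, passage_tokens[pid]),
        reverse=True,
    )
```

The point of the split is that the expensive token-level scoring only runs on the k candidates, not on the whole datastore; a passage that stage 1 drops is never re-ranked, so k trades recall against cost.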
knn_db_access.py CHANGED
@@ -7,8 +7,7 @@ OPENAI = QueryEmbedder()
 
 USER = "test"
 SERVER = "dbbackend.c9tcfpp"
-with open('.mongodb-secret', 'r') as f:
-    PASS = f.read()
+with open('.mongodb-secret', 'r') as f: PASS = f.read()
 
 
 class MongoDBAccess:
openai_embed.py CHANGED
@@ -1,8 +1,7 @@
 from openai import OpenAI
 
 
-with open('.openai-secret', 'r') as f:
-    OPENAI_API_KEY = f.read()
+with open('.openai-secret', 'r') as f: OPENAI_API_KEY = f.read()
 
 
 class QueryEmbedder:
  class QueryEmbedder: