davidheineman committed on
Commit
d825967
1 Parent(s): d72d082

update readme

Files changed (2)
  1. README.md +2 -6
  2. db_search.py +1 -2
README.md CHANGED
@@ -65,12 +65,8 @@ http://localhost:8893/api/search?k=25&query=How to extend context windows?
  To see an example of search, visit:
  [colab.research.google.com/drive/1-b90_8YSAK17KQ6C7nqKRYbCWEXQ9FGs](https://colab.research.google.com/drive/1-b90_8YSAK17KQ6C7nqKRYbCWEXQ9FGs?usp=sharing)
 
- ## Notes
- - It's possible to update the index without re-computing the whole dataset. Basically the IVF table is updated, but the centroids are not re-computed. This requires a large dataset to already exist (in our case it does).
- - We'll need someone to manage the storage/saving of the index, so it can be updated in real-time.
+ <!-- ## Notes
  - See:
  - https://github.com/stanford-futuredata/ColBERT/blob/main/colbert/index_updater.py
  - https://github.com/stanford-futuredata/ColBERT/issues/111
- - We also need a MySQL database which can take in a document ID and return its metadata, so the ColBERT database only stores the passage encodings, not the full text (right now it just loads the whole json into memory).
- - We may be able to offload the centroids calculation to a vector DB (check on this)
- - Should have 2 people on UI, 1 on MySQL, 1 on VectorDB, 1 on ColBERT
+ -->

db_search.py CHANGED
@@ -20,8 +20,7 @@ def complete_request(colbert_response, year):
  pids_str = ', '.join(['%s'] * len(pids))
  query = PAPER_QUERY.format(query_arg_str=pids_str, year=year)
 
- print(query)
- print(pids)
+ print(PAPER_QUERY.format(query_arg_str=', '.join([str(p) for p in pids]), year=year))
 
  cursor.execute(query, pids)
  results = cursor.fetchall()
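
Review note: the `db_search.py` hunk builds one `%s` placeholder per passage ID for the executed query, and separately prints a copy with the IDs filled in for debugging. A minimal sketch of that pattern follows, under several assumptions. The `PAPER_QUERY` template, the `paper` table, and the helpers `build_query` and `demo` are hypothetical (the real template is not shown in this diff). The demo uses the standard-library `sqlite3` module, whose placeholder is `?`, instead of the MySQL driver's `%s`.

```python
import sqlite3

# Hypothetical stand-in for PAPER_QUERY; the real template is not part of this diff.
PAPER_QUERY = (
    "SELECT id, title FROM paper "
    "WHERE id IN ({query_arg_str}) AND year >= {year}"
)


def build_query(pids, year, placeholder="%s"):
    """Return (executable_sql, debug_sql) for a non-empty list of passage IDs.

    executable_sql keeps one placeholder per ID, so the driver binds the values.
    debug_sql has the IDs inlined and is meant for logging only.
    """
    if not pids:
        # An empty IN () list is a syntax error in MySQL, so fail early.
        raise ValueError("pids must not be empty")
    year = int(year)  # year is interpolated into the SQL text, so force an integer
    placeholders = ", ".join([placeholder] * len(pids))
    executable_sql = PAPER_QUERY.format(query_arg_str=placeholders, year=year)
    debug_sql = PAPER_QUERY.format(
        query_arg_str=", ".join(str(p) for p in pids), year=year
    )
    return executable_sql, debug_sql


def demo():
    """Run the pattern against an in-memory SQLite table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE paper (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO paper VALUES (?, ?, ?)",
        [(1, "A", 2019), (2, "B", 2021), (3, "C", 2023)],
    )
    pids = [2, 3]
    sql, debug_sql = build_query(pids, 2020, placeholder="?")
    print(debug_sql)  # human-readable copy, never executed
    rows = conn.execute(sql, pids).fetchall()
    conn.close()
    return rows
```

Only the placeholder version is passed to `execute`, so the IDs stay bound parameters. The inlined string exists purely for reading in logs, which matches what the new `print` line in the hunk does.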