davidheineman/colbert-acl

davidheineman committed on Apr 26

Commit d825967 • 1 Parent(s): d72d082

update readme

Files changed (2)

README.md +2 -6
db_search.py +1 -2

README.md CHANGED

@@ -65,12 +65,8 @@ http://localhost:8893/api/search?k=25&query=How to extend context windows?
 To see an example of search, visit:
 [colab.research.google.com/drive/1-b90_8YSAK17KQ6C7nqKRYbCWEXQ9FGs](https://colab.research.google.com/drive/1-b90_8YSAK17KQ6C7nqKRYbCWEXQ9FGs?usp=sharing)
-## Notes
-- It's possible to update the index without re-computing the whole dataset. Basically the IVF table is updated, but the centroids are not re-computed. This requires a large dataset to already exist (in our case it does).
-    - We'll need someone to manage the storage/saving of the index, so it can be updated in real-time.
+<!-- ## Notes
 - See:
     - https://github.com/stanford-futuredata/ColBERT/blob/main/colbert/index_updater.py
     - https://github.com/stanford-futuredata/ColBERT/issues/111
-- We also need a MySQL database which can take in a document ID and return its metadata, so the ColBERT database only stores the passage encodings, not the full text (right now it just loads the whole json into memory).
-- We may be able to offload the centroids calculation to a vector DB (check on this)
-- Should have 2 people on UI, 1 on MySQL, 1 on VectorDB, 1 on ColBERT
+ -->
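
The hunk header above quotes the search endpoint with a raw, unencoded query string. As a minimal sketch, assuming the server accepts standard form-encoded parameters (the diff does not confirm this), the URL can be built with the standard library. The helper name `build_search_url` is hypothetical; only the host, port, path, and the `k` and `query` parameters come from the README.

```python
from urllib.parse import urlencode

# Endpoint taken from the README hunk header.
BASE_URL = "http://localhost:8893/api/search"


def build_search_url(query, k=25):
    """Return the search URL with k and query percent-encoded (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode({'k': k, 'query': query})}"


url = build_search_url("How to extend context windows?")
print(url)
```

Encoding matters here because the example query ends in `?`, which would otherwise be ambiguous inside a query string.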

db_search.py CHANGED

@@ -20,8 +20,7 @@ def complete_request(colbert_response, year):
     pids_str = ', '.join(['%s'] * len(pids))
     query = PAPER_QUERY.format(query_arg_str=pids_str, year=year)
-    print(query)
-    print(pids)
+    print(PAPER_QUERY.format(query_arg_str=', '.join([str(p) for p in pids]), year=year))
     cursor.execute(query, pids)
     results = cursor.fetchall()
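
The change above keeps the executed query parameterized (one `%s` per passage ID) and prints a separate, human-readable copy with the IDs inlined. A minimal sketch of that split, runnable without a database, follows. The `PAPER_QUERY` template here is a hypothetical stand-in, since the real one is not shown in this diff, and the helper names `build_query` and `debug_query` are illustrative only.

```python
# Hypothetical stand-in for PAPER_QUERY; the real template is not part of this diff.
PAPER_QUERY = "SELECT * FROM paper WHERE pid IN ({query_arg_str}) AND year >= {year}"


def build_query(pids, year):
    """Return (sql, params) with one %s placeholder per pid, for cursor.execute."""
    placeholders = ', '.join(['%s'] * len(pids))
    sql = PAPER_QUERY.format(query_arg_str=placeholders, year=year)
    return sql, list(pids)


def debug_query(pids, year):
    """Return the query with ids inlined. For logging only, never for execution."""
    inlined = ', '.join(str(p) for p in pids)
    return PAPER_QUERY.format(query_arg_str=inlined, year=year)


sql, params = build_query([3, 7, 11], 2020)
print(sql)
print(debug_query([3, 7, 11], 2020))
```

Only the placeholder version should be passed to `cursor.execute(sql, params)`, so the driver handles quoting of the IDs. Note that an empty `pids` list would produce `IN ()`, which most SQL dialects reject, so callers may want to guard against that case.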