# Approach
Several aspects of choosing a vector database may be unique to your situation. Think through your hardware, utilization, latency requirements, scale, and so on before choosing.

I'm targeting a demo (low utilization, relaxed latency) that will live on a Hugging Face Space. My data is small enough that it could even fit in memory. I like [Qdrant](https://qdrant.tech) for this.

# Imports

In [1]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

In [2]:
from pathlib import Path
import pickle

from tqdm.notebook import tqdm
from haystack.schema import Document
from qdrant_haystack import QdrantDocumentStore

In [3]:
proj_dir = Path.cwd().parent
print(proj_dir)

/home/ec2-user/RAGDemo


# Config

In [4]:
file_in = proj_dir / 'data/processed/simple_wiki_embeddings.pkl'

# Setup
Read in our list of dictionaries. This is at the upper end of what the machine I'm using can handle: it takes ~10GB of RAM. We could easily do this in batches of ~100k and be fine on most machines.

In [5]:
%%time
with open(file_in, 'rb') as handle:
    documents = pickle.load(handle)

CPU times: user 11.6 s, sys: 2.25 s, total: 13.9 s
Wall time: 18.1 s
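
For a rough sense of scale, here is a back-of-the-envelope estimate of the raw vector size. It assumes 270,000 documents (the count shown by the write progress bar at the end of this notebook), 768 dimensions (the `embedding_dim` used for the store), and float32 values. The float32 part is my assumption, since the pickle's dtype isn't shown here.

```python
# Rough size of the embeddings alone, under the assumptions stated above.
n_docs = 270_000       # document count from the write progress bar
embedding_dim = 768    # matches embedding_dim passed to the document store
bytes_per_value = 4    # assumption: embeddings are stored as float32

raw_bytes = n_docs * embedding_dim * bytes_per_value
print(f"{raw_bytes / 1e9:.2f} GB")  # 0.83 GB
```

If that assumption holds, most of the ~10GB is likely the document text plus Python object overhead rather than the vectors themselves.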


Convert the dictionaries to `Document` objects.

In [6]:
documents = [Document.from_dict(d) for d in documents]

Instantiate our `DocumentStore`. Note that I'm saving this to disk for portability, which matters because I want to move from this EC2 instance to a Hugging Face Space.

Note that if you are doing this at scale, you should use a proper Qdrant instance rather than saving to a file. You should also take a [measured ingestion](https://qdrant.tech/documentation/tutorials/bulk-upload/) approach instead of using a convenience loader.

In [7]:
document_store = QdrantDocumentStore(
    path=str(proj_dir / 'Qdrant'),
    index="RAGDemo",
    embedding_dim=768,
    recreate_index=True,
    hnsw_config={"m": 16, "ef_construct": 64},  # Optional
)

In [9]:
%%time
document_store.write_documents(documents, batch_size=5_000)

270000it [28:43, 156.68it/s] 

CPU times: user 13min 23s, sys: 48.6 s, total: 14min 12s
Wall time: 28min 43s
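
The batching idea mentioned in the Setup section can be sketched with a small helper. This is a minimal illustration, not what I ran above: `batched` is a hypothetical helper name, and the toy `records` list stands in for the real list of dictionaries.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Toy stand-in for the list of dictionaries loaded from the pickle.
records = [{"content": f"doc {i}"} for i in range(10)]

batch_sizes = [len(batch) for batch in batched(records, 4)]
print(batch_sizes)  # [4, 4, 2]
```

In this notebook, each batch of ~100k dictionaries could be converted with `Document.from_dict` and passed to `document_store.write_documents` before moving on to the next one, so only one batch of `Document` objects exists at a time. That assumes `documents` still holds the raw dictionaries, i.e. the conversion cell above is skipped.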



