Vector Database Benchmarks: FAISS vs Chroma vs Weaviate
This repository contains experiments benchmarking popular vector databases on multimodal embeddings generated from the Flickr8k dataset.
We focused on four key evaluation dimensions:
- Latency per query
- Recall@5 vs Flat (accuracy tradeoffs)
- Queries per second (QPS throughput)
- Ingestion scaling performance
All experiments were run on Google Colab (T4 GPU for embedding generation, CPU backend for databases).
Methodology
- Dataset: 6k images and 30k captions from Flickr8k.
- Embeddings: CLIP (OpenAI ViT-B/32).
- Workload: Caption-to-image retrieval (cross-modal).
- Baseline: FAISS Flat (exact, brute-force) index, used as the ground truth for recall calculations.
Each vector database was tested under the same conditions for ingestion, search, and recall.
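As a rough illustration of the recall metric, the sketch below assumes Recall@5 means the fraction of the FAISS Flat top-5 results that also appear in a database's top-5, averaged over all queries. The function name `recall_at_k` and its arguments are hypothetical and are not taken from the notebook.

```python
def recall_at_k(candidate_ids, truth_ids, k=5):
    """Mean overlap between candidate and ground-truth top-k ID lists.

    candidate_ids: one list of result IDs per query, from the database under test.
    truth_ids: one list of result IDs per query, from the exact (Flat) baseline.
    Assumption: both use the same ID space and each row has at least k entries.
    """
    if len(candidate_ids) != len(truth_ids):
        raise ValueError("candidate_ids and truth_ids must cover the same queries")
    if len(truth_ids) == 0:
        raise ValueError("need at least one query")

    total = 0.0
    for cand, truth in zip(candidate_ids, truth_ids):
        # Order inside the top-k does not matter, only membership.
        hits = len(set(truth[:k]) & set(cand[:k]))
        total += hits / k
    return total / len(truth_ids)


if __name__ == "__main__":
    # Toy IDs: the first query matches fully, the second shares 2 of 5 results.
    truth = [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
    cand = [[4, 3, 2, 1, 0], [5, 6, 0, 1, 2]]
    print(recall_at_k(cand, truth, k=5))
```

Under this definition the two ID spaces must match. If a database returns its own internal IDs rather than the positions used by the baseline, the score drops sharply, so ID mapping is worth checking before reading much into a very low recall value.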
Results Summary
Metric | FAISS | Chroma | Weaviate |
---|---|---|---|
Avg Latency per Query | 0.19 ms | 0.76 ms | 1.82 ms |
Recall@5 (Flat Baseline) | 1.00 | 0.002 | 0.918 |
QPS Throughput | 1929.94 | 719.01 | 598.40 |
Ingestion Scaling (20k) | 0.024 s | 2.806 s | 4.000 s |
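One way latency and QPS figures like these could be collected is sketched below. It is a minimal harness that assumes queries are issued one at a time from a single thread; the notebook may batch queries or time them differently. `benchmark_search` and `search_fn` are hypothetical names, with `search_fn` standing in for whatever wraps a FAISS, Chroma, or Weaviate query.

```python
import time
from statistics import mean


def benchmark_search(search_fn, queries, warmup=10):
    """Return (average latency in ms, queries per second) for search_fn.

    search_fn: callable taking a single query and returning results.
    queries: non-empty sequence of queries (for example, caption embeddings).
    warmup: number of untimed calls made first so caches and lazy
            initialisation do not distort the measurement.
    """
    if len(queries) == 0:
        raise ValueError("need at least one query")

    for q in queries[:warmup]:
        search_fn(q)

    latencies = []
    start = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q)
        latencies.append(time.perf_counter() - t0)
    wall = time.perf_counter() - start

    avg_latency_ms = 1000.0 * mean(latencies)
    qps = len(queries) / wall  # throughput over the whole timed loop
    return avg_latency_ms, qps


if __name__ == "__main__":
    # Stand-in search function: a real run would call the database client here.
    fake_queries = [[0.1, 0.2, 0.3]] * 200
    latency_ms, qps = benchmark_search(lambda q: sum(q), fake_queries)
    print(f"avg latency: {latency_ms:.4f} ms, QPS: {qps:.1f}")
```

Latency here is averaged per call, while QPS is taken over the wall-clock time of the whole loop, so the two numbers need not be exact reciprocals of each other.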
Key Takeaways
- FAISS is fastest, leveraging in-memory array ingestion and customizable indexing strategies.
- Chroma offers simplicity and ease of integration but struggles at scale due to batching and internal constraints.
- Weaviate provides a more feature-rich ecosystem (schema, hybrid search, persistence) but at higher ingestion and query overhead.
At the million-vector scale, speed alone will not decide your choice; engineering tradeoffs, developer productivity, and system features will.
Benchmarks tell one part of the story; your use case tells the rest.
Usage
You can reproduce these experiments using the provided notebook and Hugging Face dataset.
See full code here: rag-experiments/VectorDB-Benchmarks.
Dataset used: Flickr8k (train split: 6k images, 30k captions; multimodal: images and text), with CLIP embeddings. Dataset author: Johnathan Xie
Citation
If you find this useful, please cite this repository: