Bo Wang

bwang0911

AI & ML interests

information retrieval, representation learning

Organizations

Posts 3

Post
In the vector search setup, we normally combine a fast embedding model and an accurate but slow reranker model.

The newly released @jinaai rerankers are small and almost as accurate as our base reranker. Under a given time budget, they can therefore score more candidate documents from the embedding model, giving a better chance of feeding the LLM the correct context for RAG generation.

These models are available on Hugging Face and have been integrated into the latest SentenceTransformers 2.7.0 release. Check them out!

jinaai/jina-reranker-v1-turbo-en
jinaai/jina-reranker-v1-tiny-en
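A minimal sketch of the retrieve-then-rerank pipeline described above. The two scoring functions here are placeholders just to make the sketch runnable; in practice the first stage would be a dense embedding model and the second a cross-encoder reranker such as the models linked above.

```python
def embed_similarity(query, doc):
    # Placeholder for stage 1: stands in for cosine similarity between
    # dense embeddings. Crude token overlap, only to make this runnable.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q | d), 1)

def score_with_reranker(query, doc):
    # Placeholder for stage 2 (hypothetical scoring): a real cross-encoder
    # reads the query and document jointly and is far more accurate.
    return embed_similarity(query, doc) + 0.1 * doc.lower().count(query.lower().split()[0])

def retrieve_then_rerank(query, corpus, k_retrieve=3, k_final=2):
    # Stage 1: the fast embedding model scores the whole corpus.
    candidates = sorted(corpus, key=lambda d: embed_similarity(query, d), reverse=True)[:k_retrieve]
    # Stage 2: the slower, more accurate reranker re-scores only the candidates.
    return sorted(candidates, key=lambda d: score_with_reranker(query, d), reverse=True)[:k_final]

corpus = [
    "Jina AI releases small reranker models",
    "Cooking pasta in ten minutes",
    "Rerankers improve RAG context quality",
    "A travel guide to Berlin",
]
print(retrieve_then_rerank("reranker models for RAG", corpus))
```

The point of the small rerankers is that stage 2 is the bottleneck: the cheaper each reranker call, the larger `k_retrieve` can be within the same time budget.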
Post
At @jinaai, we've recently launched an interesting model: jinaai/jina-colbert-v1-en. In this post, I'd like to give you a quick introduction to ColBERT: the multi-vector search & late-interaction retriever.

As you may already know, we've been developing embedding models such as jinaai/jina-embeddings-v2-base-en for some time. These models, often called 'dense retrievers', generate a single representation for each document.
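A quick sketch of what "a single representation per document" means: the model's per-token embeddings are pooled into one vector, whatever the document length. The embeddings below are random stand-ins for real model output.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_dense(num_tokens, dim=768):
    # Random stand-in for per-token embeddings from a model like
    # jina-embeddings-v2-base-en, followed by mean pooling.
    token_embeddings = rng.normal(size=(num_tokens, dim))
    return token_embeddings.mean(axis=0)  # -> single vector per document

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

doc_vec = encode_dense(num_tokens=120)
query_vec = encode_dense(num_tokens=8)
print(doc_vec.shape)  # (768,) regardless of document length
print(cosine(doc_vec, query_vec))
```

Because each document collapses to one vector, storage and vector-database integration are simple, which is the advantage discussed next.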

Embedding models like Jina-v2 have the advantage of quick integration with vector databases and good performance within a specific domain.

By tasks "within a specific domain", we mean that embedding models perform very well when they have "seen similar distributions" during training. The flip side is that they may perform only "okay" on tasks outside that domain and require fine-tuning.

Now, let's delve into multi-vector search and late-interaction models. The idea is quite simple:

1. During model training, a dimensionality-reduction layer shrinks each token embedding from 768 to 128 dimensions to save storage.
2. At search time, given one query and one document, you match each query token embedding against every token embedding in the document and take the maximum similarity score. Repeat this for every query token, then sum up all the maximum similarity scores.

This is called multi-vector search because if your query has 5 tokens, you keep 5 token embeddings of 128 dimensions each (5 * 128 floats) rather than a single vector. The "sum of max similarities" procedure is termed late interaction.
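The two steps above can be sketched in a few lines of NumPy. The projection matrix here is random (in the real model it is learned during training), and the token embeddings are random stand-ins as well; the late-interaction scoring itself is exactly the "max per query token, then sum" described above.

```python
import numpy as np

rng = np.random.default_rng(42)
W = rng.normal(size=(768, 128)) / np.sqrt(768)  # stand-in for the learned projection

def project_and_normalize(token_embs):
    x = token_embs @ W                          # (n_tokens, 128)
    return x / np.linalg.norm(x, axis=1, keepdims=True)

query_tokens = rng.normal(size=(5, 768))        # 5 query tokens
doc_tokens = rng.normal(size=(40, 768))         # 40 document tokens

q = project_and_normalize(query_tokens)         # (5, 128): 5 * 128 floats stored
d = project_and_normalize(doc_tokens)           # (40, 128)

sim = q @ d.T                                   # (5, 40) token-to-token cosines
maxsim_score = sim.max(axis=1).sum()            # max per query token, then sum
print(maxsim_score)
```

Note that the whole `(5, 40)` similarity matrix is computed only at query time, which is why the interaction is "late": documents can still be encoded and stored offline.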

Multi-vector & late-interaction retrievers have two main advantages:

1. Strong performance outside a specific domain, since they match at token-level granularity.
2. Explainability: you can interpret your token-level matching and understand why the score is higher/lower.
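The explainability point can be made concrete: for each query token, you can inspect which document token produced the maximum similarity. The embeddings below are random stand-ins for real ColBERT output, and the token strings are hypothetical, but the argmax inspection is exactly how token-level matches are read off.

```python
import numpy as np

rng = np.random.default_rng(7)
query_words = ["jina", "colbert", "retriever"]
doc_words = ["a", "late", "interaction", "retriever", "from", "jina", "ai"]

# Random stand-ins for 128-dim token embeddings, L2-normalized.
q = rng.normal(size=(len(query_words), 128))
d = rng.normal(size=(len(doc_words), 128))
q /= np.linalg.norm(q, axis=1, keepdims=True)
d /= np.linalg.norm(d, axis=1, keepdims=True)

sim = q @ d.T                 # (3, 7) token-to-token similarities
best = sim.argmax(axis=1)     # index of the best-matching document token

for qi, di in enumerate(best):
    print(f"{query_words[qi]!r} best matches {doc_words[di]!r} "
          f"(score {sim[qi, di]:.3f})")
```

With real embeddings, these per-token alignments show why a score is high or low, something a single pooled vector cannot expose.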

Try our first multi-vector retriever, jinaai/jina-colbert-v1-en, and share your feedback!

models

None public yet

datasets

None public yet