---
license: apache-2.0
---



[Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications.]

Task-oriented finetuning for better embeddings on neural search

The text embedding suite trained by Jina AI's Finetuner team.

Intended Usage & Model Info

jina-embedding-s-en-v1 is a language model that has been trained using Jina AI's Linnaeus-Clean dataset. This dataset consists of 380 million sentence pairs, including query-document pairs, drawn from a variety of domains and carefully selected through a thorough cleaning process. The Linnaeus-Full dataset, from which Linnaeus-Clean is derived, originally contained 1.6 billion sentence pairs.

The model has a range of use cases, including information retrieval, semantic textual similarity, text reranking, and more.
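As an illustration of the retrieval use case, here is a minimal sketch that ranks candidate documents against a query by cosine similarity. It assumes the checkpoint can be loaded through the sentence-transformers library, and the query and documents are hypothetical examples:

```python
from sentence_transformers import SentenceTransformer, util

# Load the embedding model (assumes the checkpoint is sentence-transformers compatible).
model = SentenceTransformer("jinaai/jina-embedding-s-en-v1")

# Hypothetical query and candidate documents.
query = "How do I reset my password?"
docs = [
    "To reset your password, open the account settings page and click 'Forgot password'.",
    "Our office is closed on public holidays.",
    "Password resets require access to the email address on file.",
]

# Encode the query and documents into dense vectors.
query_emb = model.encode(query, convert_to_tensor=True)
doc_embs = model.encode(docs, convert_to_tensor=True)

# Rank documents by cosine similarity to the query.
scores = util.cos_sim(query_emb, doc_embs)[0]
for doc, score in sorted(zip(docs, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```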

With a compact size of just 35 million parameters, the model enables lightning-fast inference while still delivering impressive performance. Additionally, we provide the following options:

  • jina-embedding-b-en-v1: 110 million parameters.
  • jina-embedding-l-en-v1: 800 million parameters.
  • jina-embedding-xl-en-v1: 3 billion parameters.
  • jina-embedding-xxl-en-v1: 11 billion parameters.

Data & Parameters

More information will be released together with the technical report.

Metrics

Usage
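
No usage snippet is provided here yet; as a minimal sketch, assuming the model can be loaded with the sentence-transformers library, encoding a pair of sentences and measuring their similarity could look like this:

```python
from sentence_transformers import SentenceTransformer, util

# Assumes the checkpoint is sentence-transformers compatible.
model = SentenceTransformer("jinaai/jina-embedding-s-en-v1")

sentences = ["how is the weather today", "What is the current weather like today?"]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two sentence embeddings.
print(util.cos_sim(embeddings[0], embeddings[1]))
```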