# Fast Inference with CTranslate2

Speed up inference and reduce memory usage by 2x-4x using int8 inference in C++ on CPU or GPU.

Quantized version of [jinaai/jina-embedding-l-en-v1](https://huggingface.co/jinaai/jina-embedding-l-en-v1).

```bash
# Quote the requirements so the shell does not treat ">" as output redirection.
pip install "hf-hub-ctranslate2>=2.12.0" "ctranslate2>=3.17.1"
```

```python
from hf_hub_ctranslate2 import CT2SentenceTransformer, EncoderCT2fromHfHub

model_name = "michaelfeil/ct2fast-jina-embedding-l-en-v1"
model_name_orig = "jinaai/jina-embedding-l-en-v1"

# Load the int8-quantized checkpoint from this repo on CUDA.
model = EncoderCT2fromHfHub(
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",
)
outputs = model.generate(
    text=["I like soccer", "I like tennis", "The Eiffel Tower is in Paris"],
    max_length=64,
)
# Perform downstream tasks on the outputs:
pooled = outputs["pooler_output"]
hidden_states = outputs["last_hidden_state"]
attention_mask = outputs["attention_mask"]

# Alternative: use the SentenceTransformer mix-in for end-to-end
# sentence-embedding generation (this loads the original model,
# not this CT2fast HF repo).
model = CT2SentenceTransformer(
    model_name_orig, compute_type="int8_float16", device="cuda"
)
embeddings = model.encode(
    ["I like soccer", "I like tennis", "The Eiffel Tower is in Paris"],
    batch_size=32,
    convert_to_numpy=True,
    normalize_embeddings=True,
)
print(embeddings.shape, embeddings)
# The embeddings are L2-normalized, so this is cosine similarity scaled by 100.
scores = (embeddings @ embeddings.T) * 100

# Hint: you can also serve this model via a REST API
# with github.com/michaelfeil/infinity
```
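With `normalize_embeddings=True`, every embedding has unit L2 norm, so `embeddings @ embeddings.T` is the matrix of pairwise cosine similarities and the `* 100` only rescales it (to roughly -100..100). The pure-Python sketch below reproduces that computation on small made-up vectors (placeholders, not real model outputs) to show why normalizing first makes the plain dot product equal the cosine similarity.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit L2 norm (what normalize_embeddings=True does)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity of two raw (unnormalized) vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Placeholder 3-d vectors standing in for real 1024-d embeddings.
raw = [[1.0, 2.0, 2.0], [2.0, 1.0, 2.0], [0.0, 0.0, 5.0]]
unit = [l2_normalize(v) for v in raw]

# Equivalent of `(embeddings @ embeddings.T) * 100` for normalized rows.
scores = [[dot(a, b) * 100 for b in unit] for a in unit]

for row in scores:
    print([round(s, 1) for s in row])
```

Because the rows are normalized once up front, the score matrix needs only dot products, which is why the card normalizes inside `encode` rather than dividing by norms afterwards.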

Checkpoint compatible with `ctranslate2>=3.17.1` and `hf-hub-ctranslate2>=2.12.0`.
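One way to confirm that an environment meets these minimum versions is to read the installed package metadata. The sketch below uses the standard-library `importlib.metadata`; `version_tuple` and `check_versions` are hypothetical helpers, and the comparison is deliberately simplified (it ignores pre-release tags), so prefer `packaging.version.Version` where that package is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions stated in this model card.
MIN_VERSIONS = {"ctranslate2": "3.17.1", "hf-hub-ctranslate2": "2.12.0"}

def version_tuple(v: str) -> tuple:
    """Turn '3.17.1' into (3, 17, 1); '4.0.0rc1' becomes (4, 0, 0).

    Simplified: only the leading digits of each of the first three
    components are kept, and missing components are padded with 0.
    """
    parts = []
    for piece in v.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def check_versions(minimums=MIN_VERSIONS) -> dict:
    """Return {package: (installed_version_or_None, meets_minimum)}."""
    report = {}
    for pkg, minimum in minimums.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            report[pkg] = (None, False)
            continue
        report[pkg] = (installed, version_tuple(installed) >= version_tuple(minimum))
    return report

if __name__ == "__main__":
    for pkg, (installed, ok) in check_versions().items():
        print(f"{pkg}: installed={installed} ok={ok}")
```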

- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`
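If the same script has to run on both GPU and CPU machines, the two recommendations above can be encoded in a small lookup. `pick_compute_type` is a hypothetical helper, not part of `hf-hub-ctranslate2`; it only maps the device string to the recommended value and does not probe the hardware.

```python
def pick_compute_type(device: str) -> str:
    """Return the compute_type this card recommends for a CTranslate2 device.

    Hypothetical helper: encodes only the two recommendations in this card
    (int8_float16 on CUDA, int8 on CPU).
    """
    recommendations = {"cuda": "int8_float16", "cpu": "int8"}
    try:
        return recommendations[device]
    except KeyError:
        raise ValueError(
            f"unsupported device {device!r}; expected 'cuda' or 'cpu'"
        ) from None

# Usage with the loader from the example above (requires hf-hub-ctranslate2):
# model = EncoderCT2fromHfHub(
#     model_name_or_path="michaelfeil/ct2fast-jina-embedding-l-en-v1",
#     device="cpu",
#     compute_type=pick_compute_type("cpu"),
# )
```

Raising on unknown devices, rather than silently falling back, makes a typo such as `"gpu"` fail loudly instead of loading with an unintended precision.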

Converted on 2023-10-13 using

Llama-2: removed the `<pad>` token.

# Licence and other remarks

This is just a quantized version. Licence conditions are intended to be identical to those of the original Hugging Face repo.

# Original description

The text embedding set trained by Jina AI's Finetuner team.

## Intended Usage & Model Info

`jina-embedding-l-en-v1` is a language model trained on Jina AI's Linnaeus-Clean dataset. This dataset consists of 380 million sentence pairs, including query-document pairs. The pairs were drawn from various domains and selected through a thorough cleaning process. The Linnaeus-Full dataset, from which Linnaeus-Clean is derived, originally contained 1.6 billion sentence pairs.

The model has a range of use cases, including information retrieval, semantic textual similarity, text reranking, and more.
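For the information-retrieval use case, a query embedding is compared against document embeddings and the documents are ranked by cosine similarity. The sketch below shows only that ranking step in plain Python; the 4-dimensional vectors and document IDs are made-up placeholders (real vectors from this model are 1024-dimensional and would come from `model.encode(...)`), and `rank` is a hypothetical helper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs, top_k=None):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]

# Placeholder embeddings; in practice these come from the model.
query = [0.9, 0.1, 0.0, 0.2]
docs = {
    "weather-report": [0.8, 0.2, 0.1, 0.3],
    "football-news": [0.1, 0.9, 0.3, 0.0],
    "paris-travel": [0.0, 0.2, 0.9, 0.1],
}

for doc_id, score in rank(query, docs, top_k=2):
    print(f"{doc_id}: {score:.3f}")
```

A linear scan like this is fine for small collections; large corpora usually move the same similarity computation into a vector index.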

With 330 million parameters, the model supports single-GPU inference while delivering better performance than our small and base models. We also provide the following sizes:

- `jina-embedding-t-en-v1`: 14 million parameters.
- `jina-embedding-s-en-v1`: 35 million parameters.
- `jina-embedding-b-en-v1`: 110 million parameters.
- `jina-embedding-l-en-v1`: 330 million parameters (you are here).
- `jina-embedding-1b-en-v1`: 1.2 billion parameters, 10 times bert-base (soon).
- `jina-embedding-6b-en-v1`: 6 billion parameters, 30 times bert-base (soon).
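To make these sizes, and the 2x-4x memory reduction claimed for the int8 checkpoint, concrete: weight memory is roughly parameter count times bytes per parameter (4 for float32, 2 for float16, 1 for int8). The sketch below applies that rule of thumb to the released sizes. It assumes all weights are stored at the given precision and ignores activations, the tokenizer, and runtime overhead, so treat the numbers as rough lower bounds, not measurements.

```python
# Approximate parameter counts of the released sizes listed above.
PARAMS = {
    "jina-embedding-t-en-v1": 14e6,
    "jina-embedding-s-en-v1": 35e6,
    "jina-embedding-b-en-v1": 110e6,
    "jina-embedding-l-en-v1": 330e6,
}
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_memory_mb(n_params: float, dtype: str) -> float:
    """Rough weight-only memory in megabytes (1 MB = 1e6 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e6

for name, n in PARAMS.items():
    sizes = {dt: weight_memory_mb(n, dt) for dt in BYTES_PER_PARAM}
    print(f"{name}: " + ", ".join(f"{dt}={mb:,.0f} MB" for dt, mb in sizes.items()))
```

For the 330M-parameter model this gives about 1,320 MB in float32, 660 MB in float16, and 330 MB in int8, which is where the 4x (vs. float32) and 2x (vs. float16) ends of the range come from.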

## Data & Parameters

Please check out our technical blog.

## Metrics

We compared the model against `all-minilm-l6-v2` and `all-mpnet-base-v2` from SBERT (sentence-transformers) and against OpenAI's `text-embedding-ada-002` (listed as `ada-embedding-002` in the tables below):

| Name | Parameters | Dimension |
|---|---|---|
| all-minilm-l6-v2 | 23M | 384 |
| all-mpnet-base-v2 | 110M | 768 |
| ada-embedding-002 | Unknown (OpenAI API) | 1536 |
| jina-embedding-t-en-v1 | 14M | 312 |
| jina-embedding-s-en-v1 | 35M | 512 |
| jina-embedding-b-en-v1 | 110M | 768 |
| jina-embedding-l-en-v1 | 330M | 1024 |
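The dimension column also determines how much space stored embeddings take. As a back-of-the-envelope sketch, assuming uncompressed float32 vectors (4 bytes per value) and an arbitrary corpus of one million texts chosen only for illustration, raw storage is dimension x vectors x 4 bytes; `storage_gb` is a hypothetical helper and index overhead is not included.

```python
# Embedding dimensions from the table above.
DIMENSIONS = {
    "all-minilm-l6-v2": 384,
    "all-mpnet-base-v2": 768,
    "ada-embedding-002": 1536,
    "jina-embedding-t-en-v1": 312,
    "jina-embedding-s-en-v1": 512,
    "jina-embedding-b-en-v1": 768,
    "jina-embedding-l-en-v1": 1024,
}
BYTES_PER_FLOAT32 = 4

def storage_gb(dim: int, n_vectors: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Raw storage for n_vectors dense vectors in gigabytes (1 GB = 1e9 bytes)."""
    return dim * n_vectors * bytes_per_value / 1e9

N_VECTORS = 1_000_000  # illustrative corpus size, not taken from the benchmarks
for name, dim in DIMENSIONS.items():
    per_vector = dim * BYTES_PER_FLOAT32
    print(f"{name}: {per_vector} bytes/vector, ~{storage_gb(dim, N_VECTORS):.2f} GB per 1M vectors")
```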

| Name | STS12 | STS13 | STS14 | STS15 | STS16 | STS17 | TRECOVID | Quora | SciFact |
|---|---|---|---|---|---|---|---|---|---|
| all-minilm-l6-v2 | 0.724 | 0.806 | 0.756 | 0.854 | 0.79 | 0.876 | 0.473 | 0.876 | 0.645 |
| all-mpnet-base-v2 | 0.726 | 0.835 | 0.78 | 0.857 | 0.8 | 0.906 | 0.513 | 0.875 | 0.656 |
| ada-embedding-002 | 0.698 | 0.833 | 0.761 | 0.861 | 0.86 | 0.903 | 0.685 | 0.876 | 0.726 |
| jina-embedding-t-en-v1 | 0.717 | 0.773 | 0.731 | 0.829 | 0.777 | 0.860 | 0.482 | 0.840 | 0.522 |
| jina-embedding-s-en-v1 | 0.743 | 0.786 | 0.738 | 0.837 | 0.80 | 0.875 | 0.523 | 0.857 | 0.524 |
| jina-embedding-b-en-v1 | 0.751 | 0.809 | 0.761 | 0.856 | 0.812 | 0.890 | 0.606 | 0.876 | 0.594 |
| jina-embedding-l-en-v1 | 0.745 | 0.832 | 0.781 | 0.869 | 0.837 | 0.902 | 0.573 | 0.881 | 0.598 |
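To read the benchmark table column by column, the snippet below transcribes the published scores and reports the highest-scoring model for each benchmark. It only re-reads the numbers in the table; it does not re-run any evaluation, and `best_per_benchmark` is a hypothetical helper.

```python
BENCHMARKS = ["STS12", "STS13", "STS14", "STS15", "STS16",
              "STS17", "TRECOVID", "Quora", "SciFact"]

# Scores transcribed from the table above, in BENCHMARKS order.
SCORES = {
    "all-minilm-l6-v2": [0.724, 0.806, 0.756, 0.854, 0.79, 0.876, 0.473, 0.876, 0.645],
    "all-mpnet-base-v2": [0.726, 0.835, 0.78, 0.857, 0.8, 0.906, 0.513, 0.875, 0.656],
    "ada-embedding-002": [0.698, 0.833, 0.761, 0.861, 0.86, 0.903, 0.685, 0.876, 0.726],
    "jina-embedding-t-en-v1": [0.717, 0.773, 0.731, 0.829, 0.777, 0.860, 0.482, 0.840, 0.522],
    "jina-embedding-s-en-v1": [0.743, 0.786, 0.738, 0.837, 0.80, 0.875, 0.523, 0.857, 0.524],
    "jina-embedding-b-en-v1": [0.751, 0.809, 0.761, 0.856, 0.812, 0.890, 0.606, 0.876, 0.594],
    "jina-embedding-l-en-v1": [0.745, 0.832, 0.781, 0.869, 0.837, 0.902, 0.573, 0.881, 0.598],
}

def best_per_benchmark(scores):
    """Map each benchmark to the (model, score) pair with the highest score."""
    best = {}
    for i, bench in enumerate(BENCHMARKS):
        model = max(scores, key=lambda m: scores[m][i])
        best[bench] = (model, scores[model][i])
    return best

for bench, (model, score) in best_per_benchmark(SCORES).items():
    print(f"{bench:>8}: {model} ({score})")
```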

## Usage

### Use with Jina AI Finetuner

```bash
pip install finetuner
```

```python
import finetuner

model = finetuner.build_model('jinaai/jina-embedding-l-en-v1')
embeddings = finetuner.encode(
    model=model,
    data=['how is the weather today', 'What is the current weather like today?']
)
print(finetuner.cos_sim(embeddings[0], embeddings[1]))
```

### Use with sentence-transformers

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

sentences = ['how is the weather today', 'What is the current weather like today?']

# Load the large (l) model described in this card.
model = SentenceTransformer('jinaai/jina-embedding-l-en-v1')
embeddings = model.encode(sentences)
print(cos_sim(embeddings[0], embeddings[1]))
```

## Fine-tuning

Please consider Finetuner.

## Plans

The development of `jina-embedding-s-en-v2` is underway with two main objectives: improving performance and increasing the maximum sequence length.
We are also working on a bilingual embedding model that combines English with another language (X). The upcoming models will be called `jina-embedding-s/b/l-de-v1`.

## Contact

Join our Discord community and chat with other community members about ideas.

## Citation

If you find Jina Embeddings useful in your research, please cite the following paper:

```bibtex
@misc{günther2023jina,
      title={Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models},
      author={Michael Günther and Louis Milliken and Jonathan Geuter and Georgios Mastrapas and Bo Wang and Han Xiao},
      year={2023},
      eprint={2307.11224},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
