	---
	pipeline_tag: sentence-similarity
	tags:
	- ctranslate2
	- int8
	- float16
	- finetuner
	- sentence-transformers
	- feature-extraction
	- sentence-similarity
	datasets:
	- jinaai/negation-dataset
	language: en
	license: apache-2.0
	---
	# Fast-Inference with CTranslate2
	Speed up inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.

	Quantized version of [jinaai/jina-embedding-t-en-v1](https://huggingface.co/jinaai/jina-embedding-t-en-v1)
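
As a rough illustration of where a 2x-4x memory figure can come from, the sketch below estimates weight storage for the 14 million parameters quoted later in this card at different precisions. It is back-of-the-envelope arithmetic that assumes every parameter is stored at the stated width; real converted checkpoints typically keep some tensors at higher precision and store extra scale factors, so actual savings will differ.

```python
# Back-of-the-envelope weight memory at different precisions.
# Assumption: every parameter is stored at the stated width.
PARAMS = 14_000_000  # parameter count quoted in this model card

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_megabytes(num_params: int, dtype: str) -> float:
    """Return approximate weight storage in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


baseline = weight_megabytes(PARAMS, "float32")
for dtype in BYTES_PER_PARAM:
    mb = weight_megabytes(PARAMS, dtype)
    print(f"{dtype:>8}: {mb:5.1f} MB (float32 / {dtype} = {baseline / mb:.0f}x)")
```
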
	```bash
	pip install "hf-hub-ctranslate2>=2.12.0" "ctranslate2>=3.17.1"
	```

	```python
	# from transformers import AutoTokenizer
	model_name = "michaelfeil/ct2fast-jina-embedding-t-en-v1"
	model_name_orig = "jinaai/jina-embedding-t-en-v1"

	from hf_hub_ctranslate2 import EncoderCT2fromHfHub

	# load in int8 on CUDA
	model = EncoderCT2fromHfHub(
	    model_name_or_path=model_name,
	    device="cuda",
	    compute_type="int8_float16",
	)
	outputs = model.generate(
	    text=["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
	    max_length=64,
	)
	# perform downstream tasks on outputs
	outputs["pooler_output"]
	outputs["last_hidden_state"]
	outputs["attention_mask"]

	# alternative: use the SentenceTransformer mix-in
	# for end-to-end sentence embedding generation
	# (not pulling from this CT2fast-HF repo)

	from hf_hub_ctranslate2 import CT2SentenceTransformer

	model = CT2SentenceTransformer(
	    model_name_orig, compute_type="int8_float16", device="cuda"
	)
	embeddings = model.encode(
	    ["I like soccer", "I like tennis", "The eiffel tower is in Paris"],
	    batch_size=32,
	    convert_to_numpy=True,
	    normalize_embeddings=True,
	)
	print(embeddings.shape, embeddings)
	scores = (embeddings @ embeddings.T) * 100

	# Hint: you can also host this code via a REST API
	# using github.com/michaelfeil/infinity
	```
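
The `scores` line in the snippet above relies on `normalize_embeddings=True`: once vectors have unit length, their dot product equals their cosine similarity. A minimal pure-Python sketch of that identity follows, using made-up 3-dimensional vectors rather than real model output.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity computed directly from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy vectors standing in for embeddings (illustrative values only).
a = [1.0, 2.0, 2.0]
b = [2.0, 0.0, 1.0]

normalized_dot = dot(l2_normalize(a), l2_normalize(b))
assert math.isclose(normalized_dot, cosine_similarity(a, b))
print(round(normalized_dot * 100, 2))  # scaled by 100, as in the snippet above
```
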

	Checkpoint compatible with [ctranslate2>=3.17.1](https://github.com/OpenNMT/CTranslate2)
	and [hf-hub-ctranslate2>=2.12.0](https://github.com/michaelfeil/hf-hub-ctranslate2)
	- `compute_type=int8_float16` for `device="cuda"`
	- `compute_type=int8` for `device="cpu"`
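
The device-to-`compute_type` pairing listed above can be captured in a small helper. `pick_compute_type` is a hypothetical name, not part of `hf-hub-ctranslate2`; it only encodes the two recommendations from this card.

```python
def pick_compute_type(device: str) -> str:
    """Map a device string to the compute type recommended in this card.

    Hypothetical helper: it encodes only the two pairings listed above.
    """
    recommended = {"cuda": "int8_float16", "cpu": "int8"}
    try:
        return recommended[device]
    except KeyError:
        raise ValueError(f"no recommendation for device {device!r}") from None


print(pick_compute_type("cpu"))
```

The returned string can be passed as `compute_type=` together with the matching `device=` when constructing `EncoderCT2fromHfHub` or `CT2SentenceTransformer`.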

	Converted on 2023-10-13 using
	```
	LLama-2 -> removed <pad> token.
	```

	# Licence and other remarks:
	This is just a quantized version. Licence conditions are intended to be identical to those of the original Hugging Face repo.

	# Original description


	<br><br>

	<p align="center">
	<img src="https://github.com/jina-ai/finetuner/blob/main/docs/_static/finetuner-logo-ani.svg?raw=true" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
	</p>


	<p align="center">
	<b>The text embedding set trained by <a href="https://jina.ai/"><b>Jina AI</b></a>, <a href="https://github.com/jina-ai/finetuner"><b>Finetuner</b></a> team.</b>
	</p>


	## Intended Usage & Model Info

	`jina-embedding-t-en-v1` is a tiny language model that has been trained using Jina AI's Linnaeus-Clean dataset.
	This dataset consists of 380 million sentence pairs, including query-document pairs.
	These pairs were obtained from various domains and were carefully selected through a thorough cleaning process.
	The Linnaeus-Full dataset, from which the Linnaeus-Clean dataset is derived, originally contained 1.6 billion sentence pairs.

	The model has a range of use cases, including information retrieval, semantic textual similarity, text reranking, and more.

	With just 14 million parameters,
	the model enables lightning-fast inference on CPU, while still delivering impressive performance.
	Additionally, we provide the following options:

	- [`jina-embedding-t-en-v1`](https://huggingface.co/jinaai/jina-embedding-t-en-v1): 14 million parameters (you are here).
	- [`jina-embedding-s-en-v1`](https://huggingface.co/jinaai/jina-embedding-s-en-v1): 35 million parameters.
	- [`jina-embedding-b-en-v1`](https://huggingface.co/jinaai/jina-embedding-b-en-v1): 110 million parameters.
	- [`jina-embedding-l-en-v1`](https://huggingface.co/jinaai/jina-embedding-l-en-v1): 330 million parameters.
	- `jina-embedding-1b-en-v1`: 1.2 billion parameters, 10 times bert-base (soon).
	- `jina-embedding-6b-en-v1`: 6 billion parameters, 30 times bert-base (soon).

	## Data & Parameters

	Please check out our [technical report](https://arxiv.org/abs/2307.11224).

	## Metrics

	We compared the model against `all-minilm-l6-v2`/`all-mpnet-base-v2` from sbert and `text-embedding-ada-002` from OpenAI:

	| Name | Parameters | Dimension |
	|------------------------|--------------------|------|
	| all-minilm-l6-v2 | 23m | 384 |
	| all-mpnet-base-v2 | 110m | 768 |
	| ada-embedding-002 | Unknown/OpenAI API | 1536 |
	| jina-embedding-t-en-v1 | 14m | 312 |
	| jina-embedding-s-en-v1 | 35m | 512 |
	| jina-embedding-b-en-v1 | 110m | 768 |
	| jina-embedding-l-en-v1 | 330m | 1024 |


	| Name | STS12 | STS13 | STS14 | STS15 | STS16 | STS17 | TRECOVID | Quora | SciFact |
	|------------------------|-------|-------|-------|-------|-------|-------|----------|-------|---------|
	| all-minilm-l6-v2 | 0.724 | 0.806 | 0.756 | 0.854 | 0.79 | 0.876 | 0.473 | 0.876 | 0.645 |
	| all-mpnet-base-v2 | 0.726 | 0.835 | 0.78 | 0.857 | 0.8 | 0.906 | 0.513 | 0.875 | 0.656 |
	| ada-embedding-002 | 0.698 | 0.833 | 0.761 | 0.861 | 0.86 | 0.903 | 0.685 | 0.876 | 0.726 |
	| jina-embedding-t-en-v1 | 0.717 | 0.773 | 0.731 | 0.829 | 0.777 | 0.860 | 0.482 | 0.840 | 0.522 |
	| jina-embedding-s-en-v1 | 0.743 | 0.786 | 0.738 | 0.837 | 0.80 | 0.875 | 0.523 | 0.857 | 0.524 |
	| jina-embedding-b-en-v1 | 0.751 | 0.809 | 0.761 | 0.856 | 0.812 | 0.890 | 0.606 | 0.876 | 0.594 |
	| jina-embedding-l-en-v1 | 0.745 | 0.832 | 0.781 | 0.869 | 0.837 | 0.902 | 0.573 | 0.881 | 0.598 |

	## Inference Speed

	We encoded a single sentence "What is the current weather like today?" 10k times on:

	1. cpu: MacBook Pro 2020, 2 GHz Quad-Core Intel Core i5
	2. gpu: 1 Nvidia 3090

	We recorded the total time spent to demonstrate the embedding speed:

	| Name | Parameters | Dimension | time@cpu | time@gpu |
	|------------------------|------|-----|--------|-------|
	| jina-embedding-t-en-v1 | 14m | 312 | 5.78s | 2.36s |
	| all-minilm-l6-v2 | 23m | 384 | 11.95s | 2.70s |
	| jina-embedding-s-en-v1 | 35m | 512 | 17.25s | 2.81s |
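
To make the table easier to compare, the totals can be converted to per-sentence latency and throughput. The sketch below uses only the timings reported above and the stated 10k repetitions; it is plain arithmetic on those numbers, not a new benchmark, and the function names are illustrative.

```python
REPEATS = 10_000  # "10k times", as stated above

# Total seconds taken from the table above: (cpu, gpu)
TIMINGS = {
    "jina-embedding-t-en-v1": (5.78, 2.36),
    "all-minilm-l6-v2": (11.95, 2.70),
    "jina-embedding-s-en-v1": (17.25, 2.81),
}


def per_sentence_ms(total_seconds: float, repeats: int = REPEATS) -> float:
    """Average latency per encoded sentence, in milliseconds."""
    return total_seconds / repeats * 1000


def sentences_per_second(total_seconds: float, repeats: int = REPEATS) -> float:
    """Average throughput over the whole run."""
    return repeats / total_seconds


for name, (cpu_s, gpu_s) in TIMINGS.items():
    print(
        f"{name}: cpu {per_sentence_ms(cpu_s):.3f} ms/sentence, "
        f"gpu {per_sentence_ms(gpu_s):.3f} ms/sentence, "
        f"cpu throughput {sentences_per_second(cpu_s):.0f} sentences/s"
    )
```
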


	## Usage

	Use with Jina AI Finetuner

	```python
	# Install first, e.g. `pip install finetuner` in a shell
	# (or `!pip install finetuner` in a notebook cell)
	import finetuner

	model = finetuner.build_model('jinaai/jina-embedding-t-en-v1')
	embeddings = finetuner.encode(
	    model=model,
	    data=['how is the weather today', 'What is the current weather like today?']
	)
	print(finetuner.cos_sim(embeddings[0], embeddings[1]))
	```

	Use with sentence-transformers:

	```python
	from sentence_transformers import SentenceTransformer
	from sentence_transformers.util import cos_sim

	sentences = ['how is the weather today', 'What is the current weather like today?']

	model = SentenceTransformer('jinaai/jina-embedding-t-en-v1')
	embeddings = model.encode(sentences)
	print(cos_sim(embeddings[0], embeddings[1]))
	```

	## Fine-tuning

	Please consider [Finetuner](https://github.com/jina-ai/finetuner).

	## Plans

	1. The development of `jina-embedding-s-en-v2` is currently underway with two main objectives: improving performance and increasing the maximum sequence length.
	2. We are currently working on a bilingual embedding model that combines English with a second language, X. The upcoming model will be called `jina-embedding-s/b/l-de-v1`.

	## Contact

	Join our [Discord community](https://discord.jina.ai) and chat with other community members about ideas.

	## Citation

	If you find Jina Embeddings useful in your research, please cite the following paper:

	``` latex
	@misc{günther2023jina,
	title={Jina Embeddings: A Novel Set of High-Performance Sentence Embedding Models},
	author={Michael Günther and Louis Milliken and Jonathan Geuter and Georgios Mastrapas and Bo Wang and Han Xiao},
	year={2023},
	eprint={2307.11224},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```