intel-optimized-model-for-embeddings-v1

This is a text embedding model: it maps sentences and paragraphs to a 512-dimensional dense vector space and can be used for tasks like clustering or semantic search. For sample code that uses this model in a TorchServe container, see Intel-Optimized-Container-for-Embeddings.

Usage

Install the required packages:

pip install -U torch==2.3.1+cpu --extra-index-url https://download.pytorch.org/whl/cpu
pip install -U transformers==4.42.4 intel-extension-for-pytorch==2.3.100
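To confirm that the matching CPU builds were installed, you can check the reported versions; the expected values in the comments simply reflect the pins above.

import torch
import intel_extension_for_pytorch as ipex

print(torch.__version__)  # expected: 2.3.1+cpu
print(ipex.__version__)   # expected: 2.3.100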

Use the example below to load the model with the transformers library, tokenize the text, run the model, and apply mean pooling to the output.

import torch
from transformers import AutoTokenizer, AutoModel
import intel_extension_for_pytorch as ipex

def mean_pooling(model_output, attention_mask):
    # Average the token embeddings, using the attention mask to ignore padding
    token_embeddings = model_output[0]
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(
        input_mask_expanded.sum(1), min=1e-9)

# load model
tokenizer = AutoTokenizer.from_pretrained('Intel/intel-optimized-model-for-embeddings-v1')
model = AutoModel.from_pretrained('Intel/intel-optimized-model-for-embeddings-v1', 
                                   torchscript=True)
model.eval()

# do IPEX optimization
batch_size = 1
seq_length = 512
vocab_size = model.config.vocab_size
# Dummy batch used only for tracing; use an all-ones attention mask
# (torch.randint(1, ...) would produce all zeros, masking every token)
sample_input = {"input_ids": torch.randint(vocab_size, size=[batch_size, seq_length]),
                "token_type_ids": torch.zeros(size=[batch_size, seq_length],
                                              dtype=torch.int),
                "attention_mask": torch.ones(size=[batch_size, seq_length],
                                             dtype=torch.int)}
model = ipex.optimize(model, level="O1", auto_kernel_selection=True,
                      conv_bn_folding=False, dtype=torch.bfloat16)

with torch.no_grad(), torch.cpu.amp.autocast(cache_enabled=False,
                                             dtype=torch.bfloat16):
    # Compile the model with TorchScript and freeze it for inference
    model = torch.jit.trace(model, example_kwarg_inputs=sample_input,
                            check_trace=False, strict=False)
    model = torch.jit.freeze(model)
    
    # Call model
    text = "This is a test."
    tokenized_text = tokenizer(text, padding=True, truncation=True, return_tensors='pt')
    model_output = model(**tokenized_text)
    sentence_embeddings = mean_pooling(model_output, tokenized_text['attention_mask'])
    embeddings = sentence_embeddings[0].tolist()

# Embeddings output
print(embeddings)
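The embeddings are intended for semantic search and clustering, where a typical next step is comparing vectors by cosine similarity. The sketch below reuses the traced model, tokenizer, and mean_pooling from above; the embed helper and the two example sentences are illustrative additions, not part of the original card.

import torch.nn.functional as F

def embed(text):
    # Illustrative helper: tokenize, run the traced model, and mean-pool
    tokens = tokenizer(text, padding=True, truncation=True, return_tensors='pt')
    with torch.no_grad(), torch.cpu.amp.autocast(cache_enabled=False,
                                                 dtype=torch.bfloat16):
        output = model(**tokens)
    return mean_pooling(output, tokens['attention_mask'])

query = embed("How do I reset my password?")
doc = embed("Steps to change your account password.")
print(F.cosine_similarity(query, doc).item())  # higher means more similar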

Model Details

Model Description

This model was fine-tuned with the sentence-transformers library, starting from the BERT-Medium_L-8_H-512_A-8 model and using UAE-Large-V1 as a teacher.
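The training script itself is not published; the following is only a rough sketch of what a teacher-student distillation objective of this kind can look like. The learned projection layer and the assumption that the teacher emits 1024-dimensional vectors (typical for a BERT-large-based model such as UAE-Large-V1) are illustrative, not confirmed details of this model's training.

import torch
import torch.nn as nn

# Student embeddings are 512-dim; teacher embeddings are assumed 1024-dim,
# so a projection aligns the two spaces before the MSE objective
project = nn.Linear(1024, 512)
mse = nn.MSELoss()

def distillation_loss(student_emb, teacher_emb):
    # Push the student to reproduce the (projected) teacher embeddings
    return mse(student_emb, project(teacher_emb))

# Stand-in tensors in place of real model outputs
student_emb = torch.randn(8, 512)
teacher_emb = torch.randn(8, 1024)
print(distillation_loss(student_emb, teacher_emb))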

Training Datasets

| Dataset | Description | License |
|---|---|---|
| beir/dbpedia-entity | DBpedia-Entity is a standard test collection for entity search over the DBpedia knowledge base. | CC BY-SA 3.0 |
| beir/nq | To help spur development in open-domain question answering, the Natural Questions (NQ) corpus has been created, along with a challenge website based on this data. | CC BY-SA 3.0 |
| beir/scidocs | SciDocs is an evaluation benchmark consisting of seven document-level tasks ranging from citation prediction to document classification and recommendation. | CC BY-SA 4.0 |
| beir/trec-covid | TREC-COVID followed the TREC model for building IR test collections through community evaluations of search systems. | CC BY-SA 4.0 |
| beir/touche2020 | Given a question on a controversial topic, retrieve relevant arguments from a focused crawl of online debate portals. | CC BY 4.0 |
| WikiAnswers | The WikiAnswers corpus contains clusters of questions tagged by WikiAnswers users as paraphrases. | MIT |
| Cohere/wikipedia-22-12-en-embeddings | The Cohere/Wikipedia dataset is a processed version of the wikipedia-22-12 dataset. It is English only, and the articles are broken up into paragraphs. | Apache 2.0 |
| MNLI | GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), is a collection of resources for training, evaluating, and analyzing natural language understanding systems. | MIT |