Model Overview
Model Type: Text Embedding
Number of Parameters: 4B
Context Length: 32k
Adapted from Qwen/Qwen3-4B
Pooling: Last token
Robust Training for General Text Embeddings via Bagging-Based Model Merging
arXiv Paper | Model Trained on English data | Model Trained on General data | Model Trained on Two-stage Incremental Learning | Github
General-purpose text embedding models underpin a wide range of NLP and information retrieval applications, and are typically trained on large-scale multi-task corpora to encourage broad generalization. Current batch-level shuffling for multi-task text embedding exhibits two practical limitations: suboptimal out-of-domain (OOD) generalization and poor suitability for incremental learning due to expensive full retraining. To address these issues, we propose Bagging-based rObust mOdel Merging (BOOM), which trains multiple embedding models on sampled subsets and merges them into a single model, improving robustness while retaining single-model inference efficiency. Moreover, BOOM naturally supports efficient incremental updates by training lightweight update models on new data with a small historical subset and merging them into the existing model. Experiments across diverse embedding benchmarks demonstrate that BOOM consistently improves both in-domain and OOD performance over full-corpus batch-level shuffling, while substantially reducing training cost in incremental learning settings.
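As a rough illustration of the bagging-then-merge idea described above, the sketch below draws several random subsets of a corpus and merges per-subset parameters into a single set. Everything in it is an assumption made for illustration: the helper names `sample_subsets` and `merge_models`, the toy parameter dictionaries, sampling without replacement inside each subset, and the use of a plain weighted average (the released model reports Multi-SLERP instead). The training step itself is omitted.

```python
import random

def sample_subsets(corpus, num_subsets, fraction, seed=0):
    """Draw `num_subsets` random subsets, each holding `fraction` of the corpus."""
    rng = random.Random(seed)
    size = max(1, int(len(corpus) * fraction))
    return [rng.sample(corpus, size) for _ in range(num_subsets)]

def merge_models(models, weights=None):
    """Weighted average of parameter dicts that share the same keys and lengths."""
    if weights is None:
        weights = [1.0] * len(models)
    total = sum(weights)
    merged = {}
    for name in models[0]:
        length = len(models[0][name])
        merged[name] = [
            sum(w * m[name][i] for w, m in zip(weights, models)) / total
            for i in range(length)
        ]
    return merged

corpus = list(range(100))
subsets = sample_subsets(corpus, num_subsets=3, fraction=0.5)

# Hypothetical stand-ins for the models trained on each subset.
models = [
    {"layer.weight": [1.0, 2.0]},
    {"layer.weight": [3.0, 4.0]},
    {"layer.weight": [5.0, 6.0]},
]
merged = merge_models(models)
print(merged)
```

Because the merged result is a single set of parameters, inference cost stays that of one model regardless of how many subset models were trained.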
Training data:
First Stage: General-Text-Data:
- Retrieval: ELI5, HotpotQA, FEVER, MSMARCO passage and document ranking, NQ, NLI, SQuAD, TriviaQA, and FiQA.
- Reranking: StackOverFlowDupQuestions.
- Classification: AmazonReviews-Classification, Banking77Classification, Emotion-Classification, MTOPIntentClassification, IMDB-Classification, ToxicConversationsClassification, TweetSentimentExtraction-Classification, AmazonCounterfactual-Classification.
- Clustering: Arxiv/Biorxiv/Medrxiv/Reddit/StackExchangeClustering-S2S/P2P, TwentyNewsgroups-Clustering.
- Semantic Text Similarity (STS): STS12, STS22, STSBenchmark.
- DuReader, MIRACL, Mr. TyDi, and T2-Ranking.
- Cornstack: JavaScript, Java, Python, PHP, and Ruby, sampled 500K.
About 2.8M training samples in total.
Second Stage:
- Sampled 40% General-Text-Data
- Code retrieval training data: apps, codefeedback-mt, codefeedback-st, CodeSearchNet-ccr_go, CodeSearchNet-ccr_javascript, CodeSearchNet-ccr_java, CodeSearchNet-ccr_php, CodeSearchNet-ccr_python, CodeSearchNet-ccr_ruby, CodeSearchNet_go, CodeSearchNet_javascript, CodeSearchNet_java, CodeSearchNet_php, CodeSearchNet_python, CodeSearchNet_ruby, codetrans-contest, codetrans-dl, cosqa, stackoverflow-qa, synthetic-text2sql
- FreedomIntelligence__Huatuo26M-Lite, infgrad__retrieval_data_llm, marco_chinese, msmarco-en2zh_sub_mixneg, msmarco-zh2en_sub_mixneg, arguana, quora, scidocsrr.
About 4.3M training samples in total.
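The second-stage recipe above (a 40% sample of the first-stage data combined with new data) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: `build_incremental_mix`, its parameters, and the toy records are hypothetical names and values chosen for the example.

```python
import random

def build_incremental_mix(historical, new_data, history_fraction=0.4, seed=0):
    """Combine a random sample of historical data with all of the new data."""
    rng = random.Random(seed)
    keep = int(len(historical) * history_fraction)
    replay = rng.sample(historical, keep)  # small historical subset
    mix = replay + list(new_data)
    rng.shuffle(mix)
    return mix

# Toy records tagged with their origin (hypothetical data).
historical = [("general", i) for i in range(10)]
new_data = [("code", i) for i in range(5)]

mix = build_incremental_mix(historical, new_data)
print(len(mix), sum(1 for source, _ in mix if source == "general"))
```

Replaying only part of the historical data keeps the update cheaper than retraining on the full corpus while still exposing the update model to earlier tasks.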
Models Merged
The following models were merged to produce ICT-TIME-and-Querit-embedding-v1:
models:
  - model: First_stage(BOOM_4B_v1)
    parameters:
      weight: 2.8
  - model: Second stage model
    parameters:
      weight: 4.3
merge_method: multislerp
dtype: float32
This model was merged using the Multi-SLERP merge method.
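The merge weights 2.8 and 4.3 appear to mirror the approximate data sizes of the two stages (2.8M and 4.3M). The sketch below shows what those weights look like once normalized to sum to 1, and applies textbook two-vector spherical linear interpolation (SLERP) to toy vectors. It is an assumption-laden illustration: whether the merge tool normalizes weights this way, and how its Multi-SLERP implementation generalizes SLERP, are not stated in this card, and `normalize_weights` and `slerp` are names introduced here.

```python
import math

def normalize_weights(weights):
    """Scale weights so they sum to 1 (assumed behavior, for illustration)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def slerp(a, b, t):
    """Textbook SLERP between two vectors: returns a at t=0 and b at t=1."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    if omega < 1e-6:
        # Nearly parallel vectors: fall back to linear interpolation.
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    sin_omega = math.sin(omega)
    coeff_a = math.sin((1 - t) * omega) / sin_omega
    coeff_b = math.sin(t * omega) / sin_omega
    return [coeff_a * x + coeff_b * y for x, y in zip(a, b)]

weights = normalize_weights({"first_stage": 2.8, "second_stage": 4.3})
print(weights)

# Toy two-dimensional "parameters" for the two stages (hypothetical values).
merged = slerp([1.0, 0.0], [0.0, 1.0], t=weights["second_stage"])
print(merged)
```

Unlike a linear average, SLERP between two unit vectors stays on the unit sphere, which is the usual motivation for spherical merge methods.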
Performance
| Benchmark | Version | Score |
|---|---|---|
| MTEB (English, v2) | General | 70.12 |
| HUME(v1) | General | 79.25 |
| MTEB (Code) | Code | 78.02 |
| MTEB (Medical) | Medical | 64.21 |
| MTEB (Law) | Law | 62.23 |
| LongEmbed | Long text | 78.04 |
| RTEB | Multi-domain | 67.72 |
| ChemTEB | Chemical | 75.18 |
| MTEB(Multilingual, v2) | General | 63.97 |
Usage
Sentence Transformers Usage
# Requires transformers>=4.51.0
# Requires sentence-transformers>=2.7.0
from sentence_transformers import SentenceTransformer
# Load the model
model = SentenceTransformer("ICT-TIME-and-Querit/ICT-TIME-and-Querit-embedding-v1")
# We recommend enabling flash_attention_2 for better acceleration and memory saving,
# together with setting `padding_side` to "left":
# model = SentenceTransformer(
#     "ICT-TIME-and-Querit/ICT-TIME-and-Querit-embedding-v1",
#     model_kwargs={"attn_implementation": "flash_attention_2", "device_map": "auto"},
#     tokenizer_kwargs={"padding_side": "left"},
# )
# The queries and documents to embed
queries = [
    "What is the capital of China?",
    "Explain gravity",
]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]
# Encode the queries and documents. Note that queries benefit from using a prompt
# Here we use the prompt called "query" stored under `model.prompts`, but you can
# also pass your own prompt via the `prompt` argument
query_embeddings = model.encode(queries, prompt_name="query")
document_embeddings = model.encode(documents)
# Compute the (cosine) similarity between the query and document embeddings
similarity = model.similarity(query_embeddings, document_embeddings)
print(similarity)
# tensor([[0.4739, 0.0365],
#         [0.0895, 0.4089]])
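To turn the similarity matrix printed above into a retrieval result, each query's row can be sorted by score. The sketch below hard-codes the scores shown above so that it runs without downloading the model; `rank_documents` is a helper name introduced here for illustration.

```python
# Scores copied from the example output above (rows: queries, columns: documents).
similarity = [
    [0.4739, 0.0365],
    [0.0895, 0.4089],
]
queries = ["What is the capital of China?", "Explain gravity"]
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other.",
]

def rank_documents(scores_row):
    """Return document indices ordered from most to least similar."""
    return sorted(range(len(scores_row)), key=lambda j: scores_row[j], reverse=True)

for query, row in zip(queries, similarity):
    best = rank_documents(row)[0]
    print(f"{query!r} -> {documents[best]!r} (score={row[best]:.4f})")
```

Each query is matched to the document on the diagonal, which is the expected pairing for this toy example.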
Transformers Usage
# Requires transformers>=4.51.0
import torch
import torch.nn.functional as F
from torch import Tensor
from transformers import AutoTokenizer, AutoModel
def last_token_pool(last_hidden_states: Tensor,
                    attention_mask: Tensor) -> Tensor:
    # With left padding, every sequence ends at the final position.
    left_padding = (attention_mask[:, -1].sum() == attention_mask.shape[0])
    if left_padding:
        return last_hidden_states[:, -1]
    else:
        # With right padding, index each sequence at its last real token.
        sequence_lengths = attention_mask.sum(dim=1) - 1
        batch_size = last_hidden_states.shape[0]
        return last_hidden_states[torch.arange(batch_size, device=last_hidden_states.device), sequence_lengths]

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery:{query}'
# Each query must come with a one-sentence instruction that describes the task
task = 'Given a web search query, retrieve relevant passages that answer the query'
queries = [
    get_detailed_instruct(task, 'What is the capital of China?'),
    get_detailed_instruct(task, 'Explain gravity')
]
# No need to add instruction for retrieval documents
documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun."
]
input_texts = queries + documents
tokenizer = AutoTokenizer.from_pretrained("ICT-TIME-and-Querit/ICT-TIME-and-Querit-embedding-v1", padding_side='left')
model = AutoModel.from_pretrained("ICT-TIME-and-Querit/ICT-TIME-and-Querit-embedding-v1")
# We recommend enabling flash_attention_2 for better acceleration and memory saving.
# model = AutoModel.from_pretrained("ICT-TIME-and-Querit/ICT-TIME-and-Querit-embedding-v1", attn_implementation="flash_attention_2", torch_dtype=torch.float16).cuda()
max_length = 8192
# Tokenize the input texts
batch_dict = tokenizer(
    input_texts,
    padding=True,
    truncation=True,
    max_length=max_length,
    return_tensors="pt",
)
batch_dict = batch_dict.to(model.device)
# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**batch_dict)
embeddings = last_token_pool(outputs.last_hidden_state, batch_dict['attention_mask'])
# normalize embeddings
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:2] @ embeddings[2:].T)
print(scores.tolist())
# [[0.4739307463169098, 0.036478668451309204], [0.08952689170837402, 0.4088672697544098]]
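The `last_token_pool` helper above branches on the padding side. The sketch below reproduces only its index logic on plain Python lists, so the behavior can be checked without loading the model; `last_token_indices` and the toy attention masks are introduced here for illustration.

```python
def last_token_indices(attention_mask):
    """Return, per sequence, the position of the last non-padding token."""
    # If every row ends with a real token, the batch is treated as left-padded.
    left_padded = all(row[-1] == 1 for row in attention_mask)
    if left_padded:
        return [len(row) - 1 for row in attention_mask]
    # Otherwise assume right padding: real tokens occupy positions 0..sum-1.
    return [sum(row) - 1 for row in attention_mask]

right_padded = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
]
left_padded = [
    [0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

print(last_token_indices(right_padded))  # [2, 4]
print(last_token_indices(left_padded))   # [4, 4]
```

With left padding the pooled position is the same for every sequence, which is why the usage examples set `padding_side` to "left".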
Citation
If you find our work helpful, please cite our paper:
@article{zhang2026bagging,
  title={Bagging-Based Model Merging for Robust General Text Embeddings},
  author={Zhang, Hengran and Bi, Keping and Guo, Jiafeng and Zhang, Jiaming and Yang, Wenbo and Shi, Daiting and Cheng, Xueqi},
  journal={arXiv preprint arXiv:2602.05787},
  year={2026}
}