Integrate with Sentence Transformers v5.4
Hello!
Pull Request overview
- Integrate this model with Sentence Transformers v5.4+ so it can be loaded via CrossEncoder("ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b").
Details
The integration uses the new causal-reranker pipeline in Sentence Transformers: a Transformer module with transformer_task="text-generation", followed by a LogitScore post-processing module configured with true_token_id=0, i.e. the logit of the token with ID 0 is used as the score. The existing raw-text prompt format from the README, "Check whether a given document contains information helpful to answer the query.\n<Document> {doc}\n<Query> {query}{instruction} ??", is reproduced via a small chat_template.jinja. The template receives the (query, document) pair from Sentence Transformers as messages with role="query" and role="document", plus an optional role="system" message carrying the instruction (injected when the user passes prompt=... or prompt_name=... to predict/rank).
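For reference, the raw prompt that the template has to reproduce can be sketched in plain Python. This is an illustration only: format_prompt is a hypothetical helper (not part of this PR or of Sentence Transformers), and joining a non-empty instruction to the query with a single leading space is an assumption about how the README baseline fills {instruction}.

```python
# The raw prompt format quoted from the README.
PROMPT_TEMPLATE = (
    "Check whether a given document contains information helpful to answer the query.\n"
    "<Document> {doc}\n"
    "<Query> {query}{instruction} ??"
)


def format_prompt(query: str, document: str, instruction: str = "") -> str:
    """Hypothetical helper: fill the README prompt for one (query, document) pair."""
    # Assumption: a non-empty instruction is appended after one space;
    # an empty instruction adds nothing.
    suffix = f" {instruction}" if instruction else ""
    return PROMPT_TEMPLATE.format(doc=document, query=query, instruction=suffix)


prompt = format_prompt(
    "What are the health benefits of exercise?",
    "Ancient Greeks valued physical fitness for military training.",
    "Prioritize recent medical research",
)
print(prompt)
```

With an empty instruction the last line collapses to "<Query> {query} ??", which is the case where no prompt is passed to predict/rank.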
Sentence Transformers automatically enforces left padding and logits_to_keep=1 for text-generation models, which exactly matches the baseline's combination of tokenizer.padding_side = "left" and last-position logit indexing. The CrossEncoder's default sigmoid activation is disabled (activation_fn=torch.nn.Identity()) so the returned scores are the raw bfloat16 logits, matching the values shown in the model card.
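Why left padding matters for last-position indexing can be shown with a toy batch of token IDs. This is a pure-Python sketch, not code from the PR: pad_batch and the PAD value are hypothetical and only stand in for what a tokenizer does.

```python
PAD = -1  # hypothetical pad token ID, for illustration only


def pad_batch(sequences, side):
    """Pad every sequence to the longest length, on the given side."""
    width = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        fill = [PAD] * (width - len(seq))
        padded.append(fill + seq if side == "left" else seq + fill)
    return padded


# Two tokenized prompts of different lengths (made-up token IDs).
batch = [[11, 12, 13, 14], [21, 22]]

left = pad_batch(batch, "left")
right = pad_batch(batch, "right")

# Taking the last position of every row is what last-position logit indexing does.
last_left = [row[-1] for row in left]
last_right = [row[-1] for row in right]
print(last_left)   # [14, 22]
print(last_right)  # [14, -1]
```

With left padding, the final position of every row is the final real token, so keeping only the last logit is enough. With right padding, the shorter row ends in a pad token instead.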
Added files:
- modules.json: wires the pipeline as Transformer -> LogitScore.
- sentence_bert_config.json: sets transformer_task="text-generation", module_output_name="causal_logits", and modality_config with "format": "flat" so query/document pairs are passed straight through to the chat template.
- config_sentence_transformers.json: sets activation_fn to Identity so raw logits are returned, and leaves prompts/default_prompt_name empty so the caller passes the instruction at inference time.
- 1_LogitScore/config.json: true_token_id=0, false_token_id=null, module_input_name="causal_logits".
- chat_template.jinja: formats the query/document/instruction triple into the exact raw prompt used by the README's Transformers baseline.
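To make the 1_LogitScore/config.json values concrete, here is a rough pure-Python sketch of what they amount to when false_token_id is null: one vocabulary logit per pair is read out as the score. logit_score is a hypothetical stand-in rather than the Sentence Transformers module, the toy logits are made up, and the case where false_token_id is set is deliberately left out.

```python
import json

# Values as described for 1_LogitScore/config.json.
logit_score_config = json.loads(
    '{"true_token_id": 0, "false_token_id": null, "module_input_name": "causal_logits"}'
)


def logit_score(features, config):
    """Hypothetical stand-in: pick one vocabulary logit per row as the score."""
    if config["false_token_id"] is not None:
        raise NotImplementedError("only the false_token_id=null case is sketched here")
    logits = features[config["module_input_name"]]
    return [row[config["true_token_id"]] for row in logits]


# Toy last-position logits for two pairs over a 4-token vocabulary (made-up numbers).
features = {"causal_logits": [[0.5, -1.0, 2.0, 0.0], [-9.375, 3.0, 1.0, 0.25]]}
scores = logit_score(features, logit_score_config)
print(scores)  # [0.5, -9.375]
```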
Modified files:
- README.md: added sentence-transformers, cross-encoder, and reranker tags, plus a new "Using Sentence Transformers" subsection under ## Quickstart with a minimal CrossEncoder snippet. Minor Quickstart cleanup: lifted the example inputs and shared "Expected Output" block to the top as a shared reference for all three paths, dropped the now-redundant "Basic Usage" preview, and added an observed-output block to the Transformers snippet (useful for telling bf16 drift apart from a real regression).
import torch
from sentence_transformers import CrossEncoder
model = CrossEncoder("ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b", model_kwargs={"dtype": torch.bfloat16}, revision="refs/pr/2")
query = "What are the health benefits of exercise?"
instruction = "Prioritize recent medical research"
documents = [
"Regular exercise reduces risk of heart disease and improves mental health.",
"A 2024 study shows exercise enhances cognitive function in older adults.",
"Ancient Greeks valued physical fitness for military training.",
]
pairs = [(query, doc) for doc in documents]
scores = model.predict(pairs, prompt=instruction)
print(scores)
# [-0.8515625 0.50390625 -9.375 ]
rankings = model.rank(query, documents, prompt=instruction)
print(rankings)
# [{'corpus_id': 1, 'score': np.float32(0.50390625)}, {'corpus_id': 0, 'score': np.float32(-0.8515625)}, {'corpus_id': 2, 'score': np.float32(-9.375)}]
You can run this as-is thanks to the revision argument, which loads the files from this PR. After merging, the revision argument is no longer needed.
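The rank output above is the predict scores sorted from highest to lowest, each tagged with the index of its document. A minimal pure-Python sketch of that relationship, using the scores printed above (rank_from_scores is a hypothetical helper, not the Sentence Transformers implementation, and it returns plain floats rather than np.float32):

```python
# Scores as printed by model.predict(...) in the snippet above.
scores = [-0.8515625, 0.50390625, -9.375]


def rank_from_scores(scores):
    """Hypothetical helper: order document indices by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [{"corpus_id": i, "score": scores[i]} for i in order]


rankings = rank_from_scores(scores)
print(rankings)
# [{'corpus_id': 1, 'score': 0.50390625}, {'corpus_id': 0, 'score': -0.8515625}, {'corpus_id': 2, 'score': -9.375}]
```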
Note that none of the old behaviour is affected or changed: this only adds an additional way to run the model in a familiar and common format. The raw AutoModelForCausalLM and vLLM paths already documented in the README continue to work unchanged, and the Sentence Transformers path produces identical bfloat16 scores to the README's Transformers baseline on every sample tested (0.0 diff vs. the direct AutoModelForCausalLM path on 3/3 examples, with and without an instruction). It's just a lot easier to run.
Happy to tweak anything you'd like changed. Please let me know if you have any questions or feedback!
- Tom Aarsen