Integrate with Sentence Transformers v5.4

#2
by tomaarsen HF Staff - opened

Hello!

Pull Request overview

  • Integrate this model with Sentence Transformers v5.4+ so it can be loaded via CrossEncoder("ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b").

Details

The integration uses the new causal-reranker pipeline in Sentence Transformers: a Transformer module with transformer_task="text-generation" followed by a LogitScore post-processing module configured with true_token_id=0, i.e. the token with ID 0 will be used as the logit. The existing raw-text prompt format from the README, "Check whether a given document contains information helpful to answer the query.\n<Document> {doc}\n<Query> {query}{instruction} ??", is reproduced via a small chat_template.jinja that receives the (query, document) pair from Sentence Transformers as messages with role="query" and role="document", plus an optional role="system" message carrying the instruction (injected when the user passes prompt=... or prompt_name=... to predict/rank).

Sentence Transformers auto-enforces left padding and logits_to_keep=1 for text-generation models, which matches the baseline's tokenizer.padding_side = "left" + last-position logit indexing exactly. The CrossEncoder's default sigmoid activation is disabled (activation_fn=torch.nn.Identity()) so the returned scores are the raw bfloat16 logits, matching the values shown in the model card.

Added files:

  • modules.json: wires the pipeline as Transformer -> LogitScore.
  • sentence_bert_config.json: sets transformer_task="text-generation", module_output_name="causal_logits", and modality_config with "format": "flat" so query/document pairs are passed straight through to the chat template.
  • config_sentence_transformers.json: sets activation_fn to Identity so raw logits are returned, and leaves prompts/default_prompt_name empty so the caller passes the instruction at inference time.
  • 1_LogitScore/config.json: true_token_id=0, false_token_id=null, module_input_name="causal_logits".
  • chat_template.jinja: formats the query/document/instruction triple into the exact raw prompt used by the README's Transformers baseline.

Modified files:

  • README.md: added sentence-transformers, cross-encoder, and reranker tags, plus a new "Using Sentence Transformers" subsection under ## Quickstart with a minimal CrossEncoder snippet. Minor Quickstart cleanup: lifted the example inputs and shared "Expected Output" block to the top as a shared reference for all three paths, dropped the now-redundant "Basic Usage" preview, and added an observed-output block to the Transformers snippet (useful for telling bf16 drift apart from a real regression).
import torch
from sentence_transformers import CrossEncoder

model = CrossEncoder("ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b", model_kwargs={"dtype": torch.bfloat16}, revision="refs/pr/2")

query = "What are the health benefits of exercise?"
instruction = "Prioritize recent medical research"
documents = [
    "Regular exercise reduces risk of heart disease and improves mental health.",
    "A 2024 study shows exercise enhances cognitive function in older adults.",
    "Ancient Greeks valued physical fitness for military training.",
]

pairs = [(query, doc) for doc in documents]
scores = model.predict(pairs, prompt=instruction)
print(scores)
# [-0.8515625   0.50390625 -9.375     ]

rankings = model.rank(query, documents, prompt=instruction)
print(rankings)
# [{'corpus_id': 1, 'score': np.float32(0.50390625)}, {'corpus_id': 0, 'score': np.float32(-0.8515625)}, {'corpus_id': 2, 'score': np.float32(-9.375)}]

You can run this outright due to the revisionargument. After merging, the revision argument isn't needed anymore.

Note that none of the old behaviour is affected or changed: this only adds an additional way to run the model in a familiar and common format. The raw AutoModelForCausalLM and vLLM paths already documented in the README continue to work unchanged, and the Sentence Transformers path produces identical bfloat16 scores to the README's Transformers baseline on every sample tested (0.0 diff vs. the direct AutoModelForCausalLM path on 3/3 examples, with and without an instruction). It's just a lot easier to run.

Happy to tweak anything you'd like changed. Please let me know if you have any questions or feedback!

  • Tom Aarsen
tomaarsen changed pull request status to open
sheshansh-ctx changed pull request status to merged

Sign up or log in to comment