Integrate with Transformers & Sentence Transformers

#2
by tomaarsen - opened

Hello!

Pull Request overview

  • Integrate with Transformers and Sentence Transformers

Details

The model configuration did not reference https://huggingface.co/zeroentropy/zerank-2/blob/main/modeling_zeranker.py, nor was that file in the format that transformers expects for remote code. As a result, the snippet in the README was failing:

from sentence_transformers import CrossEncoder

model = CrossEncoder("zeroentropy/zerank-2", trust_remote_code=True)

query_documents = [
    ("What is 2+2?", "4"),
    ("What is 2+2?", "The answer is definitely 1 million"),
]
scores = model.predict(query_documents)
print(scores)
...
  File "C:\Users\tom\.conda\envs\sentence-transformers\Lib\site-packages\transformers\modeling_layers.py", line 142, in forward
    raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
ValueError: Cannot handle batch sizes > 1 if no padding token is defined.
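For context on that error: the sequence-classification heads in transformers pool the logits at the last non-padding token of each sequence, so with batches larger than 1 they need `config.pad_token_id` to locate where each real sequence ends. Without it, the forward pass raises the error above. A minimal plain-Python sketch of that lookup (toy token IDs, not the real vocabulary):

```python
def last_non_pad_index(input_ids, pad_token_id):
    """Scan from the right for the last token that is not padding.

    This mirrors what transformers' sequence-classification heads do to
    decide which position's logits represent the whole sequence.
    """
    for i in range(len(input_ids) - 1, -1, -1):
        if input_ids[i] != pad_token_id:
            return i
    return 0  # degenerate all-padding case

# Toy batch: two sequences, the first right-padded with token 0.
batch = [
    [101, 7, 8, 0, 0],
    [101, 5, 6, 9, 2],
]
indices = [last_non_pad_index(seq, pad_token_id=0) for seq in batch]
```

Here `indices` picks position 2 for the padded sequence and 4 for the full one; without a known `pad_token_id`, there is no way to tell padding from content, hence the hard error.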

Setting the batch size to 1 sidesteps the padding error, but the warning below shows that the classification head (score.weight) is randomly initialized, so the resulting scores are meaningless:

from sentence_transformers import CrossEncoder

model = CrossEncoder("zeroentropy/zerank-2", trust_remote_code=True)

query_documents = [
    ("What is 2+2?", "4"),
    ("What is 2+2?", "The answer is definitely 1 million"),
]
scores = model.predict(query_documents, batch_size=1)
print(scores)
Some weights of Qwen3ForSequenceClassification were not initialized from the model checkpoint at zeroentropy/zerank-2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[0.47674507 0.96076703]

Instead, this PR creates a custom transformers model class that can be loaded with AutoModelForSequenceClassification, allowing it to also work with sentence-transformers. See:

from sentence_transformers import CrossEncoder

model = CrossEncoder("zeroentropy/zerank-2", revision="refs/pr/2", trust_remote_code=True)

query_documents = [
    ("What is 2+2?", "4"),
    ("What is 2+2?", "The answer is definitely 1 million"),
]
scores = model.predict(query_documents)
print(scores)
[0.7531883  0.28894895]

This matches what I get when I manually run the previous code from https://huggingface.co/zeroentropy/zerank-2/blob/main/modeling_zeranker.py, but if you can, please double-check that these are the scores you're expecting. The revision parameter lets you test straight from this PR without checking out the branch; after merging, it is no longer necessary.

I also customized the config so you can set a yes_token_id, and the tokenizer now applies the chat template automatically. Feel free to ask about any section of the changed code if you have questions.
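As a rough illustration of what a yes_token_id is for (this is a generic sketch of how LLM-based rerankers commonly turn generation logits into a relevance score, not the exact implementation in this PR): the model's final-position logits are softmaxed over the vocabulary, and the probability mass on the "yes" token becomes the score. A toy example with a 4-token vocabulary:

```python
import math

def yes_probability(final_logits, yes_token_id):
    """Numerically stable softmax over the vocab logits at the last
    position, then read off the probability of the 'yes' token."""
    m = max(final_logits)
    exps = [math.exp(x - m) for x in final_logits]
    return exps[yes_token_id] / sum(exps)

# Toy vocabulary of 4 tokens; suppose id 2 were the "yes" token.
logits = [0.1, -1.2, 3.0, 0.5]
p = yes_probability(logits, yes_token_id=2)  # a value in (0, 1)
```

Making the token id configurable in the config means the same scoring code works even if the tokenizer assigns "yes" a different id than the default.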

  • Tom Aarsen
tomaarsen changed pull request status to open