How to use OpenMOSS-Team/SciJudge-4B-2605 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="OpenMOSS-Team/SciJudge-4B-2605")
messages = [
{"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("OpenMOSS-Team/SciJudge-4B-2605")
model = AutoModelForCausalLM.from_pretrained("OpenMOSS-Team/SciJudge-4B-2605")
messages = [
{"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
messages,
add_generation_prompt=True,
tokenize=True,
return_dict=True,
return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
How to use OpenMOSS-Team/SciJudge-4B-2605 with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OpenMOSS-Team/SciJudge-4B-2605"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "OpenMOSS-Team/SciJudge-4B-2605",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
How to use OpenMOSS-Team/SciJudge-4B-2605 with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "OpenMOSS-Team/SciJudge-4B-2605" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "OpenMOSS-Team/SciJudge-4B-2605",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
]
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "OpenMOSS-Team/SciJudge-4B-2605" \
--host 0.0.0.0 \
--port 30000
# The container publishes port 30000, so the same curl request as in the pip-based setup applies.
How to use OpenMOSS-Team/SciJudge-4B-2605 with Docker Model Runner:
docker model run hf.co/OpenMOSS-Team/SciJudge-4B-2605
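The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of any library, and the `max_tokens` field is an addition that does not appear in the curl examples.

```python
import json
import urllib.request

MODEL_ID = "OpenMOSS-Team/SciJudge-4B-2605"


def build_chat_request(base_url, user_content, model=MODEL_ID, max_tokens=2048):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
        # Assumption: cap the reply length; the curl examples omit this field.
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, user_content, timeout=120):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, user_content)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("http://localhost:8000", "What is the capital of France?")` targets the vLLM example, and `http://localhost:30000` targets the SGLang example.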
SciJudge-4B-2605 is a Qwen3-4B-Instruct-2507 model fine-tuned for scientific paper evaluation. Given two papers' titles, abstracts, and publication dates, it predicts which paper has higher citation impact.
This release is part of AI Can Learn Scientific Taste. The companion larger model is SciJudge-30B-2605, and the benchmark dataset is SciJudgeBench.
Resources: Project page and GitHub repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "OpenMOSS-Team/SciJudge-4B-2605"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.bfloat16,
device_map="auto",
)
messages = [
{"role": "system", "content": "You are a helpful assistant. You first think about the reasoning process in your mind and then provide the user with the answer."},
{"role": "user", "content": "Today is 2025-12-10. Based on the titles, abstracts, and publication dates of the following two papers A and B, determine which paper has a higher citation count.\nShow your reasoning process in <reason> </reason> tags. And return the final answer in <answer> </answer> tags. The final answer should contain only 'A' or 'B'.\n\nPaper A:\nTitle: ...\nAbstract: ...\nDate: ...\n\nPaper B:\nTitle: ...\nAbstract: ...\nDate: ..."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# temperature, top_p, and top_k only take effect when sampling is enabled.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.8, top_k=20)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
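The example above leaves the paper fields as `...` and prints the raw response. A small sketch of how the prompt might be filled in and the verdict read back out follows; `build_user_prompt` and `extract_answer` are hypothetical helper names, and taking the last `<answer>` tag when several appear is an assumption rather than documented behavior.

```python
import re

# Prompt text copied from the usage example above; only the paper fields vary.
USER_TEMPLATE = (
    "Today is {today}. Based on the titles, abstracts, and publication dates of "
    "the following two papers A and B, determine which paper has a higher citation count.\n"
    "Show your reasoning process in <reason> </reason> tags. And return the final answer "
    "in <answer> </answer> tags. The final answer should contain only 'A' or 'B'.\n\n"
    "Paper A:\nTitle: {a_title}\nAbstract: {a_abstract}\nDate: {a_date}\n\n"
    "Paper B:\nTitle: {b_title}\nAbstract: {b_abstract}\nDate: {b_date}"
)

# Matches a single letter A or B inside <answer> tags, ignoring case and padding.
ANSWER_RE = re.compile(r"<answer>\s*([AB])\s*</answer>", re.IGNORECASE)


def build_user_prompt(paper_a, paper_b, today):
    """Fill the template from two dicts with 'title', 'abstract', and 'date' keys."""
    return USER_TEMPLATE.format(
        today=today,
        a_title=paper_a["title"], a_abstract=paper_a["abstract"], a_date=paper_a["date"],
        b_title=paper_b["title"], b_abstract=paper_b["abstract"], b_date=paper_b["date"],
    )


def extract_answer(response):
    """Return 'A' or 'B' from the last <answer> tag, or None if none parses."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].upper() if matches else None
```

The string returned by `build_user_prompt` can replace the `content` of the user message, and `extract_answer(response)` yields `None` when the model produces no well-formed answer tag, so callers can decide how to handle that case.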
Accuracy (%) on the SciJudgeBench test split (the 1,000-example MAIN_1000 in-domain evaluation set):
| Model | CS | Math | Physics | Others | Avg. |
|---|---|---|---|---|---|
| Qwen3-4B-Instruct-2507 | 58.06 | 71.08 | 51.76 | 55.77 | 58.1 |
| SciJudge-4B-2605 | 78.63 | 82.84 | 74.12 | 75.48 | 77.3 |
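For reference, pairwise accuracy of this kind can be computed as the share of pairs where the predicted letter matches the gold letter. The sketch below is illustrative only: `pairwise_accuracy` is a hypothetical helper, the data is a toy example rather than benchmark output, and counting unparseable predictions as wrong is an assumption, since this card does not state how the official scoring handles them.

```python
def pairwise_accuracy(predictions, labels):
    """Percentage of pairs where the predicted letter equals the gold letter.

    Assumption: a prediction of None (no parseable answer) counts as wrong.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty evaluation set")
    correct = sum(pred == gold for pred, gold in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Toy data for illustration only, not benchmark results.
preds = ["A", "B", None, "B"]
golds = ["A", "A", "B", "B"]
print(f"{pairwise_accuracy(preds, golds):.2f}")  # 2 of 4 correct -> 50.00
```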
@misc{tong2026ailearnscientifictaste,
title={AI Can Learn Scientific Taste},
author={Jingqi Tong and Mingzhe Li and Hangcheng Li and Yongzhuo Yang and Yurong Mou and Weijie Ma and Zhiheng Xi and Hongji Chen and Xiaoran Liu and Qinyuan Cheng and Ming Zhang and Qiguang Chen and Weifeng Ge and Qipeng Guo and Tianlei Ying and Tianxiang Sun and Yining Zheng and Xinchi Chen and Jun Zhao and Ning Ding and Xuanjing Huang and Yugang Jiang and Xipeng Qiu},
year={2026},
eprint={2603.14473},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.14473},
}