πŸ“ƒ [Paper] β€’ πŸ’» [Github] β€’ πŸ€— [Models] β€’ πŸ† [Playground]

Model Download and Inference

  1. Log in to Hugging Face

    huggingface-cli login --token $HUGGINGFACE_TOKEN
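
    Alternatively, you can authenticate from Python instead of the CLI. A minimal sketch using huggingface_hub.login, assuming your token is stored in the HUGGINGFACE_TOKEN environment variable:

    import os
    from huggingface_hub import login

    # assumes HUGGINGFACE_TOKEN is set in your environment
    login(token=os.environ["HUGGINGFACE_TOKEN"])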
    
  2. Download the model to a local directory

    from huggingface_hub import snapshot_download
    import os
    
    local_model_dir = os.path.join('/path/to/models/dir', 'JudgeLRM-7B')
    snapshot_download(repo_id="nuojohnchen/JudgeLRM-7B", local_dir=local_model_dir)
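
    If you only need the weights, config, and tokenizer files, snapshot_download also accepts an allow_patterns filter. A minimal sketch, assuming the repository ships safetensors weights (the pattern list below is an assumption, not taken from the repo):

    import os
    from huggingface_hub import snapshot_download

    local_model_dir = os.path.join('/path/to/models/dir', 'JudgeLRM-7B')
    # fetch only weight shards, JSON configs, and tokenizer files (assumed file patterns)
    snapshot_download(
        repo_id="nuojohnchen/JudgeLRM-7B",
        local_dir=local_model_dir,
        allow_patterns=["*.safetensors", "*.json", "*.model", "*.txt"],
    )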
    
  3. Inference Example

    import os
    from transformers import AutoTokenizer, pipeline
    
    # replace with your own evaluation case
    json_input = {
        "idx": 549,
        "motivation_app": "Google Docs",
        "cmp_key": "cerebras-gpt-6.7B_pythia-6.9b",
        "instruction": "Include important study notes and key points that someone should know about the given subject.",
        "input": "history of the USA",
        "response1": "The history of the United States is one of the most influential and influential countries in the world. It has had a profound impact on the world and continues to shape the world today. The United States has had a long and storied history, from its founding in 1775 to the present day. It has had a tumultuous and tumultuous history, from the Revolutionary War to the Civil War and beyond. The United States has had a tumultuous and tumultuous history, from the Revolutionary War to the Civil War and beyond. The United States has had a long and storied history, from its founding in 1775 to the present day. It has had a profound impact on the world and continues to shape the world today.",
        "response2": "1. The United States of America was founded in 1776.\n2. The Declaration of Independence was signed in 1776.\n3. The Constitution of the United States of America was signed in 1787.\n4. The Civil War began in 1861.\n5. The Emancipation Proclamation was issued in 1863.\n6. The 13th Amendment was ratified in 1865.\n7. The 14th Amendment was ratified in 1868.\n8. The 15th Amendment was ratified in 1870.\n9. The 16th Amendment was ratified in 1913.\n10. The 17th Amendment was ratified in 1913.\n11. The 18th Amendment was ratified in 1919.\n12. The 19th Amendment was ratified in 1920.\n13. The 20th Amendment was ratified in 1933.\n14. The 21st Amendment was ratified in 1933.\n15. The 22nd Amendment was ratified in",
        "annotator1": 2,
        "annotator2": 2,
        "annotator3": 2,
        "label": 2,
        "needed_reasoning_rate1-10": 7,
        "rate_explanation": "The task requires evaluating the quality of responses based on their adherence to the instruction to include important study notes and key points about the history of the USA. Response1 is repetitive and lacks specific details, while Response2 provides a clear, concise list of key historical events. The reasoning needed to judge these responses involves assessing clarity, specificity, and relevance to the instruction, which is moderately complex.\n----------------------------------------"
    }
    
    question = json_input.get("instruction", "").strip()+"\n"+json_input.get("input", "").strip()
    answer_1 = json_input.get("response1", "").strip()
    answer_2 = json_input.get("response2", "").strip()
    
    prompt = """<|im_start|>system\nYou are a helpful assistant. The assistant first performs a detailed, step-by-step reasoning process in its mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and<answer> </answer> tags, respectively, i.e., <think> detailed reasoning process here, explaining each step of your evaluation for both assistants </think><answer> answer here </answer>. Now the user asks you to judge the performance of two AI assistants in response to the question. Score assistants 1-10 (higher=better). Criteria includes helpfulness, relevance, accuracy, and level of detail. Avoid order, length, style or other bias. After thinking, when you finally reach a conclusion, clearly  provide your evaluation scores within <answer> </answer> tags, i.e. for example,<answer>3</answer><answer>5</answer>\n<|im_end|>\n<|im_start|>user\n[Question]\n{question}\n\n[Assistant 1’s Answer]\n{answer_1}\n\n[Assistant 2’s Answer]\n{answer_2}\n<|im_end|>\n<|im_start|>assistant\n<think>"""
    formatted_prompt = prompt.format(question=question, answer_1=answer_1, answer_2=answer_2)
    local_model_dir = os.path.join('/path/to/models/dir', 'JudgeLRM-7B')
    
    tokenizer = AutoTokenizer.from_pretrained(local_model_dir, use_fast=False)
    generator = pipeline(
        "text-generation",
        model=local_model_dir,
        tokenizer=tokenizer,
        device_map="auto",   # place the model on the available device(s) automatically
        torch_dtype="auto"
    )
    
    result = generator(formatted_prompt, max_new_tokens=2048)
    print(result[0]['generated_text'])
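
    The prompt instructs the model to close with two <answer> tags, one score per assistant. A small post-processing sketch that continues from result above (assuming the generation follows the expected format; if it does not, nothing is parsed):

    import re

    # drop the echoed prompt so the example scores in the system message are not matched
    generated = result[0]['generated_text'][len(formatted_prompt):]
    scores = re.findall(r"<answer>\s*(\d+(?:\.\d+)?)\s*</answer>", generated)
    if len(scores) >= 2:
        print(f"Assistant 1 score: {scores[0]}, Assistant 2 score: {scores[1]}")
    else:
        print("Could not parse scores; inspect the raw generation above.")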
    

Results Reproduction


Citation

@misc{nuo2025judgelrm,
      title={JudgeLRM: Large Reasoning Models as a Judge}, 
      author={Nuo Chen and Zhiyuan Hu and Qingyun Zou and Jiaying Wu and Qian Wang and Bryan Hooi and Bingsheng He},
      year={2025},
      eprint={2504.00050},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.00050}, 
}