---
license: apache-2.0
library_name: vllm
pipeline_tag: text-generation
---

# CTRL: Critic Training via Reinforcement Learning
CTRL-32B is a critic language model fine-tuned from [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct). Given a problem and a candidate answer, it produces structured feedback, consisting of an analysis, improvement suggestions, and an overall correctness judgment, without giving away the solution.

- **Project Page:** https://critic-rl.github.io/
- **Paper:** https://arxiv.org/abs/2502.03492
- **Code:** https://github.com/HKUNLP/critic-rl

## Quickstart
We recommend using [vLLM](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for inference:
```python
from vllm import LLM, SamplingParams

def format_prompt_for_ctrl(problem, answer):
    """Given a question-answer pair, we ask the model to generate a critique."""
    return f"""You are tasked with analyzing an answer to a problem and providing constructive feedback. Do NOT provide direct solutions.

Problem description:
<problem>
{problem}
</problem>

Answer:
<answer>
{answer}
</answer>

Structure your response using the following format (without <format> tags):
<format>
Analysis:
{{Analysis}}

Improvement suggestions:
{{Suggestions}}

Overall judgment: {{Correct/Incorrect}}
</format>"""

# Sample problem and candidate answer for the critic to review. Note that the
# answer below does not solve the stated problem, so CTRL is expected to flag it.
problem = """Write a python function to check whether every odd index contains odd numbers of a given list."""
answer = """```python
def odd_length_sum(arr):
    n = len(arr)
    res = 0

    # Iterate through each element in the array
    for i in range(n):
        # Calculate the number of subarrays in which arr[i] is present
        count = ((i + 1) * (n - i) + 1) // 2

        # If the count is odd, add the element to the result
        if count % 2 == 1:
            res += arr[i]

    return res
```"""
prompts = [
    format_prompt_for_ctrl(problem, answer),
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=1024)

# Create an LLM.
llm = LLM(model="Zhihui/CTRL-32B", tensor_parallel_size=2)
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```
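
Because CTRL emits critiques in the fixed format requested above, the final verdict can be pulled out with a simple regular expression. The `parse_judgment` helper below is a minimal sketch of our own (it is not part of the CTRL codebase) and continues from the Quickstart's `outputs`:

```python
import re
from typing import Optional

def parse_judgment(critique: str) -> Optional[bool]:
    """Map a CTRL critique to True ("Correct"), False ("Incorrect"),
    or None if the expected 'Overall judgment:' line is missing."""
    match = re.search(r"Overall judgment:\s*(Correct|Incorrect)", critique, re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "correct"

for output in outputs:
    critique = output.outputs[0].text
    print(f"Answer judged correct: {parse_judgment(critique)}")
```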
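
For serving, vLLM's OpenAI-compatible endpoint should also work. The sketch below assumes a server started with `vllm serve Zhihui/CTRL-32B --tensor-parallel-size 2` on the default port, and reuses `format_prompt_for_ctrl`, `problem`, and `answer` from the Quickstart:

```python
from openai import OpenAI

# Points at a local vLLM server started with:
#   vllm serve Zhihui/CTRL-32B --tensor-parallel-size 2
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="Zhihui/CTRL-32B",
    prompt=format_prompt_for_ctrl(problem, answer),
    temperature=0.7,
    top_p=0.8,
    max_tokens=1024,
    # repetition_penalty is a vLLM sampling extension, passed via extra_body.
    extra_body={"repetition_penalty": 1.05},
)
print(completion.choices[0].text)
```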

## Citation

```bibtex
@article{xie2025teaching,
  title={Teaching Language Models to Critique via Reinforcement Learning},
  author={Xie, Zhihui and Chen, Liyu and Mao, Weichao and Xu, Jingjing and Kong, Lingpeng and others},
  journal={arXiv preprint arXiv:2502.03492},
  year={2025}
}
```