---
license: apache-2.0
library_name: vllm
pipeline_tag: text-generation
---

# CTRL: Critic Training via Reinforcement Learning

CTRL-32B is a critic LLM finetuned from [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct).

- **Project Page:** https://critic-rl.github.io/
- **Paper:** https://arxiv.org/abs/2502.03492
- **Code:** https://github.com/HKUNLP/critic-rl

## Quickstart

We recommend using [vLLM](https://docs.vllm.ai/en/latest/getting_started/quickstart.html) for inference:

```python
from vllm import LLM, SamplingParams


def format_prompt_for_ctrl(problem, answer):
    """Given a question-answer pair, we ask the model to generate a critique."""
    return f"""You are tasked with analyzing an answer to a problem and providing constructive feedback.

Do NOT provide direct solutions.

Problem description:
<problem>
{problem}
</problem>

Answer:
<answer>
{answer}
</answer>

Structure your response using the following format (without <format> tags):
<format>
Analysis:
{{Analysis}}

Improvement suggestions:
{{Suggestions}}

Overall judgment:
{{Correct/Incorrect}}
</format>"""


# Sample prompts.
problem = """Write a python function to check whether every odd index contains odd numbers of a given list."""
answer = """```python
def odd_length_sum(arr):
    n = len(arr)
    res = 0
    # Iterate through each element in the array
    for i in range(n):
        # Calculate the number of subarrays in which arr[i] is present
        count = ((i + 1) * (n - i) + 1) // 2
        # If the count is odd, add the element to the result
        if count % 2 == 1:
            res += arr[i]
    return res
```"""
prompts = [
    format_prompt_for_ctrl(problem, answer),
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.7, top_p=0.8, repetition_penalty=1.05, max_tokens=1024)

# Create an LLM.
llm = LLM(model="Zhihui/CTRL-32B", tensor_parallel_size=2)

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
```

## Citation

```bibtex
@article{xie2025teaching,
  title={Teaching Language Models to Critique via Reinforcement Learning},
  author={Xie, Zhihui and Chen, Liyu and Mao, Weichao and Xu, Jingjing and Kong, Lingpeng and others},
  journal={arXiv preprint arXiv:2502.03492},
  year={2025}
}
```
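
## Parsing the critique

Since the prompt instructs CTRL to end its critique with an `Overall judgment:` line, the verdict can be extracted downstream with a small amount of string matching. The helper below is a minimal sketch, not part of the official CTRL codebase; `parse_judgment` and the regex are our own assumptions about how you might post-process the generated text.

```python
import re


def parse_judgment(critique):
    """Extract the overall judgment ("Correct" or "Incorrect") from a CTRL critique.

    Returns None if the critique does not contain a judgment line in the
    expected format. (This helper is illustrative, not part of CTRL itself.)
    """
    match = re.search(r"Overall judgment:\s*(Correct|Incorrect)", critique)
    return match.group(1) if match else None


# Example critique text following the format requested in the prompt above.
critique = """Analysis:
The function computes an odd-length-subarray sum instead of checking
whether every odd index holds an odd number.

Improvement suggestions:
Iterate over the odd indices and verify that each element there is odd.

Overall judgment:
Incorrect"""

print(parse_judgment(critique))  # -> Incorrect
```

In practice you would call `parse_judgment(output.outputs[0].text)` on each vLLM `RequestOutput`, treating a `None` result as a formatting failure rather than a verdict.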