Qwen3-8B-PragReST

minchaoh2002/Qwen3-8B-PragReST is a Qwen3-8B model trained with PragReST: Pragmatic Reasoning via Self-Training.

PragReST is a self-supervised framework for improving pragmatic language understanding. It trains models to reason about implied meaning, speaker intent, implicature, presupposition, metonymy, social context, and other cases where the intended meaning is not fully explicit in the surface text.

This model is associated with the paper:

PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding
Jihyung Park*, Minchao Huang*, Leqi Liu, Elias Stengel-Eskin
The University of Texas at Austin
arXiv: https://arxiv.org/abs/2606.18624
Code: https://github.com/jihyung803/PragReST

* Equal contribution.

Model Details

Model name: minchaoh2002/Qwen3-8B-PragReST
Base model: Qwen/Qwen3-8B
Model type: Causal language model
Training framework: PragReST
Training methods: supervised fine-tuning with counterfactual bootstrapping, followed by GRPO reinforcement learning
Primary capability: pragmatic language understanding and counterfactual pragmatic reasoning
Language: English
Paper: https://arxiv.org/abs/2606.18624
Code: https://github.com/jihyung803/PragReST

Quickstart

Install a recent version of transformers. Qwen3 support requires transformers 4.51.0 or later; older versions fail to load the model with KeyError: 'qwen3'.

pip install -U transformers accelerate torch

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "minchaoh2002/Qwen3-8B-PragReST"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

prompt = """Ken asks Mary, "Do you want tea with milk or sugar?"
Mary replies, "In a cup."

What is Mary likely implying? Explain the pragmatic reasoning."""

messages = [
    {"role": "user", "content": prompt}
]

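# Render the conversation into the model's prompt format. enable_thinking=True
# keeps Qwen3's reasoning mode on, so the reply starts with a <think> block.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)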

inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    do_sample=True,
)

# Drop the prompt tokens so only the newly generated continuation is decoded.
generated = outputs[0][len(inputs.input_ids[0]):]
print(tokenizer.decode(generated, skip_special_tokens=True))
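
Because the prompt is built with enable_thinking=True, the decoded text should normally begin with a reasoning trace wrapped in <think>...</think>, followed by the final answer. The sketch below separates the two on the decoded string. It assumes the tags survive decoding with skip_special_tokens=True, as they do with the base Qwen3 tokenizer; split_thinking is a hypothetical helper name, not part of transformers or the PragReST code.

```python
OPEN_TAG, CLOSE_TAG = "<think>", "</think>"


def split_thinking(decoded: str) -> tuple[str, str]:
    """Split decoded model output into (reasoning, answer)."""
    text = decoded.strip()
    # Split on the last closing tag so everything after it is the answer.
    head, sep, tail = text.rpartition(CLOSE_TAG)
    if sep:
        return head.removeprefix(OPEN_TAG).strip(), tail.strip()
    if text.startswith(OPEN_TAG):
        # Generation stopped mid-reasoning (e.g. max_new_tokens was reached).
        return text.removeprefix(OPEN_TAG).strip(), ""
    # No reasoning block at all: the whole text is the answer.
    return "", text


# Illustrative string only, not real model output.
sample = "<think>\nMary avoids the literal choice.\n</think>\n\nMary is being playful."
reasoning, answer = split_thinking(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

In the Quickstart, the same helper could be applied to the string returned by tokenizer.decode(generated, skip_special_tokens=True). If the answer comes back empty, the reasoning was probably cut off and max_new_tokens may need to be raised.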

Evaluation

In the paper, PragReST is evaluated on four pragmatic reasoning benchmarks:

PragMega: fine-grained pragmatic QA
Ludwig: implicature interpretation
MetoQA: metonymic reference resolution
AltPrag: open-ended pragmatic recovery

Reported Qwen3-8B results:

Model	PragMega	Ludwig	MetoQA	AltPrag
Qwen3-8B Instruct	73.37	80.33	73.52	7.24
PragReST-SFT	77.51	82.17	78.56	7.46
PragReST-GRPO	79.29	83.33	80.72	7.62
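
To make the size of the improvements easier to read, the following sketch computes each PragReST variant's absolute gain over the Qwen3-8B Instruct row. The scores are copied from the table above; the dictionary layout and the gains helper are illustrative only.

```python
# Scores copied from the table above.
scores = {
    "Qwen3-8B Instruct": {"PragMega": 73.37, "Ludwig": 80.33, "MetoQA": 73.52, "AltPrag": 7.24},
    "PragReST-SFT": {"PragMega": 77.51, "Ludwig": 82.17, "MetoQA": 78.56, "AltPrag": 7.46},
    "PragReST-GRPO": {"PragMega": 79.29, "Ludwig": 83.33, "MetoQA": 80.72, "AltPrag": 7.62},
}
baseline = scores["Qwen3-8B Instruct"]


def gains(model: str) -> dict[str, float]:
    """Absolute difference from the baseline row, per benchmark."""
    return {
        bench: round(scores[model][bench] - baseline[bench], 2)
        for bench in baseline
    }


for model in ("PragReST-SFT", "PragReST-GRPO"):
    print(model, gains(model))
```

By this calculation, the GRPO model gains 5.92, 3.00, and 7.20 points over the baseline on PragMega, Ludwig, and MetoQA, and 0.38 on AltPrag. The SFT stage alone accounts for 4.14, 1.84, 5.04, and 0.22 of those gains. AltPrag scores appear to be on a different numeric scale from the other three benchmarks, so its gain is not directly comparable.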

Citation

If you use this model, please cite:

@article{park2026pragrest,
  title={PragReST: Self-Reinforcing Counterfactual Reasoning for Pragmatic Language Understanding},
  author={Park, Jihyung and Huang, Minchao and Liu, Leqi and Stengel-Eskin, Elias},
  year={2026},
  journal={arXiv preprint arXiv:2606.18624},
  url={https://arxiv.org/abs/2606.18624},
}

You may also cite the Qwen3 technical report for the base model:

@misc{qwen3technicalreport,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  eprint={2505.09388},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2505.09388},
}