HumorGen GRPO-Think — 7B

Part of the HumorGen Collection · SaLT Lab, Carnegie Mellon University


A LoRA adapter trained with O-GRPO and Chain-of-Thought reasoning traces. It is the strongest model in the Core HumorGen ablation.

Paper: arXiv:2604.09629


Training

Property          Value
Stage             O-GRPO + Chain-of-Thought traces
Initialized from  HumorGen_SFT_Think_7B
Backbone          Qwen2.5-7B-Instruct (QLoRA 4-bit)
Group size        6 (one per CSF persona)
Reward            HumorRank Bradley-Terry scores
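
The table names a group size of 6 and Bradley-Terry rewards but does not spell out how the two combine. The sketch below is one plausible reading, not the authors' implementation: it assumes the standard GRPO recipe of normalizing rewards within each group, and it assumes the HumorRank scores behave like Bradley-Terry log-strengths. The function names and example rewards are hypothetical, and O-GRPO itself may differ in details not described here.

```python
import math
from statistics import mean, pstdev


def bradley_terry_win_prob(score_a: float, score_b: float) -> float:
    """P(a is preferred over b) under Bradley-Terry.

    Assumption: scores are log-strengths, so the probability is a
    logistic function of the score difference.
    """
    return 1.0 / (1.0 + math.exp(score_b - score_a))


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantages: z-score each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one group of 6 candidate jokes
# (one per persona, per the table above).
group_rewards = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3]
advantages = group_relative_advantages(group_rewards)
best = max(range(len(advantages)), key=advantages.__getitem__)
print(f"highest-advantage candidate: index {best}")
```

Under this reading, candidates scoring above the group mean get a positive advantage and are reinforced, while those below the mean are pushed down, so only relative quality within the group matters.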

Usage

This is a PEFT LoRA adapter. Load the base model and apply the adapter:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model and tokenizer, then attach the LoRA adapter on top.
base_id = "Qwen/Qwen2.5-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, "Jayi2424/HumorGen_GRPO_Think_7B")
model.eval()

# Build the ChatML prompt by hand; it ends with an open <think> tag so the
# model writes its reasoning trace before the joke.
headline = "Robot passes bar exam; lawyers reassure everyone they are still necessary"
prompt = (
    "<|im_start|>system\n"
    "Think carefully, then write the best joke you can.\n<|im_end|>\n"
    f"<|im_start|>user\n{headline}<|im_end|>\n"
    "<|im_start|>assistant\n<think>\n"
)
inputs  = tokenizer(prompt, return_tensors="pt").to(model.device)
# do_sample=True is set explicitly so that temperature and top_p take effect.
outputs = model.generate(**inputs, max_new_tokens=300, do_sample=True, temperature=0.7, top_p=0.95)
# Decodes the full sequence (prompt plus generated reasoning and joke).
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
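
The decoded text from the Usage snippet contains the prompt, the reasoning trace, and the joke together. The helper below is a minimal sketch for separating them. It assumes the model closes its reasoning with a `</think>` tag and that this tag survives decoding as plain text. The prompt only shows the opening `<think>`, so the closing tag is an assumption, and `split_reasoning` is a hypothetical helper, not part of the released code.

```python
from typing import Optional, Tuple


def split_reasoning(decoded: str, close_tag: str = "</think>") -> Tuple[Optional[str], str]:
    """Split decoded output into (reasoning, joke).

    Assumption: the reasoning trace ends with `close_tag`. If the tag is
    missing (for example, generation hit max_new_tokens mid-thought),
    reasoning is None and the whole text is returned as the second item.
    """
    head, sep, tail = decoded.rpartition(close_tag)
    if not sep:
        return None, decoded.strip()
    # Keep only what follows the last opening tag, dropping the prompt text.
    reasoning = head.split("<think>")[-1].strip()
    return reasoning, tail.strip()


# Hypothetical decoded string, for illustration only.
sample = (
    "assistant\n<think>\nLawyers bill by the hour.\n</think>\n"
    "The robot passed the bar."
)
reasoning, joke = split_reasoning(sample)
print(joke)
```

Returning `None` for the reasoning when the tag is absent lets callers detect truncated generations instead of treating an unfinished trace as the joke.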

Citation

@misc{ajayi2026humorgen,
  title         = {HumorGen: Cognitive Synergy for Humor Generation in Large Language
                   Models via Persona-Based Distillation},
  author        = {Ajayi, Edward and others},
  year          = {2026},
  eprint        = {2604.09629},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2604.09629}
}