HumorGen

An Open-Weight Ecosystem for Computational Humor Generation
Carnegie Mellon University

View on Hugging Face · Read the Paper · CLEF 2026 Paper

1. The Problem: Why are LLMs so unfunny?

Large Language Models (LLMs) possess vast amounts of world knowledge, yet they notoriously struggle to generate genuinely funny content. The reason isn't a lack of data; it's a fundamental flaw in how they are trained.

Standard LLM training minimizes perplexity—it teaches the model to predict the most probable (i.e., average) next token. However, in the context of humor, "average" means boring, generic, or cliché. Comedy relies on subverting expectations. It lives at the extreme edges of the probability distribution: in the unexpected turn of phrase, the precise word that collapses two meanings at once, or the observation that highlights the absurdity of a mundane situation.

When you prompt a standard LLM to "be funny," it suffers from Mode Collapse in creative generation. It defaults to the safest, most statistically common "joke-like" structures, producing text that scans as a joke but completely fails to land.

2. Our Approach: The Cognitive Synergy Framework (CSF)

To solve this, we cannot simply tell a model to "be funny." Instead, we introduce the Cognitive Synergy Framework (CSF), which represents the first application of Mixture-of-Thought (MoT) specifically designed for creative divergence.

Traditional MoT: Typically used for logic or math to converge on one correct answer.
Cognitive Synergy: Used for creativity to diverge into many distinct, valid humorous answers.

CSF structures humor generation not as a single task, but as an ensemble of specific cognitive processes. We force the model to adopt distinct "Cognitive Personas." Each persona acts as a Latent Expert, grounded in a specific psychological theory of humor. By forcing generation through these specific lenses, we push the model away from the generic mean and into those creative "tails" of the distribution.

The Six Latent Experts

Instead of one generic generation path, CSF produces six parallel candidates per input—one from each persona. This guarantees a diverse pool of comedic angles by construction.

Persona	Psychological Theory	Comedic Lens & Mechanism
The Neurotic	Superiority (Self-Deprecation)	Focuses on anxiety, overthinking, and personal insecurity. Finds humor in vulnerability and self-inflicted struggles.
The Cynic	Superiority (Mockery)	Cuts through polite society. Focuses on hypocrisy, social contradictions, and the dark, unspoken truths we all secretly acknowledge.
The Observer	Incongruity (Relatability)	The "Seinfeld" lens. Highlights the absurdity hiding in mundane, everyday life and unspoken social rules.
The Wordsmith	Incongruity (Linguistic)	Treats jokes as linguistic puzzles. Focuses on phonology, double entendres, puns, and structural ambiguity.
The Optimist	Benign Violation	Focuses on wholesome misinterpretation. Finds a ridiculous, forced silver lining where there absolutely shouldn't be one.
The Absurdist	Incongruity (Surprise)	Abandons reality. Focuses on non-sequiturs, surreal logic, and completely violating causal reasoning.

The Training Pipeline: Baking Synergy into the Weights

Our framework is a full training pipeline designed to distill this multi-persona capability into a compact open-weight model.

Phase 1: MoT Generation (Teacher) — We use a powerful teacher LLM to generate six independent joke candidates (one per persona) for a given prompt (e.g., a news headline).
Phase 2: Structure Distillation (SFT) — A compact 7B student model is trained on this diverse pool. The crucial lesson here is that a single input can support multiple, entirely valid, and distinct comedic interpretations.
Phase 3: Quality Alignment (DPO / O-GRPO) — We don't just train on the "best" joke. We use HumorRank—a Bradley-Terry pairwise evaluation framework—to rank the six persona outputs against each other. Through Direct Preference Optimization (DPO) and Offline Group Relative Policy Optimization (O-GRPO), the model learns: "In this specific context, The Cynic's approach was funnier than The Wordsmith's."

3. Extending to Constrained Humor: CLEF 2026 JOKER Task 4

The Cognitive Synergy Framework isn't just for open-ended headline jokes. To prove its robustness, we extended it to a significantly harder problem: Constrained Multilingual Pun Generation, specifically for the CLEF 2026 JOKER Task 4.

The Challenge: The task requires the model to generate a pun-brief—a single sentence that simultaneously satisfies a specific pun word and two distinct semantic senses. For example, given the word "bark," sense 1 ("the sound a dog makes"), and sense 2 ("the outer covering of a tree"), the model must weave both meanings into a natural, genuinely funny sentence. This requires strict lexical constraint adherence alongside creative humor generation.

The Solution: To scale CSF to this multilingual, constrained environment, we utilized a two-stage cross-lingual LoRA curriculum:

Domain-Agnostic Pretraining (HumorGen Base): We first trained foundational humor models at the 14B and 32B scale on the SemEval MWAHAHA corpus across all languages (English, French, Spanish). These serve as incredibly strong, general-purpose multilingual humor generators.
Task-Specific Adaptation (JOKER Suite): We branched from these foundational bases, applying per-language LoRA fine-tuning specifically on the CLEF-JOKER constraint data.

In our CLEF study, we explicitly test whether the persona-based CSF prompting (API teacher) outperforms a single "creativity-first" prompt, and how well our task-adapted open weights (HumorGen-JOKER 14B/32B) compete against those frontier models.

4. Model Collection

All models are released as LoRA adapters on Hugging Face. View the full collection here: Jayi2424/HumorGen.

Core HumorGen Suite — 7B

Open-ended headline humor generation. This suite represents a full ablation across post-training paradigms (SFT, DPO, O-GRPO), tested with and without explicit Chain-of-Thought (CoT) reasoning traces. Base model: Qwen2.5-7B-Instruct.

Model	Training Stage	CoT	Hugging Face Link
HumorGen_SFT_7B	Supervised Fine-Tuning	—	Jayi2424/HumorGen_SFT_7B
HumorGen_SFT_Think_7B	Supervised Fine-Tuning	Yes	Jayi2424/HumorGen_SFT_Think_7B
HumorGen_DPO_7B	Direct Preference Optimization	—	Jayi2424/HumorGen_DPO_7B
HumorGen_DPO_Think_7B	Direct Preference Optimization	Yes	Jayi2424/HumorGen_DPO_Think_7B
HumorGen_GRPO_7B	Offline GRPO	—	Jayi2424/HumorGen_GRPO_7B
HumorGen_GRPO_Think_7B	Offline GRPO	Yes	Jayi2424/HumorGen_GRPO_Think_7B

Multilingual Base Models — 14B & 32B

Domain-agnostic humor pretraining on the SemEval MWAHAHA corpus across all languages. Released independently as general-purpose multilingual humor generators and used as the foundation for the JOKER suite. Base models: Qwen3-14B / Qwen3-32B.

Model	Scale	Hugging Face Link
HumorGen_SFT_14B	14B	Jayi2424/HumorGen_SFT_14B
HumorGen_SFT_32B	32B	Jayi2424/HumorGen_SFT_32B

CLEF 2026 JOKER Suite — Constrained Pun Generation

Task: dual-sense pun-brief generation. Two-stage cross-lingual LoRA curriculum: multilingual pretraining → per-language JOKER fine-tuning. Available in English, French, and Spanish.

Model	Language	Scale	Hugging Face Link
HumorGen_JOKER_EN_14B	English	14B	Jayi2424/HumorGen_JOKER_EN_14B
HumorGen_JOKER_EN_32B	English	32B	Jayi2424/HumorGen_JOKER_EN_32B
HumorGen_JOKER_FR_14B	French	14B	Jayi2424/HumorGen_JOKER_FR_14B
HumorGen_JOKER_FR_32B	French	32B	Jayi2424/HumorGen_JOKER_FR_32B
HumorGen_JOKER_ES_14B	Spanish	14B	Jayi2424/HumorGen_JOKER_ES_14B
HumorGen_JOKER_ES_32B	Spanish	32B	Jayi2424/HumorGen_JOKER_ES_32B

5. Papers

HumorGen: Cognitive Synergy for Humor Generation in Large Language Models via Persona-Based Distillation
Ajayi, E. et al. · arXiv 2026
arxiv.org/abs/2604.09629

Cross-Lingual Cognitive Synergy for Constrained Humor Generation in LLMs: SaLT Lab at the CLEF 2026 JOKER Track
Ajayi, E. et al. · Working Notes of CLEF 2026
edwardajayi.github.io/assets/papers/HumorGen-JOKER.pdf

6. Citation

@misc{ajayi2026humorgen,
  title         = {HumorGen: Cognitive Synergy for Humor Generation in Large Language
                   Models via Persona-Based Distillation},
  author        = {Ajayi, Edward and others},
  year          = {2026},
  eprint        = {2604.09629},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2604.09629}
}

@inproceedings{ajayi2026joker,
  title     = {Cross-Lingual Cognitive Synergy for Constrained Humor Generation in LLMs: SaLT Lab at the CLEF 2026 JOKER Track},
  author    = {Ajayi, Edward and others},
  booktitle = {Working Notes of CLEF 2026},
  year      = {2026},
  url       = {https://edwardajayi.github.io/assets/papers/HumorGen-JOKER.pdf}
}

Carnegie Mellon University

Downloads last month: -; Downloads are not tracked for this model. How to track

Collection including Jayi2424/HumorGen

HumorGen

Collection

Open-weight computational humor generation models including Core 7B suite, multilingual 14B/32B bases, and CLEF 2026 JOKER Task 4 models variants. • 15 items • Updated about 8 hours ago

Paper for Jayi2424/HumorGen

HumorGen: Cognitive Synergy for Humor Generation in Large Language Models via Persona-Based Distillation

Paper • 2604.09629 • Published Mar 19