HumorGen

An Open-Weight Ecosystem for Computational Humor Generation
Carnegie Mellon University

View on Hugging Face · Read the Paper · CLEF 2026 Paper


1. The Problem: Why are LLMs so unfunny?

Large Language Models (LLMs) possess vast amounts of world knowledge, yet they notoriously struggle to generate genuinely funny content. The reason isn't a lack of data; it's a fundamental flaw in how they are trained.

Standard LLM training minimizes perplexity—it teaches the model to predict the most probable (i.e., average) next token. However, in the context of humor, "average" means boring, generic, or cliché. Comedy relies on subverting expectations. It lives at the extreme edges of the probability distribution: in the unexpected turn of phrase, the precise word that collapses two meanings at once, or the observation that highlights the absurdity of a mundane situation.

When you prompt a standard LLM to "be funny," it suffers from Mode Collapse in creative generation. It defaults to the safest, most statistically common "joke-like" structures, producing text that scans as a joke but completely fails to land.


2. Our Approach: The Cognitive Synergy Framework (CSF)

To solve this, we cannot simply tell a model to "be funny." Instead, we introduce the Cognitive Synergy Framework (CSF), which represents the first application of Mixture-of-Thought (MoT) specifically designed for creative divergence.

  • Traditional MoT: Typically used for logic or math to converge on one correct answer.
  • Cognitive Synergy: Used for creativity to diverge into many distinct, valid humorous answers.

CSF structures humor generation not as a single task, but as an ensemble of specific cognitive processes. We force the model to adopt distinct "Cognitive Personas." Each persona acts as a Latent Expert, grounded in a specific psychological theory of humor. By forcing generation through these specific lenses, we push the model away from the generic mean and into those creative "tails" of the distribution.

The Six Latent Experts

Instead of one generic generation path, CSF produces six parallel candidates per input—one from each persona. This guarantees a diverse pool of comedic angles by construction.

Persona Psychological Theory Comedic Lens & Mechanism
The Neurotic Superiority (Self-Deprecation) Focuses on anxiety, overthinking, and personal insecurity. Finds humor in vulnerability and self-inflicted struggles.
The Cynic Superiority (Mockery) Cuts through polite society. Focuses on hypocrisy, social contradictions, and the dark, unspoken truths we all secretly acknowledge.
The Observer Incongruity (Relatability) The "Seinfeld" lens. Highlights the absurdity hiding in mundane, everyday life and unspoken social rules.
The Wordsmith Incongruity (Linguistic) Treats jokes as linguistic puzzles. Focuses on phonology, double entendres, puns, and structural ambiguity.
The Optimist Benign Violation Focuses on wholesome misinterpretation. Finds a ridiculous, forced silver lining where there absolutely shouldn't be one.
The Absurdist Incongruity (Surprise) Abandons reality. Focuses on non-sequiturs, surreal logic, and completely violating causal reasoning.

The Training Pipeline: Baking Synergy into the Weights

Our framework is a full training pipeline designed to distill this multi-persona capability into a compact open-weight model.

  1. Phase 1: MoT Generation (Teacher) — We use a powerful teacher LLM to generate six independent joke candidates (one per persona) for a given prompt (e.g., a news headline).
  2. Phase 2: Structure Distillation (SFT) — A compact 7B student model is trained on this diverse pool. The crucial lesson here is that a single input can support multiple, entirely valid, and distinct comedic interpretations.
  3. Phase 3: Quality Alignment (DPO / O-GRPO) — We don't just train on the "best" joke. We use HumorRank—a Bradley-Terry pairwise evaluation framework—to rank the six persona outputs against each other. Through Direct Preference Optimization (DPO) and Offline Group Relative Policy Optimization (O-GRPO), the model learns: "In this specific context, The Cynic's approach was funnier than The Wordsmith's."

3. Extending to Constrained Humor: CLEF 2026 JOKER Task 4

The Cognitive Synergy Framework isn't just for open-ended headline jokes. To prove its robustness, we extended it to a significantly harder problem: Constrained Multilingual Pun Generation, specifically for the CLEF 2026 JOKER Task 4.

The Challenge: The task requires the model to generate a pun-brief—a single sentence that simultaneously satisfies a specific pun word and two distinct semantic senses. For example, given the word "bark," sense 1 ("the sound a dog makes"), and sense 2 ("the outer covering of a tree"), the model must weave both meanings into a natural, genuinely funny sentence. This requires strict lexical constraint adherence alongside creative humor generation.

The Solution: To scale CSF to this multilingual, constrained environment, we utilized a two-stage cross-lingual LoRA curriculum:

  1. Domain-Agnostic Pretraining (HumorGen Base): We first trained foundational humor models at the 14B and 32B scale on the SemEval MWAHAHA corpus across all languages (English, French, Spanish). These serve as incredibly strong, general-purpose multilingual humor generators.
  2. Task-Specific Adaptation (JOKER Suite): We branched from these foundational bases, applying per-language LoRA fine-tuning specifically on the CLEF-JOKER constraint data.

In our CLEF study, we explicitly test whether the persona-based CSF prompting (API teacher) outperforms a single "creativity-first" prompt, and how well our task-adapted open weights (HumorGen-JOKER 14B/32B) compete against those frontier models.


4. Model Collection

All models are released as LoRA adapters on Hugging Face. View the full collection here: Jayi2424/HumorGen.

Core HumorGen Suite — 7B

Open-ended headline humor generation. This suite represents a full ablation across post-training paradigms (SFT, DPO, O-GRPO), tested with and without explicit Chain-of-Thought (CoT) reasoning traces. Base model: Qwen2.5-7B-Instruct.

Model Training Stage CoT Hugging Face Link
HumorGen_SFT_7B Supervised Fine-Tuning Jayi2424/HumorGen_SFT_7B
HumorGen_SFT_Think_7B Supervised Fine-Tuning Yes Jayi2424/HumorGen_SFT_Think_7B
HumorGen_DPO_7B Direct Preference Optimization Jayi2424/HumorGen_DPO_7B
HumorGen_DPO_Think_7B Direct Preference Optimization Yes Jayi2424/HumorGen_DPO_Think_7B
HumorGen_GRPO_7B Offline GRPO Jayi2424/HumorGen_GRPO_7B
HumorGen_GRPO_Think_7B Offline GRPO Yes Jayi2424/HumorGen_GRPO_Think_7B

Multilingual Base Models — 14B & 32B

Domain-agnostic humor pretraining on the SemEval MWAHAHA corpus across all languages. Released independently as general-purpose multilingual humor generators and used as the foundation for the JOKER suite. Base models: Qwen3-14B / Qwen3-32B.

Model Scale Hugging Face Link
HumorGen_SFT_14B 14B Jayi2424/HumorGen_SFT_14B
HumorGen_SFT_32B 32B Jayi2424/HumorGen_SFT_32B

CLEF 2026 JOKER Suite — Constrained Pun Generation

Task: dual-sense pun-brief generation. Two-stage cross-lingual LoRA curriculum: multilingual pretraining → per-language JOKER fine-tuning. Available in English, French, and Spanish.

Model Language Scale Hugging Face Link
HumorGen_JOKER_EN_14B English 14B Jayi2424/HumorGen_JOKER_EN_14B
HumorGen_JOKER_EN_32B English 32B Jayi2424/HumorGen_JOKER_EN_32B
HumorGen_JOKER_FR_14B French 14B Jayi2424/HumorGen_JOKER_FR_14B
HumorGen_JOKER_FR_32B French 32B Jayi2424/HumorGen_JOKER_FR_32B
HumorGen_JOKER_ES_14B Spanish 14B Jayi2424/HumorGen_JOKER_ES_14B
HumorGen_JOKER_ES_32B Spanish 32B Jayi2424/HumorGen_JOKER_ES_32B

5. Papers

HumorGen: Cognitive Synergy for Humor Generation in Large Language Models via Persona-Based Distillation
Ajayi, E. et al. · arXiv 2026
arxiv.org/abs/2604.09629

Cross-Lingual Cognitive Synergy for Constrained Humor Generation in LLMs: SaLT Lab at the CLEF 2026 JOKER Track
Ajayi, E. et al. · Working Notes of CLEF 2026
edwardajayi.github.io/assets/papers/HumorGen-JOKER.pdf


6. Citation

@misc{ajayi2026humorgen,
  title         = {HumorGen: Cognitive Synergy for Humor Generation in Large Language
                   Models via Persona-Based Distillation},
  author        = {Ajayi, Edward and others},
  year          = {2026},
  eprint        = {2604.09629},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2604.09629}
}

@inproceedings{ajayi2026joker,
  title     = {Cross-Lingual Cognitive Synergy for Constrained Humor Generation in LLMs: SaLT Lab at the CLEF 2026 JOKER Track},
  author    = {Ajayi, Edward and others},
  booktitle = {Working Notes of CLEF 2026},
  year      = {2026},
  url       = {https://edwardajayi.github.io/assets/papers/HumorGen-JOKER.pdf}
}

Carnegie Mellon University

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including Jayi2424/HumorGen

Paper for Jayi2424/HumorGen