IROH: Jokes on Gemma4-31B - Humor Retrieval Judge

CLEF 2026 · JOKER Track · Task 1 English · Team VANGUARD

Ana-Maria Luisa Mocanu · Sebastian Mocanu · Ciprian-Octavian Truică · Elena-Simona Apostol



Model Description

A QLoRA fine-tune of gemma-4-31b-it, trained as a Stage 3 LLM judge in the IROH humor retrieval pipeline. Given a query describing a humor topic and a candidate text, the model returns a soft YES/NO probability indicating whether the candidate is a relevant joke, pun, or wordplay.

The model is trained on generic rationales: one-sentence explanations of why a text is or is not a joke, generated by Gemma 4 using a lightweight "General Wordplay" query placeholder. The simplicity of this prompt produces more consistent supervision than the structured typed alternative. The model serves as a complementary corrector to the primary Qwen judge (jokes-on-qwen2.5-7b).


Models

| Adapter folder | Base model | LoRA r | Training data | Ensemble weight | MAP (standalone) |
|---|---|---|---|---|---|
| adapter_model.safetensors | gemma-4-31b-it | 32 | Generic rationales, no aug | 0.30 | 0.5718 |
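The MAP column reports mean average precision. As a reminder of how that metric is usually defined, here is a minimal sketch assuming binary relevance labels; the function names and toy rankings are illustrative, and this is not the official JOKER evaluation script, which may differ in details such as cutoffs or how unretrieved relevant items are counted.

```python
from typing import Optional, Sequence


def average_precision(ranked_relevance: Sequence[int],
                      total_relevant: Optional[int] = None) -> float:
    """Average precision for one query.

    ranked_relevance: 1/0 relevance labels in ranked order (best first).
    total_relevant: number of relevant items for the query; defaults to
    the number of relevant items present in the ranking (an assumption).
    """
    hits = 0
    precision_sum = 0.0
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    denom = total_relevant if total_relevant is not None else hits
    return precision_sum / denom if denom else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """Mean of per-query average precision over all queries."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)
```

For example, a ranking with relevant items at ranks 1 and 3 gives an average precision of (1/1 + 2/3) / 2.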

Usage

from transformers import AutoTokenizer, AutoModelForImageTextToText, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_id = "google/gemma-4-31b-it"
adapter_id = "DS4AI-UPB/jokes-on-gemma4-31b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(
    AutoModelForImageTextToText.from_pretrained(
        base_model_id,
        quantization_config=bnb_config,
        device_map="auto",
    ),
    adapter_id,
)
model.eval()

SYSTEM = (
    "You are a humor and wordplay detection judge. You evaluate whether a text is relevant to a "
    "query AND contains humor, jokes, puns, wordplay, or any form of linguistic wit (double "
    "meanings, homophones, malapropisms, ironic twists). Answer only YES or NO."
)

def score(query: str, text: str) -> float:
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": f'Query: "{query}"\nText: "{text}"\nIs this a relevant joke? Answer YES or NO.'},
    ]
    tokenized = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
    )
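    ids = tokenized["input_ids"].to(model.device)
    with torch.no_grad():
        # Next-token logits at the final prompt position.
        logits = model(ids).logits[:, -1, :]
    # Assumes "YES" and "NO" each map to a single vocabulary token.
    yes_id = tokenizer.convert_tokens_to_ids("YES")
    no_id = tokenizer.convert_tokens_to_ids("NO")
    # Softmax over the two logits only; index 0 is P(YES).
    return torch.softmax(torch.stack([logits[0, yes_id], logits[0, no_id]]), dim=0)[0].item()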
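One way to use a scorer like `score` for retrieval is to re-rank a candidate list by its YES probability. The sketch below is an illustration only: `rank_candidates` and `stub_scorer` are hypothetical names that are not part of the released code, and the stub stands in for the real model so the example runs without a GPU.

```python
from typing import Callable, List, Sequence, Tuple


def rank_candidates(
    query: str,
    candidates: Sequence[str],
    scorer: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each candidate against the query and sort by descending score."""
    scored = [(text, scorer(query, text)) for text in candidates]
    # sorted() is stable, so ties keep their original input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def stub_scorer(query: str, text: str) -> float:
    """Hypothetical stand-in for the model-backed score(query, text)."""
    return 0.9 if "pun" in text.lower() else 0.1


ranking = rank_candidates(
    "cats",
    ["Cats are mammals.", "A pun about cats: purr-fect."],
    stub_scorer,
)
```

With the real model loaded, pass `score` in place of `stub_scorer`.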

Requirements

pip install -U transformers peft bitsandbytes accelerate

Requires transformers >= 5.5.0, peft >= 0.14, and bitsandbytes >= 0.43, plus a CUDA GPU with ~30 GB of VRAM to load the base model in 4-bit (e.g. an A100 on Colab Pro).


Training Data

Query-document pairs from the JOKER 2025 and 2026 Task 1 corpora, deduplicated across editions and balanced between joke and non-joke examples. Each pair is annotated with a one-sentence rationale generated by Gemma 4 (gemma4:e4b via Ollama). Rationale generation scripts are available in the code repository.
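The deduplication and balancing step can be pictured roughly as follows. This is a sketch under stated assumptions, not the team's preprocessing script: the field names (`query`, `text`, `label`), the lowercase-and-strip duplicate key, and downsampling the majority class are all choices made here for illustration.

```python
import random
from typing import Dict, List


def dedup_and_balance(pairs: List[Dict], seed: int = 0) -> List[Dict]:
    """Drop duplicate (query, text) pairs, then balance the two classes."""
    seen = set()
    unique = []
    for pair in pairs:
        # Assumed duplicate key: case- and whitespace-insensitive match.
        key = (pair["query"].strip().lower(), pair["text"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(pair)

    jokes = [p for p in unique if p["label"] == 1]
    non_jokes = [p for p in unique if p["label"] == 0]

    # Assumed balancing strategy: downsample the larger class.
    n = min(len(jokes), len(non_jokes))
    rng = random.Random(seed)
    balanced = rng.sample(jokes, n) + rng.sample(non_jokes, n)
    rng.shuffle(balanced)
    return balanced
```

A fixed seed keeps the sampled subset reproducible across runs.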


Intended Use

  • Intended: a Stage 3 LLM judge in a multi-stage humor retrieval pipeline, used in a weighted ensemble alongside jokes-on-qwen2.5-7b.
  • Out of scope: General-purpose text classification; production deployment without safety validation; languages other than English.
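To illustrate the weighted-ensemble use, here is a minimal sketch that combines per-judge YES probabilities with a normalized weighted average. The 0.30 weight for this adapter comes from the Models table; the 0.70 weight for the Qwen judge and the averaging rule itself are assumptions made for the example, since this card does not state how the ensemble combines its members.

```python
from typing import Dict


def ensemble_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of judge probabilities, normalized by total weight."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights for the supplied judges must sum to > 0")
    return sum(weights[name] * p for name, p in scores.items()) / total


weights = {
    "jokes-on-gemma4-31b": 0.30,  # from the Models table
    "jokes-on-qwen2.5-7b": 0.70,  # hypothetical placeholder, not from the card
}
combined = ensemble_score(
    {"jokes-on-gemma4-31b": 0.8, "jokes-on-qwen2.5-7b": 0.4},
    weights,
)
```

Normalizing by the total weight keeps the result in [0, 1] even when only a subset of judges is available for a given candidate.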

Limitations

  • English only - training data, prompts, and taxonomy are English-specific.
  • Binary YES/NO framing - may be poorly calibrated on borderline cases; graded relevance training is a promising future direction.
  • Optimized for short jokes, puns, and wordplay in the JOKER corpus.

Citation

@InProceedings{Mocanu2026IROH,
    author    = {Mocanu, Ana-Maria Luisa and Mocanu, Sebastian and Truică, Ciprian-Octavian and Apostol, Elena-Simona},
    title     = {IROH: Insightful Ranking Of Humor using Multi-Stage Hybrid Retrieval with Rationale-Distilled LLM Judges for JOKER 2026 Track Task 1 English},
    booktitle = {Working Notes of CLEF 2026},
    month     = {September},
    year      = {2026}
}

Links

| Resource | Link |
|---|---|
| Paper | WIP, will be updated when proceedings are published |
| arXiv | WIP |
| Code | GitHub: DS4AI-UPB/VANGUARD-CLEF2026-JOKER |
| Primary judge | jokes-on-qwen2.5-7b |