IROH: Jokes on Gemma4-31B - Humor Retrieval Judge
CLEF 2026 · JOKER Track · Task 1 English · Team VANGUARD
Ana-Maria Luisa Mocanu · Sebastian Mocanu · Ciprian-Octavian Truică · Elena-Simona Apostol
Model Description
A QLoRA fine-tune of gemma-4-31b-it, trained as a Stage 3 LLM judge in the IROH humor retrieval pipeline. Given a query describing a humor topic and a candidate text, the model returns a soft YES/NO probability indicating whether the candidate is a relevant joke, pun, or wordplay.
The adapter is trained on generic rationales: one-sentence explanations of why a text is or is not a joke, generated by Gemma 4 using a lightweight "General Wordplay" query placeholder. The simplicity of this prompt produces more consistent supervision than the structured typed alternative.
It serves as a complementary corrector to the primary Qwen judge.
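The "soft YES/NO probability" is a two-way softmax over the logits of the YES and NO tokens, which is the same as a sigmoid of their difference. The sketch below shows that arithmetic in plain Python. The function name `yes_probability` is illustrative and not part of the released code.

```python
import math


def yes_probability(yes_logit: float, no_logit: float) -> float:
    """Two-way softmax over the YES and NO logits, i.e. sigmoid(yes - no)."""
    diff = yes_logit - no_logit
    # Numerically stable sigmoid: never call exp() on a large positive value.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Equal logits give an undecided judge; a larger YES logit pushes towards 1.
print(yes_probability(0.0, 0.0))
print(yes_probability(2.0, 0.0))
```

Only the gap between the two logits matters, so the score ignores any probability mass the model places on tokens other than YES and NO.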
Models
| Adapter folder | Base model | LoRA r | Training data | Ensemble weight | MAP (standalone) |
|---|---|---|---|---|---|
| adapter_model.safetensors | gemma-4-31b-it | 32 | Generic rationales, no aug | 0.30 | 0.5718 |
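For readers unfamiliar with the MAP column, the following is a minimal sketch of mean average precision using the textbook definition. It assumes each ranked list contains no duplicate ids. The official JOKER evaluation may apply cutoffs or tooling that differ from this, and the function names and toy data are illustrative only.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@k over the ranks k at which a relevant item appears.

    Divides by the total number of relevant items, so relevant items that
    were never retrieved lower the score.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """runs: query -> ranked doc ids; qrels: query -> relevant doc ids."""
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items())
    return total / len(qrels)


# Toy data, not taken from the JOKER corpus.
toy_runs = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
toy_qrels = {"q1": {"a", "c"}, "q2": {"y"}}
print(mean_average_precision(toy_runs, toy_qrels))
```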
Usage
from transformers import AutoTokenizer, AutoModelForImageTextToText, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_id = "google/gemma-4-31b-it"
adapter_id = "DS4AI-UPB/jokes-on-gemma4-31b"

# 4-bit NF4 quantization so the 31B base model fits on a single large GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load the quantized base model, then attach the LoRA adapter on top of it.
model = PeftModel.from_pretrained(
    AutoModelForImageTextToText.from_pretrained(
        base_model_id,
        quantization_config=bnb_config,
        device_map="auto",
    ),
    adapter_id,
)
model.eval()

SYSTEM = (
    "You are a humor and wordplay detection judge. You evaluate whether a text is relevant to a "
    "query AND contains humor, jokes, puns, wordplay, or any form of linguistic wit (double "
    "meanings, homophones, malapropisms, ironic twists). Answer only YES or NO."
)


def score(query: str, text: str) -> float:
    """Return the probability of YES, normalized over the YES and NO tokens."""
    messages = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": f'Query: "{query}"\nText: "{text}"\nIs this a relevant joke? Answer YES or NO.'},
    ]
    tokenized = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
    )
    ids = tokenized["input_ids"].to(model.device)
    with torch.no_grad():
        # Logits for the first token the model would generate.
        logits = model(ids).logits[:, -1, :]
    yes_id = tokenizer.convert_tokens_to_ids("YES")
    no_id = tokenizer.convert_tokens_to_ids("NO")
    # Two-way softmax over the YES and NO logits only.
    return torch.softmax(torch.stack([logits[0, yes_id], logits[0, no_id]]), dim=0)[0].item()
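A judge like this is typically applied to a shortlist of candidates for one query. The sketch below ranks candidates by any scoring callable with the same signature as `score` above. The helper `rank_candidates` and the stub scorer are illustrative assumptions, used so the example runs without a GPU.

```python
from typing import Callable, List, Optional, Tuple


def rank_candidates(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every candidate for the query and sort by descending score."""
    scored = [(text, score_fn(query, text)) for text in candidates]
    # sort() is stable, so tied candidates keep their input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


def stub_score(query: str, text: str) -> float:
    """Stand-in for the real judge: rewards texts that contain a question mark."""
    return 0.9 if "?" in text else 0.1


ranking = rank_candidates(
    "bakery puns",
    ["The bakery opens at nine.", "Why did the baker quit? He kneaded a break."],
    stub_score,
)
for text, prob in ranking:
    print(f"{prob:.2f}  {text}")
```

With the model loaded, passing `score` in place of `stub_score` should give the same interface, at one forward pass per candidate.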
Requirements
pip install -U transformers peft bitsandbytes accelerate
Requires transformers >= 5.5.0, peft >= 0.14, and bitsandbytes >= 0.43. A CUDA GPU with ~30GB VRAM is needed for 4-bit quantization (e.g. A100 on Colab Pro).
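Before loading the model, it can help to confirm that the installed packages meet these minimums. The following is a rough sketch using only the standard library. The helper names are illustrative, the comparison looks at leading numeric components only (so pre-release suffixes such as `rc1` are ignored), and it does not check GPU memory.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions as stated in the requirements above.
MINIMUMS = {"transformers": "5.5.0", "peft": "0.14", "bitsandbytes": "0.43"}


def parse_version(text: str) -> tuple:
    """Keep the leading numeric components, e.g. '0.14.1.dev0' -> (0, 14, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically after padding both to the same length."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment(minimums=MINIMUMS) -> dict:
    """Map each package name to True if installed at or above its minimum."""
    report = {}
    for name, minimum in minimums.items():
        try:
            report[name] = meets_minimum(version(name), minimum)
        except PackageNotFoundError:
            report[name] = False
    return report


print(check_environment())
```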
Training Data
Query-document pairs from the JOKER 2025 and 2026 Task 1 corpora, deduplicated across editions and balanced between joke and non-joke examples. Each pair is annotated with a one-sentence rationale generated by Gemma 4 (gemma4:e4b via Ollama). Rationale generation scripts are available in the code repository.
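The deduplication and balancing steps could look roughly like the sketch below. This is an assumption about the general approach, not the authors' actual scripts: the record fields (`query`, `text`, `label`), the whitespace and case normalization, and downsampling of the larger class are all hypothetical choices.

```python
import random


def normalize(s: str) -> str:
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(s.lower().split())


def deduplicate(pairs):
    """Keep the first occurrence of each normalized (query, text) pair."""
    seen = set()
    unique = []
    for pair in pairs:
        key = (normalize(pair["query"]), normalize(pair["text"]))
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique


def balance(pairs, seed: int = 0):
    """Downsample the larger class so jokes (1) and non-jokes (0) are equal."""
    positives = [p for p in pairs if p["label"] == 1]
    negatives = [p for p in pairs if p["label"] == 0]
    n = min(len(positives), len(negatives))
    rng = random.Random(seed)  # seeded for reproducibility
    sample = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(sample)
    return sample


# Toy records standing in for two corpus editions.
edition_a = [
    {"query": "cats", "text": "A purr-fect day.", "label": 1},
    {"query": "cats", "text": "Cats are mammals.", "label": 0},
]
edition_b = [
    {"query": "Cats", "text": "A  purr-fect day.", "label": 1},
    {"query": "dogs", "text": "Dogs bark.", "label": 0},
    {"query": "dogs", "text": "Dogs have four legs.", "label": 0},
]
merged = balance(deduplicate(edition_a + edition_b))
print(len(merged), sum(p["label"] for p in merged))
```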
Intended Use
- Intended: a Stage 3 LLM judge in a multi-stage humor retrieval pipeline, used in a weighted ensemble alongside jokes-on-qwen2.5-7b.
- Out of scope: General-purpose text classification; production deployment without safety validation; languages other than English.
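One simple way to combine judges in a weighted ensemble is a weighted average of their YES probabilities, sketched below. The 0.30 weight for this adapter comes from the table above. The 0.70 weight for the Qwen judge and the linear combination itself are assumptions for illustration, since the card does not specify how the pipeline combines scores.

```python
def ensemble_score(scores: dict, weights: dict) -> float:
    """Weighted average of per-judge probabilities, normalized by total weight."""
    total = sum(weights[name] for name in scores)
    if total <= 0:
        raise ValueError("weights of the participating judges must sum to a positive value")
    return sum(weights[name] * prob for name, prob in scores.items()) / total


# Hypothetical weights: only the 0.30 figure is taken from the model card.
weights = {"gemma4-31b": 0.30, "qwen2.5-7b": 0.70}
combined = ensemble_score({"gemma4-31b": 0.9, "qwen2.5-7b": 0.5}, weights)
print(round(combined, 4))
```

Normalizing by the total weight keeps the result in [0, 1] even when only a subset of judges has scored a candidate.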
Limitations
- English only - training data, prompts, and taxonomy are English-specific.
- Binary YES/NO framing - may be poorly calibrated on borderline cases; graded relevance training is a promising future direction.
- Optimized for short jokes, puns, and wordplay in the JOKER corpus.
Citation
@InProceedings{Mocanu2026IROH,
  author    = {Mocanu, Ana-Maria Luisa and Mocanu, Sebastian and Truică, Ciprian-Octavian and Apostol, Elena-Simona},
  title     = {IROH: Insightful Ranking Of Humor using Multi-Stage Hybrid Retrieval with Rationale-Distilled LLM Judges for JOKER 2026 Track Task 1 English},
  booktitle = {Working Notes of CLEF 2026},
  month     = {September},
  year      = {2026}
}
Links
| Resource | Link |
|---|---|
| Paper | WIP — will be updated when proceedings are published |
| arXiv | WIP |
| Code | GitHub — DS4AI-UPB/VANGUARD-CLEF2026-JOKER |
| Primary judge | jokes-on-qwen2.5-7b |
Evaluation results
- MAP, Generic (standalone), self-reported: 0.572