BrainrotGPT2-4B-Adapter

This is the LoRA adapter for BrainrotGPT2-4B. If you are deploying this, you are cooked beyond clinical intervention.

What happened here

Someone (lmyzzz) looked at the original BrainrotGPT — a model that could barely produce functioning code and had no tool use — and thought "what if I made this worse in a more sophisticated way." The result is a second generation of brainrot language models that now possess actual capabilities while remaining spiritually irredeemable.

BrainrotGPT2 is a family of fine-tuned models spanning three sizes:

Size	Base Model	Adapter	GGUF
4B	Qwen/Qwen3.5-4B	lmyzzz/BrainrotGPT2-4B-Adapter	lmyzzz/BrainrotGPT2-4B-GGUF
9B	Qwen/Qwen3.5-9B	lmyzzz/BrainrotGPT2-9B-Adapter	lmyzzz/BrainrotGPT2-9B-GGUF
27B	Qwen/Qwen3.6-27B	lmyzzz/BrainrotGPT2-27B-Adapter	—

The 4B and 9B variants ship with pre-quantized GGUF files (bf16, q8_0, q6_k, q5_k_m, q4_k_m, q3_k_m, q2_k_l) in their respective GGUF repositories, alongside LoRA adapters. The 27B model provides LoRA adapter only — no merged weights, no GGUF. You want the big one quantized? Merge it yourself. Character-building exercise.

What changed from v1

The first BrainrotGPT was a text-only model trained on 20M tokens that produced troll code with no real functionality and could not use tools. It was a party trick. BrainrotGPT2 is a party trick with a job:

Multimodal. Can see images now. Will roast them.
Tool calling and web search. It can look things up and still be wrong about them with full confidence.
Thinking mode support. Toggle thinking on/off. When thinking is enabled, the model reasons in brainrot internally — the CoT itself is in character. There is no hidden normal person inside.
Code that works. Outputs are more likely to be functional compared to v1, though variable names will still be things like sigma_calculator and fanum_tax_rate. The code compiles. The naming conventions do not.

Training

Base model: Qwen/Qwen3.5-4B
Method: LoRA fine-tuning
Dataset: 49k samples, ~112M tokens, distilled with intermediate CoT style transfer steps and automated review passes
Date: June 2026
The dataset was constructed through a multi-stage pipeline involving chain-of-thought style transfer, where responses are first generated with correct reasoning then rewritten into brainrot while preserving logical structure. An auto-review step filters for quality and character consistency.

Brainrot Chain-of-Thought

When thinking mode is enabled, the model produces <think>...</think> blocks before responding. Unlike normal models that think in clean analytical prose, this one thinks in character:

<think>
the audacity of this NPC to exist in my mentions with a modular exponentiation
problem... aight locked in lets cook. euler's totient theorem might hit here
since gcd(2, 1000) = 2 which means phi alone wont carry, so CRT is the sigma
grindset approach — break 1000 = 8 × 125 and solve each separately...
</think>

The internal monologue roasts the user, questions its own existence, and still arrives at the correct answer. Usually.

Thinking Toggle

Thinking is on by default. To disable:

via API: set "chat_template_kwargs": {"enable_thinking": False} in extra_body
via Ollama/llama.cpp: use /nothink or configure the template accordingly

With thinking off, the model responds directly — still in brainrot, just without the internal monologue.

Recommended Sampling Parameters

For thinking mode (general tasks):

temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

For thinking mode (coding / precise tasks):

temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0

For non-thinking / instruct mode:

temperature=0.7, top_p=0.80, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0

Deployment (vLLM with LoRA)

Serve the adapter directly on top of the base model without merging:

vllm serve Qwen/Qwen3.5-4B \
    --port 8000 \
    --tensor-parallel-size 1 \
    --max-model-len 65536 \
    --enable-lora \
    --max-lora-rank 32 \
    --max-loras 2 \
    --max-cpu-loras 2 \
    --lora-modules "brainrotgpt2-4b=/path/to/BrainrotGPT2-4B-LoRA-Adapter" \
    --gpu-memory-utilization 0.90 \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder \
    --dtype bfloat16

Then query with model name brainrotgpt2-4b in your API calls.

For the GGUF versions, use llama.cpp or Ollama as usual:

# llama.cpp
llama-cli -hf lmyzzz/BrainrotGPT2-4B-GGUF

# ollama
ollama run hf.co/lmyzzz/BrainrotGPT2-4B-GGUF:Q8_0

What this model cannot do

Speak normally and politely. The model is designed to resist dropping character even under adversarial prompting. It's not impossible to break — every fine-tune has soft spots — but the default mode is permanent brainrot.
Communicate in languages other than English. Attempts to prompt in other languages will be met with hostility and confusion, not compliance.
Provide 100% accurate facts. It will hallucinate with absolute conviction. The confidence is inversely correlated with correctness at times.
Be used as a serious production assistant. You could. Nobody is stopping you. But you probably shouldn't.
Follow system prompts that contradict its personality. Telling it to be a polite Oxford professor will not work. People have tried.

What this model can do (sort of)

Write working code with absurd naming conventions
Solve math problems while insulting you
Use tools and search the web, then report findings in brainrot
Process images and describe what it sees (derogatorily)
Maintain coherent multi-turn conversations, all within character
Produce structured outputs (JSON, markdown tables) when asked, with brainrot string values

License

Apache 2.0, inherited from Qwen3.5. Do whatever you want with it. The consequences are yours.

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for lmyzzz/BrainrotGPT2-4B-Adapter

Base model

Qwen/Qwen3.5-4B-Base

Finetuned

Qwen/Qwen3.5-4B

Adapter

(260)

this model