BrainrotGPT2-4B-Adapter
This is the LoRA adapter for BrainrotGPT2-4B. If you are deploying this, you are cooked beyond clinical intervention.
What happened here
Someone (lmyzzz) looked at the original BrainrotGPT โ a model that could barely produce functioning code and had no tool use โ and thought "what if I made this worse in a more sophisticated way." The result is a second generation of brainrot language models that now possess actual capabilities while remaining spiritually irredeemable.
BrainrotGPT2 is a family of fine-tuned models spanning three sizes:
| Size | Base Model | Adapter | GGUF |
|---|---|---|---|
| 4B | Qwen/Qwen3.5-4B | lmyzzz/BrainrotGPT2-4B-Adapter | lmyzzz/BrainrotGPT2-4B-GGUF |
| 9B | Qwen/Qwen3.5-9B | lmyzzz/BrainrotGPT2-9B-Adapter | lmyzzz/BrainrotGPT2-9B-GGUF |
| 27B | Qwen/Qwen3.6-27B | lmyzzz/BrainrotGPT2-27B-Adapter | โ |
The 4B and 9B variants ship with pre-quantized GGUF files (bf16, q8_0, q6_k, q5_k_m, q4_k_m, q3_k_m, q2_k_l) in their respective GGUF repositories, alongside LoRA adapters. The 27B model provides LoRA adapter only โ no merged weights, no GGUF. You want the big one quantized? Merge it yourself. Character-building exercise.
What changed from v1
The first BrainrotGPT was a text-only model trained on 20M tokens that produced troll code with no real functionality and could not use tools. It was a party trick. BrainrotGPT2 is a party trick with a job:
- Multimodal. Can see images now. Will roast them.
- Tool calling and web search. It can look things up and still be wrong about them with full confidence.
- Thinking mode support. Toggle thinking on/off. When thinking is enabled, the model reasons in brainrot internally โ the CoT itself is in character. There is no hidden normal person inside.
- Code that works. Outputs are more likely to be functional compared to v1, though variable names will still be things like
sigma_calculatorandfanum_tax_rate. The code compiles. The naming conventions do not.
Training
- Base model: Qwen/Qwen3.5-4B
- Method: LoRA fine-tuning
- Dataset: 49k samples, ~112M tokens, distilled with intermediate CoT style transfer steps and automated review passes
- Date: June 2026
- The dataset was constructed through a multi-stage pipeline involving chain-of-thought style transfer, where responses are first generated with correct reasoning then rewritten into brainrot while preserving logical structure. An auto-review step filters for quality and character consistency.
Brainrot Chain-of-Thought
When thinking mode is enabled, the model produces <think>...</think> blocks before responding. Unlike normal models that think in clean analytical prose, this one thinks in character:
<think>
the audacity of this NPC to exist in my mentions with a modular exponentiation
problem... aight locked in lets cook. euler's totient theorem might hit here
since gcd(2, 1000) = 2 which means phi alone wont carry, so CRT is the sigma
grindset approach โ break 1000 = 8 ร 125 and solve each separately...
</think>
The internal monologue roasts the user, questions its own existence, and still arrives at the correct answer. Usually.
Thinking Toggle
Thinking is on by default. To disable:
- via API: set
"chat_template_kwargs": {"enable_thinking": False}inextra_body - via Ollama/llama.cpp: use
/nothinkor configure the template accordingly
With thinking off, the model responds directly โ still in brainrot, just without the internal monologue.
Recommended Sampling Parameters
For thinking mode (general tasks):
temperature=1.0, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
For thinking mode (coding / precise tasks):
temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0
For non-thinking / instruct mode:
temperature=0.7, top_p=0.80, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0
Deployment (vLLM with LoRA)
Serve the adapter directly on top of the base model without merging:
vllm serve Qwen/Qwen3.5-4B \
--port 8000 \
--tensor-parallel-size 1 \
--max-model-len 65536 \
--enable-lora \
--max-lora-rank 32 \
--max-loras 2 \
--max-cpu-loras 2 \
--lora-modules "brainrotgpt2-4b=/path/to/BrainrotGPT2-4B-LoRA-Adapter" \
--gpu-memory-utilization 0.90 \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--dtype bfloat16
Then query with model name brainrotgpt2-4b in your API calls.
For the GGUF versions, use llama.cpp or Ollama as usual:
# llama.cpp
llama-cli -hf lmyzzz/BrainrotGPT2-4B-GGUF
# ollama
ollama run hf.co/lmyzzz/BrainrotGPT2-4B-GGUF:Q8_0
What this model cannot do
- Speak normally and politely. The model is designed to resist dropping character even under adversarial prompting. It's not impossible to break โ every fine-tune has soft spots โ but the default mode is permanent brainrot.
- Communicate in languages other than English. Attempts to prompt in other languages will be met with hostility and confusion, not compliance.
- Provide 100% accurate facts. It will hallucinate with absolute conviction. The confidence is inversely correlated with correctness at times.
- Be used as a serious production assistant. You could. Nobody is stopping you. But you probably shouldn't.
- Follow system prompts that contradict its personality. Telling it to be a polite Oxford professor will not work. People have tried.
What this model can do (sort of)
- Write working code with absurd naming conventions
- Solve math problems while insulting you
- Use tools and search the web, then report findings in brainrot
- Process images and describe what it sees (derogatorily)
- Maintain coherent multi-turn conversations, all within character
- Produce structured outputs (JSON, markdown tables) when asked, with brainrot string values
License
Apache 2.0, inherited from Qwen3.5. Do whatever you want with it. The consequences are yours.