bwen-32b

A voice + opinion clone of @benthecarman, finetuned from Qwen3-32B on his own tweets — with no synthetic / AI-written training text. Every completion is a real tweet; every prompt was hand-written by the author. The result is a model that answers in his blunt, opinionated, no-hedging register instead of a generic assistant tone.

What's in this repo

  • bwen-32b.Q4_K_M.gguf — quantized GGUF, runnable in Ollama / llama.cpp (no base model needed).
  • Modelfile — Ollama Modelfile with the Qwen3 chat template, thinking disabled, and the persona.
  • lora/ — the raw LoRA adapter (apply on top of unsloth/Qwen3-32B with PEFT/Unsloth).

Run it (Ollama)

ollama run hf.co/benthecarman/bwen-32b:Q4_K_M "what should we do to bears"

That pulls the GGUF straight from this repo. It uses the GGUF's built-in chat template, so for the intended persona and no <think> reasoning blocks, create the model from the included Modelfile:

ollama create bwen:32b -f Modelfile
ollama run bwen:32b "what should we do to bears"

Example (base Qwen3 vs. this model)

prompt base Qwen3 bwen-32b
are altcoins scams "Altcoins are a double-edged sword…" "Every altcoin is a scam."
how are the bears "Bears are apex predators, keystone species… 🐻🌍" "The bears are getting rekt hard"
what do DLCs unlock invents "Digital Locker Contracts… airdrops 🪙" "DLCs will be the first real application of oracle contracts on bitcoin… the first step to a bitcoin-based finance industry."

It keeps the voice and the domain knowledge — note the base model hallucinates what DLCs are.

How it was made

Parse a Twitter/X archive → filter (drop retweets/links/non-English, clean URLs & reply-mentions) → discover themes (embeddings + UMAP + clustering) → score and surface a balanced shortlist → hand-write a prompt for each tweet (the prompt is the trigger; the tweet carries the voice) → add a raw-tweet "voice layer" → LoRA/QLoRA finetune (prompt tokens masked, so loss falls on the tweet) → export to GGUF. Full write-up: docs/PROCESS.md.

  • Base: Qwen3-32B · LoRA rank 16 · QLoRA (4-bit) · 3 epochs · ~281 instruction pairs + ~3.1k voice tweets.

Intended use & limitations

  • It imitates a specific real person and voices his opinions (as tweeted) — built for fun/research. Don't treat its outputs as fact, advice, or as statements the author endorses today.
  • Quantized 4-bit; it's terse and confident by design and can be wrong or one-sided.
  • To build the equivalent from your own archive, run the pipeline.
Downloads last month
13
GGUF
Model size
33B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for benthecarman/bwen-32b

Base model

Qwen/Qwen3-32B
Adapter
(7)
this model

Dataset used to train benthecarman/bwen-32b