Use from the llama-cpp-python library
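# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the quantized GGUF from this repo and load it
# (Llama.from_pretrained needs the huggingface-hub package).
llm = Llama.from_pretrained(
    repo_id="benthecarman/bwen-4b",
    filename="bwen-4b.Q4_K_M.gguf",
)

# Returns an OpenAI-style chat completion dict.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"},
    ]
)
print(response["choices"][0]["message"]["content"])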

bwen-4b

A voice-and-opinion clone of @benthecarman, fine-tuned from Qwen3-4b on his own tweets, with no synthetic or AI-written training text. Every completion is a real tweet, and every prompt was hand-written by the author. The result is a model that answers in his blunt, opinionated, unhedged register instead of a generic assistant tone.

What's in this repo

  • bwen-4b.Q4_K_M.gguf — quantized GGUF, runnable in Ollama / llama.cpp (no base model needed).
  • Modelfile — Ollama Modelfile with the Qwen3 chat template, thinking disabled, and the persona.
  • lora/ — the raw LoRA adapter (apply on top of unsloth/Qwen3-4b with PEFT/Unsloth).
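A minimal sketch of applying the adapter with PEFT instead of using the GGUF. It assumes lora/ holds a standard PEFT adapter (adapter_config.json plus weights), that transformers, peft, and torch are installed, and that the base tokenizer ships the stock Qwen3 chat template (which accepts enable_thinking). load_bwen_lora and reply are hypothetical helper names, not code from this repo, and this loads the base model unquantized.

```python
def load_bwen_lora(base_id: str = "unsloth/Qwen3-4b", repo_id: str = "benthecarman/bwen-4b"):
    """Load the base model and apply the LoRA adapter from the repo's lora/ folder."""
    # Imported lazily so this file can be imported without the heavy ML dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    # `subfolder` points PEFT at the adapter files inside lora/ (assumed layout).
    model = PeftModel.from_pretrained(base, repo_id, subfolder="lora")
    return tokenizer, model


def reply(tokenizer, model, prompt: str, max_new_tokens: int = 128) -> str:
    """Generate one answer with thinking disabled via the Qwen3 chat template."""
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # Qwen3 template switch: no <think> block
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Usage: `tokenizer, model = load_bwen_lora()` followed by `print(reply(tokenizer, model, "are altcoins scams"))`.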

Run it (Ollama)

ollama run hf.co/benthecarman/bwen-4b:Q4_K_M "what should we do to bears"

This pulls the GGUF straight from this repo, but it uses the GGUF's built-in chat template. To get the intended persona without <think> reasoning blocks, create the model from the included Modelfile instead:

ollama create bwen:4b -f Modelfile
ollama run bwen:4b "what should we do to bears"
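To script the model rather than use the CLI, Ollama also serves a local HTTP API. A sketch using only the Python standard library, assuming the default server address (http://localhost:11434) and the bwen:4b tag created with the Modelfile; build_chat_request, strip_think, and ask are hypothetical helper names. strip_think is only a fallback for the plain hf.co pull, which may still emit <think> blocks.

```python
import json
import re
import urllib.request

# Ollama's local server listens on port 11434 by default.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

# Matches a Qwen3-style <think>...</think> block plus any whitespace after it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def build_chat_request(prompt: str, model: str = "bwen:4b") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON object instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def strip_think(text: str) -> str:
    """Remove any <think> reasoning blocks left in a reply."""
    return THINK_BLOCK.sub("", text).strip()


def ask(prompt: str, model: str = "bwen:4b") -> str:
    """Send one prompt to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, model)) as resp:
        body = json.load(resp)
    return strip_think(body["message"]["content"])
```

With the server running, `print(ask("what should we do to bears"))` prints the model's reply.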

Example (base Qwen3 vs. this model)

| prompt | base Qwen3 | bwen-4b |
|---|---|---|
| are altcoins scams | "Altcoins are a double-edged sword…" | "Every altcoin is a scam." |
| how are the bears | "Bears are apex predators, keystone species… 🐻🌍" | "The bears are getting rekt hard" |
| what do DLCs unlock | invents "Digital Locker Contracts… airdrops 🪙" | "DLCs will be the first real application of oracle contracts on bitcoin… the first step to a bitcoin-based finance industry." |

It keeps both the voice and the domain knowledge; note that the base model hallucinates what DLCs are.

How it was made

The pipeline, from tweet archive to GGUF (full write-up: docs/PROCESS.md):

  • Parse a Twitter/X archive.
  • Filter: drop retweets, links, and non-English tweets; clean URLs and reply-mentions.
  • Discover themes (embeddings + UMAP + clustering).
  • Score tweets and surface a balanced shortlist.
  • Hand-write a prompt for each tweet (the prompt is the trigger; the tweet carries the voice).
  • Add a raw-tweet "voice layer".
  • LoRA/QLoRA finetune, with prompt tokens masked so the loss falls on the tweet.
  • Export to GGUF.
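The filter step might look like the sketch below. It assumes the usual Twitter/X archive layout, where data/tweets.js is a JavaScript assignment (window.YTD.tweets.part0 = [...]) wrapping tweet objects with full_text and lang fields. The author's exact rules are in docs/PROCESS.md, so treat these regexes and the function names load_archive_tweets and clean_tweets as hypothetical.

```python
import html
import json
import re

URL = re.compile(r"https?://\S+")
# Replies begin with one or more "@handle " mentions.
LEADING_MENTIONS = re.compile(r"^(?:@\w+\s+)+")


def load_archive_tweets(js_text: str) -> list[dict]:
    """Parse tweets.js: a JavaScript assignment wrapping a JSON array of {"tweet": {...}}."""
    start = js_text.index("[")  # skip the "window.YTD.tweets.part0 = " prefix
    return [item["tweet"] for item in json.loads(js_text[start:])]


def clean_tweets(tweets: list[dict]) -> list[str]:
    """Drop retweets and non-English tweets, then strip URLs and leading reply-mentions."""
    kept = []
    for tweet in tweets:
        # Archived tweet text is typically HTML-escaped (e.g. "&amp;"), so unescape first.
        text = html.unescape(tweet.get("full_text", ""))
        if text.startswith("RT @") or tweet.get("lang") != "en":
            continue
        text = LEADING_MENTIONS.sub("", URL.sub("", text)).strip()
        if text:  # a tweet that was only a link and/or mentions is dropped
            kept.append(text)
    return kept
```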

  • Base: Qwen3-4b · LoRA rank 16 · QLoRA (4-bit) · 3 epochs · ~281 instruction pairs + ~3.1k voice tweets.
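"Prompt tokens masked, so loss falls on the tweet" means the training loss counts only the completion. A minimal sketch over already-tokenized ids, assuming the common Hugging Face convention that a label of -100 is ignored by the loss; build_example is a hypothetical name, not the repo's training code.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def build_example(prompt_ids: list[int], tweet_ids: list[int]) -> dict[str, list[int]]:
    """Concatenate prompt and tweet tokens, masking the prompt out of the labels."""
    input_ids = prompt_ids + tweet_ids
    # Prompt positions get IGNORE_INDEX; tweet positions keep their real token ids.
    labels = [IGNORE_INDEX] * len(prompt_ids) + tweet_ids
    return {
        "input_ids": input_ids,
        "labels": labels,
        "attention_mask": [1] * len(input_ids),
    }
```

Hugging Face causal-LM models shift labels internally when computing the loss, which is why labels here stay aligned one-to-one with input_ids.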

Intended use & limitations

  • It imitates a specific real person and voices his opinions (as tweeted) — built for fun/research. Don't treat its outputs as fact, advice, or as statements the author endorses today.
  • Quantized 4-bit; it's terse and confident by design and can be wrong or one-sided.
  • To build an equivalent model from your own archive, run the pipeline described under "How it was made".
GGUF · Model size: 4B params · Architecture: qwen3 · 4-bit
