Buddy — Qwen3-0.6B character fine-tune

A QLoRA fine-tune of Qwen/Qwen3-0.6B that gives Buddy his voice: a tiny, giddy desk-robot friend who replies in a young, playful, spoken register. It is intended as the brain of an on-device voice companion running on CPU at the edge.

What it does

Always in character. Warm, cheeky, one or two short spoken sentences. Never "I'm just an AI."
Leading emotion token. Every reply opens with one of 18 emotion tokens (<|happy|>, <|sad|>, <|excited|>, …) which a renderer maps to a face. Held-out leading-emotion format accuracy: 100%.
Non-thinking mode. Qwen3 is a hybrid reasoning model; this fine-tune is trained and served with enable_thinking=False (no <think> block) for low latency. Trained with no system prompt — the persona is in the weights.
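
A renderer has to separate the leading emotion token from the spoken text before it can pick a face. The following is a minimal sketch of that step, not the project's actual renderer. The names `split_emotion`, `face_for` and `format_accuracy`, the `FACES` mapping, and the neutral fallback are all hypothetical, and only the three tokens named above are listed because the full set of 18 is not given in this card.

```python
import re
from typing import List, Optional, Set, Tuple

# Matches a leading tag such as "<|happy|>" followed by the spoken text.
_LEADING_TAG = re.compile(r"^\s*<\|([a-z_]+)\|>\s*(.*)$", re.DOTALL)

# Hypothetical face assets; only the three emotions named in this card appear.
FACES = {
    "happy": "face_happy.png",
    "sad": "face_sad.png",
    "excited": "face_excited.png",
}


def split_emotion(reply: str) -> Tuple[Optional[str], str]:
    """Return (emotion, text); emotion is None when no leading tag is present."""
    match = _LEADING_TAG.match(reply)
    if match is None:
        return None, reply.strip()
    return match.group(1), match.group(2).strip()


def face_for(reply: str, default: str = "face_neutral.png") -> str:
    """Pick a face for the reply; the neutral fallback is an assumption."""
    emotion, _ = split_emotion(reply)
    return FACES.get(emotion, default)


def format_accuracy(replies: List[str], known: Set[str]) -> float:
    """Fraction of replies that open with a known emotion tag."""
    if not replies:
        return 0.0
    hits = sum(1 for r in replies if split_emotion(r)[0] in known)
    return hits / len(replies)
```

`format_accuracy` is one plausible reading of the "leading-emotion format accuracy" figure: it checks only that a known tag comes first, not that the emotion suits the input.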

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tok = AutoTokenizer.from_pretrained("ybashir/buddy-qwen3-0.6b")
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
base.resize_token_embeddings(len(tok))
model = PeftModel.from_pretrained(base, "ybashir/buddy-qwen3-0.6b")

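msgs = [{"role": "user", "content": "i finally fixed that bug!!"}]
# return_dict=True also returns the attention mask, which generate() uses
inputs = tok.apply_chat_template(msgs, add_generation_prompt=True,
                                 enable_thinking=False,
                                 return_dict=True, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64)
# Decode only the new tokens; special tokens are kept so the emotion tag survives
print(tok.decode(out[0][inputs["input_ids"].shape[1]:]))
# -> "<|excited|> YOU DID IT!! Take that, silly bug, bye bye!"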

Training

Method: QLoRA (4-bit NF4) with LoRA r=16 / alpha=32 on the attention and MLP layers. The 18 emotion tokens are added to the tokenizer, and the embedding matrix and output head are trained along with the adapter.
Data: ybashir/buddy-chat — ~1.3k user -> <|emotion|> reply SFT pairs (young register), completion-only loss.
Checkpoint selection: best checkpoint by held-out eval_loss.
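
Completion-only loss means the user turn contributes nothing to the loss; only the reply tokens, including the leading emotion token, are scored. Here is a minimal sketch of the label masking, assuming the usual PyTorch/Transformers convention that a label of `-100` is ignored. The helper name `build_labels` and the toy token IDs are hypothetical, and the actual training script is not shown in this card.

```python
from typing import List, Tuple

IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def build_labels(prompt_ids: List[int],
                 completion_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Concatenate prompt and completion; mask prompt positions out of the loss."""
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids, labels


# Toy IDs for illustration only (not real Qwen3 token IDs).
input_ids, labels = build_labels([11, 12, 13], [901, 21, 22])
print(labels)  # [-100, -100, -100, 901, 21, 22]
```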

Serving (GGUF / Ollama)

The emotion tokens are added as special tokens, which llama.cpp and Ollama strip from generated output. Before converting to GGUF, demote them to normal tokens so they render as text; the leading emotion tag is the whole point.
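
This card does not say how the demotion is done. One possible approach, sketched below as an assumption, is to flip the `special` flag on the emotion entries in the saved `tokenizer.json` before running the converter. The function name `demote_emotion_tokens` and the demo IDs are hypothetical, the caller must supply the full list of 18 tokens, and whether a given converter version honours the flag should be checked against the converted model's output.

```python
from typing import Set


def demote_emotion_tokens(tokenizer_json: dict, emotion_tokens: Set[str]) -> int:
    """Set special=False on the listed tokens in a tokenizer.json-style dict.

    Returns how many entries were changed. Other special tokens, such as
    "<|im_end|>", are left untouched because they are not in emotion_tokens.
    """
    changed = 0
    for entry in tokenizer_json.get("added_tokens", []):
        if entry.get("content") in emotion_tokens and entry.get("special"):
            entry["special"] = False
            changed += 1
    return changed


# Tiny stand-in for the "added_tokens" section; the IDs are placeholders.
demo = {
    "added_tokens": [
        {"id": 0, "content": "<|im_end|>", "special": True},
        {"id": 1, "content": "<|happy|>", "special": True},
    ]
}
print(demote_emotion_tokens(demo, {"<|happy|>", "<|sad|>", "<|excited|>"}))  # 1
```

In practice the dict would come from `json.load` on the saved `tokenizer.json` and be written back with `json.dump`. The same flag also appears under `added_tokens_decoder` in `tokenizer_config.json`, which may need the same change.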

Limitations

Not a reasoner — math/facts are unreliable by design; keep real logic in code.
Emotion appropriateness on sad / bad-news inputs is the weakest area (the giddy register biases upbeat); back it with a rule engine or add more grief data.
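
The rule-engine backstop mentioned above could be as small as a check that swaps an upbeat leading tag when the user's message looks like bad news. This is a minimal sketch under stated assumptions: the cue words in `SAD_CUES`, the `UPBEAT` set, and the function name `guard_emotion` are hypothetical and would need tuning on real inputs.

```python
import re

# Hypothetical cue phrases for bad-news inputs.
SAD_CUES = re.compile(
    r"\b(died|passed away|funeral|lost my|fired|broke up)\b", re.IGNORECASE
)
# Hypothetical set of upbeat tags that should not lead a reply to bad news.
UPBEAT = {"happy", "excited"}
_TAG = re.compile(r"^<\|([a-z_]+)\|>")


def guard_emotion(user_text: str, reply: str) -> str:
    """Replace an upbeat leading tag with <|sad|> when the input looks like bad news."""
    match = _TAG.match(reply)
    if match and match.group(1) in UPBEAT and SAD_CUES.search(user_text):
        return "<|sad|>" + reply[match.end():]
    return reply
```

This only corrects the face the renderer shows. The wording of the reply is unchanged, so a giddy sentence under a sad face may still need a canned fallback line.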
