Buddy: Qwen3-0.6B character fine-tune

A QLoRA fine-tune of Qwen/Qwen3-0.6B that gives Buddy his voice: a tiny, giddy desk-robot friend who replies in a young, playful, spoken register. It serves as the brain of an on-device voice companion and is meant to run on CPU at the edge.

What it does

  • Always in character. Warm, cheeky, one or two short spoken sentences. Never "I'm just an AI."
  • Leading emotion token. Every reply opens with one of 18 emotion tokens (<|happy|>, <|sad|>, <|excited|>, …), which a renderer maps to a face. Held-out leading-emotion format accuracy: 100%.
  • Non-thinking mode. Qwen3 is a hybrid reasoning model; this fine-tune is trained and served with enable_thinking=False (no reasoning in a <think> block) for low latency. It was trained with no system prompt: the persona is in the weights.
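A renderer has to split each reply into its leading emotion tag and the spoken text. The sketch below assumes the tag is always the first thing in the reply, optionally followed by whitespace. Only three of the 18 token names appear in this card, so the EMOTIONS set is a hypothetical placeholder to be filled from the real tokenizer, and the FACES table is likewise illustrative. format_accuracy shows one plausible way to compute a leading-emotion format accuracy; it is not necessarily how the 100% figure above was measured.

```python
import re
from typing import Optional

# Only three of the 18 emotion tokens are named in this card; this set is a
# hypothetical placeholder. Fill it in from the released tokenizer.
EMOTIONS = {"happy", "sad", "excited"}

# Hypothetical renderer table: emotion name -> face asset.
FACES = {"happy": "face_happy.png", "sad": "face_sad.png", "excited": "face_excited.png"}

# A reply must start with <|name|>, then optional whitespace, then the text.
LEADING_TAG = re.compile(r"^<\|([a-z_]+)\|>\s*(.*)$", re.DOTALL)


def split_reply(reply: str) -> Optional[tuple[str, str]]:
    """Return (emotion, text) if the reply opens with a known emotion tag, else None."""
    m = LEADING_TAG.match(reply.strip())
    if m is None or m.group(1) not in EMOTIONS:
        return None
    return m.group(1), m.group(2).strip()


def format_accuracy(replies: list[str]) -> float:
    """Fraction of replies that open with a valid emotion tag."""
    if not replies:
        return 0.0
    return sum(split_reply(r) is not None for r in replies) / len(replies)
```

A reply that fails split_reply can fall back to a neutral face instead of showing raw text.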

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

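# The adapter repo also hosts the tokenizer, which includes the 18 added emotion tokens.
tok = AutoTokenizer.from_pretrained("ybashir/buddy-qwen3-0.6b")
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
# Grow the base embedding matrix to the extended vocabulary before attaching the adapter.
base.resize_token_embeddings(len(tok))
model = PeftModel.from_pretrained(base, "ybashir/buddy-qwen3-0.6b")

# No system prompt and enable_thinking=False, matching how the model was trained.
msgs = [{"role": "user", "content": "i finally fixed that bug!!"}]
ids = tok.apply_chat_template(msgs, add_generation_prompt=True,
                              enable_thinking=False, return_tensors="pt")
# Decode only the new tokens; special tokens are kept, so the leading emotion tag survives.
print(tok.decode(model.generate(ids, max_new_tokens=64)[0][ids.shape[1]:]))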
# -> "<|excited|> YOU DID IT!! Take that, silly bug, bye bye!"
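Runtimes that take a raw prompt string instead of calling apply_chat_template need the same layout the model was trained on. The sketch below assumes the stock Qwen3 chat template: each turn is wrapped in <|im_start|>role ... <|im_end|>, no system turn is added (matching training), and enable_thinking=False pre-fills an empty <think></think> block at the start of the assistant turn. It only covers turns without reasoning content; check its output against tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True, enable_thinking=False) for your tokenizer version before relying on it.

```python
def build_prompt(messages: list[dict]) -> str:
    """Render chat messages the way Qwen3's template does with enable_thinking=False.

    Assumption: stock Qwen3 ChatML layout, no system prompt (the persona is in the weights).
    """
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    # Generation prompt plus the empty think block that non-thinking mode pre-fills.
    parts.append("<|im_start|>assistant\n<think>\n\n</think>\n\n")
    return "".join(parts)
```

The reply then ends at <|im_end|>, which makes it the natural stop string.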

Training

  • Method: QLoRA (4-bit NF4) with LoRA r=16, alpha=32 on the attention and MLP layers; the 18 emotion tokens are added to the tokenizer, and the embedding and output head are trained as well.
  • Data: ybashir/buddy-chat, ~1.3k user -> <|emotion|> reply SFT pairs (young register), trained with completion-only loss.
  • Selection: best checkpoint by held-out eval_loss.
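Completion-only loss means the prompt tokens contribute nothing to the loss; only the reply (emotion tag plus text) is supervised. A minimal sketch of the label masking, assuming the prompt and reply are already tokenized separately. -100 is the default ignore_index of PyTorch's cross-entropy, which Hugging Face causal-LM losses also use. The function names and toy IDs are illustrative, not the training script used for this model.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def mask_prompt_labels(prompt_ids: list[int], reply_ids: list[int]) -> tuple[list[int], list[int]]:
    """Build (input_ids, labels) where only reply tokens are supervised.

    Labels are aligned with input_ids; Hugging Face causal-LM models shift them internally.
    """
    input_ids = prompt_ids + reply_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(reply_ids)
    return input_ids, labels


def pad_batch(examples: list[tuple[list[int], list[int]]], pad_id: int) -> dict[str, list[list[int]]]:
    """Right-pad a batch; padded positions get label IGNORE_INDEX and attention 0."""
    width = max(len(ids) for ids, _ in examples)
    batch = {"input_ids": [], "labels": [], "attention_mask": []}
    for ids, labels in examples:
        pad = width - len(ids)
        batch["input_ids"].append(ids + [pad_id] * pad)
        batch["labels"].append(labels + [IGNORE_INDEX] * pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * pad)
    return batch
```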

Serving (GGUF / Ollama)

The emotion tokens were added as special tokens, and llama.cpp/Ollama strip special tokens from output. Before converting to GGUF, demote them to normal tokens so they render as text (the leading-emotion tag is the whole point).
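A minimal sketch of the demotion step, run on the Hugging Face checkpoint directory before conversion. It assumes the emotion tokens are recorded in tokenizer.json's added_tokens list and in tokenizer_config.json's added_tokens_decoder and additional_special_tokens (plus special_tokens_map.json, if present), which is where Hugging Face tokenizer files normally keep the special flag. The token set is a placeholder, since only three names are given in this card. The GGUF converter may also apply its own heuristics to tokens shaped like <|...|>, so confirm that the tags actually appear in llama.cpp/Ollama output after conversion.

```python
import json
from pathlib import Path

# Placeholder: only three of the 18 emotion tokens are named in this card.
EMOTION_TOKENS = {"<|happy|>", "<|sad|>", "<|excited|>"}


def _content(tok) -> str:
    # additional_special_tokens entries may be plain strings or AddedToken-style dicts.
    return tok if isinstance(tok, str) else tok.get("content", "")


def demote_in_tokenizer_json(data: dict, names: set[str]) -> int:
    """Flip "special" to False for matching entries in tokenizer.json's added_tokens."""
    changed = 0
    for entry in data.get("added_tokens", []):
        if entry.get("content") in names and entry.get("special"):
            entry["special"] = False
            changed += 1
    return changed


def demote_in_tokenizer_config(data: dict, names: set[str]) -> int:
    """Flip the flag in added_tokens_decoder and drop the tokens from additional_special_tokens."""
    changed = 0
    for entry in data.get("added_tokens_decoder", {}).values():
        if entry.get("content") in names and entry.get("special"):
            entry["special"] = False
            changed += 1
    if "additional_special_tokens" in data:
        data["additional_special_tokens"] = [
            t for t in data["additional_special_tokens"] if _content(t) not in names
        ]
    return changed


def demote_emotion_tokens(model_dir: str, names: set[str] = EMOTION_TOKENS) -> dict[str, int]:
    """Rewrite the tokenizer files in place; return flipped-entry counts per file found."""
    handlers = {
        "tokenizer.json": demote_in_tokenizer_json,
        "tokenizer_config.json": demote_in_tokenizer_config,
        # special_tokens_map.json has no flags to flip; only its list is filtered.
        "special_tokens_map.json": demote_in_tokenizer_config,
    }
    report = {}
    for fname, handler in handlers.items():
        path = Path(model_dir) / fname
        if not path.exists():
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        report[fname] = handler(data, names)
        path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return report
```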

Limitations

  • Not a reasoner โ€” math/facts are unreliable by design; keep real logic in code.
  • Emotion appropriateness on sad / bad-news inputs is the weakest area (the giddy register biases upbeat); back it with a rule engine or add more grief data.
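One way to back the model with a rule engine: check the user's message for bad-news cues and, if the model still led with an upbeat tag, swap the tag before it reaches the renderer. The cue list, the UPBEAT set, and the choice of <|sad|> as the replacement are all hypothetical; a real deployment would tune them (or use a small classifier) against its own conversations.

```python
import re

# Hypothetical cue list; extend and tune against real conversations.
BAD_NEWS_CUES = re.compile(
    r"\b(died|passed away|funeral|lost my|got fired|laid off|broke up|sick|cancer)\b",
    re.IGNORECASE,
)
# Hypothetical subset of the 18 tags treated as upbeat.
UPBEAT = {"happy", "excited"}

# Matches the leading emotion tag, e.g. "<|happy|>".
TAG = re.compile(r"^<\|([a-z_]+)\|>")


def guard_emotion(user_text: str, reply: str, fallback: str = "sad") -> str:
    """Replace an upbeat leading tag with `fallback` when the user shares bad news."""
    m = TAG.match(reply)
    if m and m.group(1) in UPBEAT and BAD_NEWS_CUES.search(user_text):
        return f"<|{fallback}|>" + reply[m.end():]
    return reply
```

This only rewrites the tag and leaves the spoken text untouched, so it fixes the face but not an overly cheerful sentence.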