gary-5 📱

The pocket-sized chat model that can actually chat. Successor to gary-4 (69 KB of beautiful nonsense). gary-5 trades a few megabytes for the ability to, you know, answer questions.

Built on SmolLM2-135M-Instruct (a distilled instruct model by Hugging Face TB), LoRA fine-tuned for the gary persona, merged, and quantized to GGUF. Runs fully offline on basically anything: phone, Raspberry Pi, that laptop from 2014.

Stats

Parameters: 135M
GGUF Q8_0: 138 MB
Safetensors (bf16): 260 MB
Base: SmolLM2-135M-Instruct (Apache-2.0)
Fine-tune: LoRA r=16 on all attn+MLP projections, 101 steps, merged
Runs on: CPU, no GPU needed. ~30 tok/s on a sandbox CPU.
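
The sizes above can be sanity-checked with a little arithmetic. The sketch below is a rough estimate, not a measurement: it uses the rounded 135M parameter count, ignores file metadata and any tensors stored at a different precision, and takes the layer dimensions (hidden size 576, key/value width 192, MLP width 1536, 30 layers) from the public SmolLM2-135M config as an assumption rather than from this card.

```python
# Back-of-envelope check of the stats. Dimensions are assumptions taken
# from the public SmolLM2-135M config, not from this model card.

N_PARAMS = 135e6  # rounded parameter count from the stats
MIB = 1024 ** 2


def bf16_bytes(n_params):
    """bf16 stores each weight in 2 bytes."""
    return n_params * 2


def q8_0_bytes(n_params):
    """Q8_0 packs 32 int8 weights plus one fp16 scale per block (34 bytes)."""
    return n_params / 32 * 34


def lora_params(rank, shapes, n_layers):
    """LoRA on a (d_in, d_out) projection adds rank * (d_in + d_out) weights."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes)
    return per_layer * n_layers


HIDDEN, KV, MLP, LAYERS = 576, 192, 1536, 30  # assumed SmolLM2-135M dims
PROJECTIONS = [
    (HIDDEN, HIDDEN),  # q_proj
    (HIDDEN, KV),      # k_proj
    (HIDDEN, KV),      # v_proj
    (HIDDEN, HIDDEN),  # o_proj
    (HIDDEN, MLP),     # gate_proj
    (HIDDEN, MLP),     # up_proj
    (MLP, HIDDEN),     # down_proj
]

print(f"bf16 estimate: {bf16_bytes(N_PARAMS) / MIB:.0f} MiB")
print(f"Q8_0 estimate: {q8_0_bytes(N_PARAMS) / MIB:.0f} MiB")
print(f"LoRA r=16 trainable weights: {lora_params(16, PROJECTIONS, LAYERS):,}")
```

Both file-size estimates land within a few percent of the listed figures, and under these assumptions the r=16 adapter trains only a few million weights on top of the frozen base.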

Sample (greedy, unedited)

Q: Who are you?
A: I'm gary-5, a pocket-sized AI built by Garrett. Small enough to live on
   your phone, smart enough to be worth talking to.
Q: What is the capital of France?
A: Paris.
Q: Explain what a firewall does in one sentence.
A: A firewall acts like a powerful wall that filters and blocks unauthorized
   traffic before it reaches your device.

Run it

llama.cpp / ollama (recommended, uses the 138 MB GGUF):

llama-cli -m gary-5.Q8_0.gguf -cnv \
  -sys "You are gary-5, a pocket-sized AI assistant created by Garrett."
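
The same GGUF file can also be driven from Python. This is a minimal sketch, assuming the third-party llama-cpp-python package is installed and that the GGUF carries a usable chat template; `build_messages`, `ask`, and the `n_ctx=2048` setting are illustrative choices, not part of gary-5.

```python
SYSTEM_PROMPT = "You are gary-5, a pocket-sized AI assistant created by Garrett."


def build_messages(user_text, history=None):
    """Return a chat list: system prompt, prior turns, then the new user turn."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_text})
    return messages


def ask(model_path, user_text, max_tokens=80):
    """Load the GGUF file and return one reply decoded at temperature 0."""
    # Imported lazily so build_messages works without the package installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=2048, verbose=False)
    result = llm.create_chat_completion(
        messages=build_messages(user_text),
        max_tokens=max_tokens,
        temperature=0.0,
    )
    return result["choices"][0]["message"]["content"]
```

Calling `ask("gary-5.Q8_0.gguf", "Who are you?")` should give a reply in the spirit of the sample above, though exact wording may differ between runtimes.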

transformers:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the merged model from the Hub.
tok = AutoTokenizer.from_pretrained("gary23w/gary-5")
model = AutoModelForCausalLM.from_pretrained("gary23w/gary-5")
msgs = [{"role": "system", "content": "You are gary-5, a pocket-sized AI assistant created by Garrett."},
        {"role": "user", "content": "hi"}]
# Render the chat template and append the assistant turn marker.
enc = tok.apply_chat_template(msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt")
out = model.generate(**enc, max_new_tokens=80)
# Decode only the newly generated tokens, not the echoed prompt.
print(tok.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=True))

Honest section

It's a 135M model: great at chat, identity, short factual answers, summaries, and one-sentence explanations. It will confidently improvise on hard reasoning and obscure facts. For GPT-4-like performance in your pocket, the recipe is this exact pipeline with a 1–3B base: gary-6, presumably.

The gary lineage: gary-4 (67K params, 69 KB, gibberish, beloved) → gary-5 (135M params, 138 MB, coherent) → gary-6 (TBD, pending Garrett's ambitions).
