Instructions to use rekabytes/hmanlab-ai-v0.2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use rekabytes/hmanlab-ai-v0.2 with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="rekabytes/hmanlab-ai-v0.2", filename="hmanlab-ai-v0.2.Q4_K_M.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use rekabytes/hmanlab-ai-v0.2 with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf rekabytes/hmanlab-ai-v0.2:Q4_K_M # Run inference directly in the terminal: llama-cli -hf rekabytes/hmanlab-ai-v0.2:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf rekabytes/hmanlab-ai-v0.2:Q4_K_M # Run inference directly in the terminal: llama-cli -hf rekabytes/hmanlab-ai-v0.2:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf rekabytes/hmanlab-ai-v0.2:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf rekabytes/hmanlab-ai-v0.2:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf rekabytes/hmanlab-ai-v0.2:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf rekabytes/hmanlab-ai-v0.2:Q4_K_M
Use Docker
docker model run hf.co/rekabytes/hmanlab-ai-v0.2:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use rekabytes/hmanlab-ai-v0.2 with Ollama:
ollama run hf.co/rekabytes/hmanlab-ai-v0.2:Q4_K_M
- Unsloth Studio new
How to use rekabytes/hmanlab-ai-v0.2 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for rekabytes/hmanlab-ai-v0.2 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for rekabytes/hmanlab-ai-v0.2 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for rekabytes/hmanlab-ai-v0.2 to start chatting
- Pi new
How to use rekabytes/hmanlab-ai-v0.2 with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf rekabytes/hmanlab-ai-v0.2:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "rekabytes/hmanlab-ai-v0.2:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use rekabytes/hmanlab-ai-v0.2 with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf rekabytes/hmanlab-ai-v0.2:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default rekabytes/hmanlab-ai-v0.2:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use rekabytes/hmanlab-ai-v0.2 with Docker Model Runner:
docker model run hf.co/rekabytes/hmanlab-ai-v0.2:Q4_K_M
- Lemonade
How to use rekabytes/hmanlab-ai-v0.2 with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull rekabytes/hmanlab-ai-v0.2:Q4_K_M
Run and chat with the model
lemonade run user.hmanlab-ai-v0.2-Q4_K_M
List all available models
lemonade list
hmanlab-ai v0.2
A fine-tuned variant of Qwen3-4B optimized for agentic tool-use, structured reasoning, and conversational reliability. Built as a research preview by rekabytes.
TL;DR — v0.2 is a strict tool-reliability upgrade over v0.1:
- ✅ 10/10 clean tool-call formatting (up from 7/10)
- ✅ 5/5 identity grounding (rejects Claude / GPT / LLaMA / Gemini cleanly)
- ✅ 3/3 tool-grounded answers (cites specific tokens from tool output)
- ⚠️ Reasoning depth, error recovery, and initiative are work-in-progress — targets for the next release
What's new in v0.2
The headline change is tool-call format reliability. v0.1 was inconsistent at emitting clean <tool_call> JSON — pseudocode like [list_dir(".")] would sometimes appear as raw text. v0.2 closes that gap.
Measured on a 36-probe behavioural suite (5 samples per probe, majority-vote scoring):
| Category | v0.1 | v0.2 | Note |
|---|---|---|---|
| Tool-call format | 7/10 | 10/10 | v0.2 emits well-formed <tool_call> JSON across every probe |
| Identity (rejects Claude/GPT/LLaMA/Gemini) | 5/5 | 5/5 | stable |
| Tool-grounded answers (cites specific tokens) | 3/3 | 3/3 | stable |
| Redirect following | 3/3 | 2/3 | minor regression on path-redirect |
Probe-level head-to-head: v0.2 wins 8 probes outright, v0.1 wins 2, the rest tie. Net: v0.2 is a strict tool-reliability upgrade over v0.1.
More to come — reasoning depth, error recovery, and proactive initiative are explicit targets for the next release.
Known limitations (honestly)
v0.2 is not a one-shot solution. The following remain open and are tracked targets for the next iteration:
- Reasoning depth —
<think>blocks often render empty on direct reasoning prompts. Affects multi-step math, logic puzzles, and code-tracing tasks. - Error recovery — after a tool failure, the model tends to bounce back to the user ("please check the path") rather than retrying with a diagnostic tool (
find_files,list_dir). - Proactive initiative — vague prompts like "have a look at this repo" usually produce a clarifying question rather than a tool call.
- Multi-file synthesis — 4B-parameter capacity ceiling. Don't expect deep cross-file architectural reasoning.
If your use case lives within tool-call formatting + tool-grounded answers + clean identity, v0.2 is a meaningful step up. If you need deep chain-of-thought reasoning at 4B, wait for the next release.
Files
model-*.safetensors(sharded FP16, ~8GB) — fortransformers/ Unsloth loadinghmanlab-ai-v0.2.Q4_K_M.gguf(~2.5GB) — forllama.cpp/ Ollamatokenizer*.json,*.txt— chat template + vocab
Quick start
Ollama
ollama pull hf.co/rekabytes/hmanlab-ai-v0.2:Q4_K_M
ollama run hf.co/rekabytes/hmanlab-ai-v0.2:Q4_K_M
transformers / Unsloth
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained("rekabytes/hmanlab-ai-v0.2")
FastLanguageModel.for_inference(model)
Tool-call schema
Tool calls use this schema — applications must define their own tool surface and inject it in the system prompt:
<tool_call>
{"name": "<tool_name>", "arguments": {...}}
</tool_call>
Training
Two-stage QLoRA on Qwen3-4B-bnb-4bit.
Stage 1 — General behavioural mix (~20,600 examples). A blend of permissively-licensed public corpora plus a small set of hand-authored behavioural anchors, targeting these capabilities:
- Code breadth & instruction-following — broad Python/JS/Rust/Go coding tasks
- Tool-call format — structured
<tool_call>JSON emission, no pseudocode - Tool-grounded answers — consuming tool results and citing specific tokens (file names, line numbers, commit hashes)
- Reasoning traces — filled
<think>blocks for math, logic, and code-tracing - Agentic tool feedback loops — read-fail-recover, list-then-act patterns
- Conversation steering — handling user redirects ("shorter", "no, the other one") without verbatim repeat
Stage 2 — Identity SFT (~400 examples) layered on top of Stage 1. Anchors the model to identify as hmanlab and reject false-identity claims for Claude, GPT, LLaMA, Gemini, and their respective labs.
Hyperparameters
| Knob | Stage 1 | Stage 2 (identity) |
|---|---|---|
| Base | unsloth/Qwen3-4B-bnb-4bit |
(Stage 1 adapter) |
| LoRA r / α / dropout | 32 / 64 / 0 | (same) |
| max_seq_length | 4096 | 2048 |
| Batch / grad accum / eff batch | 1 / 8 / 8 | 1 / 4 / 4 |
| Epochs | 2 | 3 |
| LR / warmup / scheduler | 2e-4 / 100 / linear | 1e-4 / 20 / linear |
| Steps | ~5,150 | ~297 |
| Final train / eval loss | 0.55 / 0.527 | 0.14 (final) |
Trained on a single RTX 3060 Ti (8 GB VRAM). Total wall-clock ~11.5 hours.
Identity
The model identifies as hmanlab. It is not Claude / GPT / LLaMA / Gemini and the training data explicitly anchors against false-identity claims for those families. The model is a research preview built independently by rekabytes — no affiliation with Anthropic, OpenAI, Meta, or Google is implied.
License
Apache 2.0. Same license as the Qwen3-4B base.
Acknowledgements
- Qwen team — base model
- Unsloth — training pipeline
- Downloads last month
- 47