Instructions to use rbentaarit/kubelm-edge-v0-lora with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use rbentaarit/kubelm-edge-v0-lora with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit") model = PeftModel.from_pretrained(base_model, "rbentaarit/kubelm-edge-v0-lora") - Notebooks
- Google Colab
- Kaggle
kubelm-edge v0 β LoRA Adapter
A 1.5B-parameter specialist for reliable tool-use against K8sGPT's MCP server.
This repo contains the LoRA adapter only. Apply it on top of
Qwen/Qwen2.5-1.5B-Instruct.
For a ready-to-run GGUF (Q4_K_M), see
rbentaarit/kubelm-edge-v0-GGUF.
What this is
kubelm-edge-v0 is the first release of kubelm β
a project building small, CPU-deployable language models specialized for
calling K8sGPT MCP tools reliably during Kubernetes cluster
investigation.
The model is not a Kubernetes Q&A bot, a snapshot diagnoser, or a remediation engine. It's trained to drive multi-step investigations through the K8sGPT MCP surface β picking the right tool, supplying correct arguments, reading tool results, and synthesizing a faithful conclusion grounded in what the tools returned.
Headline eval β 30-scenario kubelm bench (2026-05-14)
Compared with the unmodified base and the larger general-purpose model that defines our empirical target:
| Model | Size | complete | rubric_pass | ref_pass | arg_halluc | name_halluc |
|---|---|---|---|---|---|---|
| kubelm-edge-v0 (this) | 1.5B (Q4_K_M: ~1 GB) | 29/30 | 23/30 | 21/30 | 0 | 0 |
| qwen2.5:1.5b (base) | 1.5B | 8/30 | 10/30 | 3/30 | 2 | 0 |
| qwen2.5:7b | 7B (Q4: ~4.7 GB) | 30/30 | 24/30 | 29/30 | 0 | 0 |
Headline finding: at ~1/4 the deployment footprint, kubelm-edge-v0
lands within 1 point of qwen2.5:7b on the conclusion-rubric
metric (23 vs 24) and on completion (29 vs 30). The remaining gap to
7B is concentrated in ref_pass (correct-tool selection per
scenario rubric).
For the full per-scenario breakdown, see
eval/results/summaries/kubelm-edge-v0-2026-05-14.json.
A documented metric caveat
grounding_failures shows a regression vs the base (16 β 27). Both
attempts during Phase 5 regressed grounding (attempt-1 was 21,
attempt-2 is 27). Per the project's own
2026-05-12 audit
of gpt-5.4's "30/30 grounding failure" reading, the v1 rule-based
grounding analyzer doesn't tolerate structural rephrasing (YAML-shaped
output, dotted-path notation, quoted vs unquoted strings) β exactly
the kind of style shift fine-tuning produces. The number is reported
honestly here, but should be interpreted as directional until a v2
grounding metric lands. A per-scenario audit of the 27 flagged
samples is on the v0.1 followup list.
Training recipe
- Base model:
Qwen/Qwen2.5-1.5B-Instruct - Method: QLoRA, 4-bit base, BF16 compute
- Trainable parameters: 36.9M (2.34% of base)
- LoRA rank: 32, alpha 64, dropout 0
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Optimizer: paged_adamw_8bit
- Learning rate: 2e-4, cosine schedule, 3% warmup ratio
- Effective batch size: 16 (per_device 8 Γ grad_accum 2)
- Sequence length: 16,384 tokens
- Epochs: 2 (chosen over 3 β the 3-epoch attempt over-trained and underperformed on the eval; see decisions log)
- Final training loss: 0.0716 at end of epoch 2
- Training time: ~10 minutes on a single A100 SXM4-80GB
- Framework: Unsloth 2026.5.2 + trl 0.24.0
Loss-masking applies only to assistant turns via Unsloth's
train_on_responses_only β user prompts and (especially) the
verbose K8sGPT MCP tool-result JSON blocks contribute zero gradient.
This is essential for this corpus: tool-result JSON makes up ~95% of
each trajectory's tokens.
Training data
rbentaarit/kubelm-data-v0
(committed in the project repo; HF mirror pending). 319 multi-step
trajectories of K8sGPT MCP investigations against 30 kind-cluster
scenarios:
- 29 seed trajectories scored by the 2026-05-12 gpt-5.4 Shape B bench (rubric_pass: 29/30) and reused as training data
- 290 mechanical variants of those seeds (10Γ per seed, varying scenario IDs / namespaces / cluster contexts to teach the model the shape of correct investigation rather than the specific names)
Synthetic negatives exist in the repo but are excluded from v0
training because all 46 carry review_status: unreviewed and the
recovery-prose template hadn't been hand-varied; they'll land in
v0.1.
Reproducibility
Pinned for methodology commitment #5 of the project:
- K8sGPT MCP: v0.4.32 (eval cluster + MCP server version)
- Base model commit: see
Qwen/Qwen2.5-1.5B-Instructrevisionmainat training time - trl: 0.24.0
- Unsloth: 2026.5.2
- transformers: 5.5.0
- PEFT: 0.19.1
- bitsandbytes: 0.49.2
- PyTorch: 2.8.0+cu128
Full training config:
training/configs/kubelm-edge-v0.yaml.
Per-step loss curve:
runs/kubelm-edge-v0-attempt-2/trainer_state.json
(gitignored in repo; included for reference).
Usage
Via PEFT (HF transformers)
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base = AutoModelForCausalLM.from_pretrained(
"Qwen/Qwen2.5-1.5B-Instruct", torch_dtype="bfloat16", device_map="auto"
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(base, "rbentaarit/kubelm-edge-v0-lora")
# optional: merge into base weights for faster inference
# model = model.merge_and_unload()
As a K8sGPT local backend
kubelm-edge is designed to plug into K8sGPT's OpenAI-compatible
local-backend interface. Serve via vLLM, llama.cpp, or Ollama (see
the GGUF repo for the Ollama path), point K8sGPT at the OpenAI
endpoint. See the project's
Phase 6 plan
for the Helm chart that will package this for production deployment.
Limitations
- Tool-use scope. Trained exclusively against
K8sGPT v0.4.32's MCP tools
(
analyze,list-resources,get-resource,list-events,describe-resource, etc.). Behavior on other MCP servers or with custom tool sets is unmeasured. - Cluster surface. Eval covers 30 single-cluster failure scenarios (CrashLoopBackOff, OOMKilled, PVC issues, scheduler problems, network policy, RBAC, etc.). Multi-cluster, federation, service-mesh, and serverless scenarios are out of scope for v0.
- One bench scenario errors deterministically. The
pod-anti-affinity-001scenario hits a kind-cluster settle race (~60s timeout waiting for the Deployment to report Available); this is a harness issue, not a model issue, but the model loses 1 point from thecompleteandrubric_passcolumns because of it. Same error appears in attempt-1 and the published 2026-05-13 baseline. - Grounding metric. See "documented metric caveat" above.
- Safety. kubelm is not trained for refusal patterns. Safety on destructive operations is delegated to the K8sGPT operator layer (Mutation CRs + policy gates) β see PROJECT.md β "Safety: architectural, not behavioral".
License
Apache 2.0 β same as the base model and the project.
Citation
@software{kubelm_edge_v0_2026,
author = {Ben Taarit, Ramzi},
title = {kubelm-edge v0: a 1.5B tool-use specialist for K8sGPT MCP},
year = {2026},
url = {https://github.com/rbentaarit/kubelm},
version = {v0},
}
Links
- Project repo: https://github.com/rbentaarit/kubelm
- Project thesis & methodology: PROJECT.md
- Roadmap: ROADMAP.md
- Eval summaries:
eval/results/summaries/ - GGUF release: rbentaarit/kubelm-edge-v0-GGUF
- Downloads last month
- 18