kubelm-edge v0 β€” LoRA Adapter

A 1.5B-parameter specialist for reliable tool-use against K8sGPT's MCP server.

This repo contains the LoRA adapter only. Apply it on top of Qwen/Qwen2.5-1.5B-Instruct. For a ready-to-run GGUF (Q4_K_M), see rbentaarit/kubelm-edge-v0-GGUF.

What this is

kubelm-edge-v0 is the first release of kubelm β€” a project building small, CPU-deployable language models specialized for calling K8sGPT MCP tools reliably during Kubernetes cluster investigation.

The model is not a Kubernetes Q&A bot, a snapshot diagnoser, or a remediation engine. It's trained to drive multi-step investigations through the K8sGPT MCP surface β€” picking the right tool, supplying correct arguments, reading tool results, and synthesizing a faithful conclusion grounded in what the tools returned.

Headline eval β€” 30-scenario kubelm bench (2026-05-14)

Compared with the unmodified base and the larger general-purpose model that defines our empirical target:

Model Size complete rubric_pass ref_pass arg_halluc name_halluc
kubelm-edge-v0 (this) 1.5B (Q4_K_M: ~1 GB) 29/30 23/30 21/30 0 0
qwen2.5:1.5b (base) 1.5B 8/30 10/30 3/30 2 0
qwen2.5:7b 7B (Q4: ~4.7 GB) 30/30 24/30 29/30 0 0

Headline finding: at ~1/4 the deployment footprint, kubelm-edge-v0 lands within 1 point of qwen2.5:7b on the conclusion-rubric metric (23 vs 24) and on completion (29 vs 30). The remaining gap to 7B is concentrated in ref_pass (correct-tool selection per scenario rubric).

For the full per-scenario breakdown, see eval/results/summaries/kubelm-edge-v0-2026-05-14.json.

A documented metric caveat

grounding_failures shows a regression vs the base (16 β†’ 27). Both attempts during Phase 5 regressed grounding (attempt-1 was 21, attempt-2 is 27). Per the project's own 2026-05-12 audit of gpt-5.4's "30/30 grounding failure" reading, the v1 rule-based grounding analyzer doesn't tolerate structural rephrasing (YAML-shaped output, dotted-path notation, quoted vs unquoted strings) β€” exactly the kind of style shift fine-tuning produces. The number is reported honestly here, but should be interpreted as directional until a v2 grounding metric lands. A per-scenario audit of the 27 flagged samples is on the v0.1 followup list.

Training recipe

  • Base model: Qwen/Qwen2.5-1.5B-Instruct
  • Method: QLoRA, 4-bit base, BF16 compute
  • Trainable parameters: 36.9M (2.34% of base)
  • LoRA rank: 32, alpha 64, dropout 0
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Optimizer: paged_adamw_8bit
  • Learning rate: 2e-4, cosine schedule, 3% warmup ratio
  • Effective batch size: 16 (per_device 8 Γ— grad_accum 2)
  • Sequence length: 16,384 tokens
  • Epochs: 2 (chosen over 3 β€” the 3-epoch attempt over-trained and underperformed on the eval; see decisions log)
  • Final training loss: 0.0716 at end of epoch 2
  • Training time: ~10 minutes on a single A100 SXM4-80GB
  • Framework: Unsloth 2026.5.2 + trl 0.24.0

Loss-masking applies only to assistant turns via Unsloth's train_on_responses_only β€” user prompts and (especially) the verbose K8sGPT MCP tool-result JSON blocks contribute zero gradient. This is essential for this corpus: tool-result JSON makes up ~95% of each trajectory's tokens.

Training data

rbentaarit/kubelm-data-v0 (committed in the project repo; HF mirror pending). 319 multi-step trajectories of K8sGPT MCP investigations against 30 kind-cluster scenarios:

  • 29 seed trajectories scored by the 2026-05-12 gpt-5.4 Shape B bench (rubric_pass: 29/30) and reused as training data
  • 290 mechanical variants of those seeds (10Γ— per seed, varying scenario IDs / namespaces / cluster contexts to teach the model the shape of correct investigation rather than the specific names)

Synthetic negatives exist in the repo but are excluded from v0 training because all 46 carry review_status: unreviewed and the recovery-prose template hadn't been hand-varied; they'll land in v0.1.

Reproducibility

Pinned for methodology commitment #5 of the project:

  • K8sGPT MCP: v0.4.32 (eval cluster + MCP server version)
  • Base model commit: see Qwen/Qwen2.5-1.5B-Instruct revision main at training time
  • trl: 0.24.0
  • Unsloth: 2026.5.2
  • transformers: 5.5.0
  • PEFT: 0.19.1
  • bitsandbytes: 0.49.2
  • PyTorch: 2.8.0+cu128

Full training config: training/configs/kubelm-edge-v0.yaml. Per-step loss curve: runs/kubelm-edge-v0-attempt-2/trainer_state.json (gitignored in repo; included for reference).

Usage

Via PEFT (HF transformers)

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-1.5B-Instruct", torch_dtype="bfloat16", device_map="auto"
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")

model = PeftModel.from_pretrained(base, "rbentaarit/kubelm-edge-v0-lora")
# optional: merge into base weights for faster inference
# model = model.merge_and_unload()

As a K8sGPT local backend

kubelm-edge is designed to plug into K8sGPT's OpenAI-compatible local-backend interface. Serve via vLLM, llama.cpp, or Ollama (see the GGUF repo for the Ollama path), point K8sGPT at the OpenAI endpoint. See the project's Phase 6 plan for the Helm chart that will package this for production deployment.

Limitations

  • Tool-use scope. Trained exclusively against K8sGPT v0.4.32's MCP tools (analyze, list-resources, get-resource, list-events, describe-resource, etc.). Behavior on other MCP servers or with custom tool sets is unmeasured.
  • Cluster surface. Eval covers 30 single-cluster failure scenarios (CrashLoopBackOff, OOMKilled, PVC issues, scheduler problems, network policy, RBAC, etc.). Multi-cluster, federation, service-mesh, and serverless scenarios are out of scope for v0.
  • One bench scenario errors deterministically. The pod-anti-affinity-001 scenario hits a kind-cluster settle race (~60s timeout waiting for the Deployment to report Available); this is a harness issue, not a model issue, but the model loses 1 point from the complete and rubric_pass columns because of it. Same error appears in attempt-1 and the published 2026-05-13 baseline.
  • Grounding metric. See "documented metric caveat" above.
  • Safety. kubelm is not trained for refusal patterns. Safety on destructive operations is delegated to the K8sGPT operator layer (Mutation CRs + policy gates) β€” see PROJECT.md β†’ "Safety: architectural, not behavioral".

License

Apache 2.0 β€” same as the base model and the project.

Citation

@software{kubelm_edge_v0_2026,
  author  = {Ben Taarit, Ramzi},
  title   = {kubelm-edge v0: a 1.5B tool-use specialist for K8sGPT MCP},
  year    = {2026},
  url     = {https://github.com/rbentaarit/kubelm},
  version = {v0},
}

Links

Downloads last month
18
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for rbentaarit/kubelm-edge-v0-lora

Adapter
(997)
this model