KALYPSO v1.1L
KALYPSO v1.1L is GENOMA Labs' public agentic-coding model: Qwen2.5-Coder-14B-Instruct fine-tuned on the Kraken-Public corpus (CC-BY-4.0, decontaminated). It is the open counterpart of the internal KALYPSO line.
Model Details
- Developed by: GENOMA Labs
- Base model:
Qwen/Qwen2.5-Coder-14B-Instruct(Apache-2.0) - Fine-tune: LoRA r=64, α=128, seq 8192, 1 epoch, on
kraken_public_v1_decontam.jsonl(18,049 records) - License: Apache-2.0 (+ required attributions below)
- Language(s): code (Python/C++/SQL) + English
Sources
- Training data: Kraken-Public (this org) — see its dataset card.
- Method: GENOMA's Execution-Verified Test-Discrimination Gate (TDG); see the Kraken tech report.
Uses
Direct Use
Local coding assistant / agentic-coding scaffolds (tool-calling, multi-step tasks). Serve via vLLM (OpenAI-compatible).
Out-of-Scope
Not a safety-aligned general assistant; no guarantee on non-coding domains. Evaluate before production use.
Bias, Risks, Limitations
Inherits base-model + Nemotron-pipeline biases; Python-skewed. v1.1L is trained on curated third-party (Nemotron) coding data — the GENOMA-native TDG-verified coding layer lands in a later revision.
Training Details
- Training data: Kraken-Public v1 (decontaminated vs HumanEval/MBPP, 13-gram, 0.35% removed).
- Procedure: QLoRA 4-bit (bitsandbytes), chunked cross-entropy, paged AdamW, cosine LR 2e-4, GS RTX 3090.
- Hardware: single RTX 3090 (24 GB), CUDA_VISIBLE_DEVICES=1.
Evaluation
EvalPlus (greedy, vLLM, decontaminated corpus). Measured 2026-06-16 on GS RTX 3090.
| Benchmark | KALYPSO v1.1L (Kraken) | Qwen2.5-Coder-14B-Instruct (ref) |
|---|---|---|
| HumanEval (base) | 87.8 | ~89 |
| EvalPlus HumanEval+ | 82.9 | ~87.2 |
| MBPP (base) | 84.7 | ~84 |
| EvalPlus MBPP+ | 70.4 | ~72.8 |
| BigCodeBench-Instruct (Full/Hard) | TBD | 48.4 / 22.2 |
| LiveCodeBench (date-windowed v6) | TBD | 23.4 |
Honest reading: on pure code-completion benchmarks v1.1L tracks the base Qwen2.5-Coder-14B (≈base on HumanEval, slightly under on the rigorous + variants). The Kraken-Public corpus (curated Nemotron coding+agentic) did not lift HumanEval/MBPP over the already-strong base — pure code-completion is saturated at this scale. The corpus's intended value is agentic breadth, which these single-shot benchmarks do not measure. This model is released primarily as the reference fine-tune for the Kraken-Public dataset and as a demonstration of the (decontaminated, reproducible) corpus pipeline.
REQUIRED Attribution / NOTICE
Fine-tuned from Qwen2.5-Coder-14B-Instruct (Apache-2.0). Trained on Kraken-Public, derived from NVIDIA Nemotron data (CC-BY-4.0); per the Nemotron license this model may be subject to the Qwen and DeepSeek License Agreements. Built with Qwen.
Retain the upstream Qwen LICENSE/NOTICE. Modifications: LoRA fine-tune by GENOMA Labs on Kraken-Public.
Citation
@misc{kalypso_v11l_2026, title={KALYPSO v1.1L}, author={GENOMA Labs}, year={2026}}
Cite also Qwen2.5-Coder (arXiv:2409.12186) and NVIDIA Nemotron. Built with Qwen.
How to Get Started
# vLLM serve, then OpenAI-compatible client
# vllm serve GENOMA-Labs/KALYPSO-v1.1L --port 8000
KALYPSO v1.1L · GENOMA Labs · the public, ToS-clean member of the KALYPSO line. Built with Qwen.
Model tree for GenomaLabs-com/KALYPSO-v1.1L
Base model
Qwen/Qwen2.5-14B