KALYPSO v1.1L

KALYPSO v1.1L is GENOMA Labs' public agentic-coding model: Qwen2.5-Coder-14B-Instruct fine-tuned on the Kraken-Public corpus (CC-BY-4.0, decontaminated). It is the open counterpart of the internal KALYPSO line.

Model Details

Developed by: GENOMA Labs
Base model: Qwen/Qwen2.5-Coder-14B-Instruct (Apache-2.0)
Fine-tune: LoRA r=64, α=128, seq 8192, 1 epoch, on kraken_public_v1_decontam.jsonl (18,049 records)
License: Apache-2.0 (+ required attributions below)
Language(s): code (Python/C++/SQL) + English

Sources

Training data: Kraken-Public (this org) — see its dataset card.
Method: GENOMA's Execution-Verified Test-Discrimination Gate (TDG); see the Kraken tech report.

Uses

Direct Use

Local coding assistant / agentic-coding scaffolds (tool-calling, multi-step tasks). Serve via vLLM (OpenAI-compatible).

Out-of-Scope

Not a safety-aligned general assistant; no guarantee on non-coding domains. Evaluate before production use.

Bias, Risks, Limitations

Inherits base-model + Nemotron-pipeline biases; Python-skewed. v1.1L is trained on curated third-party (Nemotron) coding data — the GENOMA-native TDG-verified coding layer lands in a later revision.

Training Details

Training data: Kraken-Public v1 (decontaminated vs HumanEval/MBPP, 13-gram, 0.35% removed).
Procedure: QLoRA 4-bit (bitsandbytes), chunked cross-entropy, paged AdamW, cosine LR 2e-4, GS RTX 3090.
Hardware: single RTX 3090 (24 GB), CUDA_VISIBLE_DEVICES=1.

Evaluation

EvalPlus (greedy, vLLM, decontaminated corpus). Measured 2026-06-16 on GS RTX 3090.

Benchmark	KALYPSO v1.1L (Kraken)	Qwen2.5-Coder-14B-Instruct (ref)
HumanEval (base)	87.8	~89
EvalPlus HumanEval+	82.9	~87.2
MBPP (base)	84.7	~84
EvalPlus MBPP+	70.4	~72.8
BigCodeBench-Instruct (Full/Hard)	TBD	48.4 / 22.2
LiveCodeBench (date-windowed v6)	TBD	23.4

Honest reading: on pure code-completion benchmarks v1.1L tracks the base Qwen2.5-Coder-14B (≈base on HumanEval, slightly under on the rigorous + variants). The Kraken-Public corpus (curated Nemotron coding+agentic) did not lift HumanEval/MBPP over the already-strong base — pure code-completion is saturated at this scale. The corpus's intended value is agentic breadth, which these single-shot benchmarks do not measure. This model is released primarily as the reference fine-tune for the Kraken-Public dataset and as a demonstration of the (decontaminated, reproducible) corpus pipeline.

REQUIRED Attribution / NOTICE

Fine-tuned from Qwen2.5-Coder-14B-Instruct (Apache-2.0). Trained on Kraken-Public, derived from NVIDIA Nemotron data (CC-BY-4.0); per the Nemotron license this model may be subject to the Qwen and DeepSeek License Agreements. Built with Qwen.

Retain the upstream Qwen LICENSE/NOTICE. Modifications: LoRA fine-tune by GENOMA Labs on Kraken-Public.

Citation

@misc{kalypso_v11l_2026, title={KALYPSO v1.1L}, author={GENOMA Labs}, year={2026}}

Cite also Qwen2.5-Coder (arXiv:2409.12186) and NVIDIA Nemotron. Built with Qwen.

How to Get Started

# vLLM serve, then OpenAI-compatible client
# vllm serve GENOMA-Labs/KALYPSO-v1.1L --port 8000

KALYPSO v1.1L · GENOMA Labs · the public, ToS-clean member of the KALYPSO line. Built with Qwen.

Downloads last month: -; Downloads are not tracked for this model. How to track

Model tree for GenomaLabs-com/KALYPSO-v1.1L

Base model

Qwen/Qwen2.5-14B

Finetuned

Qwen/Qwen2.5-Coder-14B

Finetuned

Qwen/Qwen2.5-Coder-14B-Instruct

Finetuned

(111)

this model

Dataset used to train GenomaLabs-com/KALYPSO-v1.1L

Paper for GenomaLabs-com/KALYPSO-v1.1L

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18, 2024 • 157