KALYPSO v1.1L

KALYPSO v1.1L is GENOMA Labs' public agentic-coding model: Qwen2.5-Coder-14B-Instruct fine-tuned on the Kraken-Public corpus (CC-BY-4.0, decontaminated). It is the open counterpart of the internal KALYPSO line.

Model Details

  • Developed by: GENOMA Labs
  • Base model: Qwen/Qwen2.5-Coder-14B-Instruct (Apache-2.0)
  • Fine-tune: LoRA r=64, α=128, seq 8192, 1 epoch, on kraken_public_v1_decontam.jsonl (18,049 records)
  • License: Apache-2.0 (+ required attributions below)
  • Language(s): code (Python/C++/SQL) + English

Sources

  • Training data: Kraken-Public (this org) — see its dataset card.
  • Method: GENOMA's Execution-Verified Test-Discrimination Gate (TDG); see the Kraken tech report.

Uses

Direct Use

Local coding assistant / agentic-coding scaffolds (tool-calling, multi-step tasks). Serve via vLLM (OpenAI-compatible).

Out-of-Scope

Not a safety-aligned general assistant; no guarantee on non-coding domains. Evaluate before production use.

Bias, Risks, Limitations

Inherits base-model + Nemotron-pipeline biases; Python-skewed. v1.1L is trained on curated third-party (Nemotron) coding data — the GENOMA-native TDG-verified coding layer lands in a later revision.

Training Details

  • Training data: Kraken-Public v1 (decontaminated vs HumanEval/MBPP, 13-gram, 0.35% removed).
  • Procedure: QLoRA 4-bit (bitsandbytes), chunked cross-entropy, paged AdamW, cosine LR 2e-4, GS RTX 3090.
  • Hardware: single RTX 3090 (24 GB), CUDA_VISIBLE_DEVICES=1.

Evaluation

EvalPlus (greedy, vLLM, decontaminated corpus). Measured 2026-06-16 on GS RTX 3090.

Benchmark KALYPSO v1.1L (Kraken) Qwen2.5-Coder-14B-Instruct (ref)
HumanEval (base) 87.8 ~89
EvalPlus HumanEval+ 82.9 ~87.2
MBPP (base) 84.7 ~84
EvalPlus MBPP+ 70.4 ~72.8
BigCodeBench-Instruct (Full/Hard) TBD 48.4 / 22.2
LiveCodeBench (date-windowed v6) TBD 23.4

Honest reading: on pure code-completion benchmarks v1.1L tracks the base Qwen2.5-Coder-14B (≈base on HumanEval, slightly under on the rigorous + variants). The Kraken-Public corpus (curated Nemotron coding+agentic) did not lift HumanEval/MBPP over the already-strong base — pure code-completion is saturated at this scale. The corpus's intended value is agentic breadth, which these single-shot benchmarks do not measure. This model is released primarily as the reference fine-tune for the Kraken-Public dataset and as a demonstration of the (decontaminated, reproducible) corpus pipeline.

REQUIRED Attribution / NOTICE

Fine-tuned from Qwen2.5-Coder-14B-Instruct (Apache-2.0). Trained on Kraken-Public, derived from NVIDIA Nemotron data (CC-BY-4.0); per the Nemotron license this model may be subject to the Qwen and DeepSeek License Agreements. Built with Qwen.

Retain the upstream Qwen LICENSE/NOTICE. Modifications: LoRA fine-tune by GENOMA Labs on Kraken-Public.

Citation

@misc{kalypso_v11l_2026, title={KALYPSO v1.1L}, author={GENOMA Labs}, year={2026}}

Cite also Qwen2.5-Coder (arXiv:2409.12186) and NVIDIA Nemotron. Built with Qwen.

How to Get Started

# vLLM serve, then OpenAI-compatible client
# vllm serve GENOMA-Labs/KALYPSO-v1.1L --port 8000

KALYPSO v1.1L · GENOMA Labs · the public, ToS-clean member of the KALYPSO line. Built with Qwen.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for GenomaLabs-com/KALYPSO-v1.1L

Base model

Qwen/Qwen2.5-14B
Finetuned
(111)
this model

Dataset used to train GenomaLabs-com/KALYPSO-v1.1L

Paper for GenomaLabs-com/KALYPSO-v1.1L