Skylar-980M-Cobol

🔬 Published as a transparent research artifact for the Skylar framework, not as a general-purpose or SOTA model.

A 980M-parameter decoder-only COBOL specialist, trained 100% from scratch (random init, no warm-start, no external pretrained weights) and instruction-tuned to write, explain, modify and translate COBOL. Local, sovereign, and small enough to run on a modest GPU: built for legacy banking, insurance and public-administration (PA) settings where code cannot leave the building.

🔗 Watch it grow: https://skyl4r.ai

A from-scratch Skylar model generates COBOL; GnuCOBOL compiles and runs it

⚖️ Honest scope

COBOL-only by design. It deliberately refuses other languages ("Python non rientra nelle mie competenze: sono specializzato esclusivamente in COBOL", i.e. "Python is outside my expertise: I specialize exclusively in COBOL"). That focus is a feature, not a gap.
Small & token-budget-limited (980M params, 20.37B pretrain tokens) → strong on COBOL syntax, generation and multi-turn edits, weak on multi-step arithmetic/abstract reasoning (a 0.9B capacity ceiling — see results). Expect a competent, honest assistant, not a reasoning engine.
A research artifact of the Skylar from-scratch stack, not a finished product.

What it does well (real, measured)

Writes correct, compilable COBOL from a request, and modifies it across turns (e.g. "add an OVERFLOW check if the sum exceeds 9999" → correct IF … DISPLAY 'OVERFLOW' ELSE …).
Explains COBOL constructs, knows its scope, and declines out-of-domain questions honestly instead of hallucinating.
On the executable COBOL benchmark it beats every deployable 7B generalist (below).

Architecture


Parameters	~980M
Layers	36
Model dim (d_model)	1536
Attention heads	12 (GQA, 4 KV heads)
Head dim	128 (decoupled)
FFN	SwiGLU, d_ff 4096
Normalization	RMSNorm (fp32) + QK-Norm
Positional	RoPE (θ=1e6)
Context window	8192 (trained) · 16384 max (config)
Vocabulary	48128 (code-aware BPE, digit-split)
Embeddings	tied (input/output)
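
As a sanity check on the table above, the figures can be plugged into a back-of-the-envelope parameter count. This is only a sketch: it assumes bias-free linear layers and ignores the small norm weights, neither of which is stated in this card, and the 2-byte KV-cache entry size is likewise an assumption.

```python
# Rough parameter count from the architecture table.
# Assumptions (not stated in the card): bias-free linear layers, and norm
# weights small enough to ignore.
d_model, n_layers = 1536, 36
n_heads, n_kv_heads, head_dim = 12, 4, 128
d_ff, vocab = 4096, 48128

q_proj = d_model * n_heads * head_dim          # query projection
kv_proj = 2 * d_model * n_kv_heads * head_dim  # key + value projections (GQA, 4 KV heads)
o_proj = n_heads * head_dim * d_model          # attention output projection
attn = q_proj + kv_proj + o_proj

ffn = 3 * d_model * d_ff                       # SwiGLU: gate, up and down matrices
embed = vocab * d_model                        # counted once because embeddings are tied

total_params = n_layers * (attn + ffn) + embed
print(f"{total_params / 1e6:.1f}M parameters")

# KV cache at the trained context length, assuming 2-byte (fp16/bf16) entries.
context = 8192
kv_cache_bytes = 2 * n_layers * n_kv_heads * head_dim * context * 2
print(f"{kv_cache_bytes / 2**20:.0f} MiB KV cache at {context} tokens")
```

Under these assumptions the weight matrices sum to about 979.9M, consistent with the stated ~980M. With 4 KV heads instead of 12, the KV cache is a third of what full multi-head attention would need at the same context length.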

Training

From scratch, random init — no warm-start, no third-party pretrained weights.
Pretrain: 20.37B tokens, sequence length 8192.
SFT: ChatML, assistant-only loss, on a curated COBOL instruction mix.
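
To illustrate what "assistant-only loss" usually means in practice, here is a minimal sketch of label masking. It is not the Skylar training code: the per-token role list is a hypothetical input format, and the use of -100 as the ignore value follows the PyTorch cross-entropy default rather than anything stated in this card.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def assistant_only_labels(token_ids, roles):
    """Return labels where only assistant tokens contribute to the loss.

    token_ids: list of int token ids for one ChatML conversation.
    roles: list of the same length naming the role that produced each token
           ("system", "user" or "assistant"). This per-token role list is a
           hypothetical input format chosen for the example.
    """
    if len(token_ids) != len(roles):
        raise ValueError("token_ids and roles must have the same length")
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(token_ids, roles)]

# Toy conversation: 3 system tokens, 2 user tokens, 3 assistant tokens.
example_ids = [11, 12, 13, 21, 22, 31, 32, 33]
example_roles = ["system"] * 3 + ["user"] * 2 + ["assistant"] * 3
labels = assistant_only_labels(example_ids, example_roles)
print(labels)
```

The system and user positions are masked out, so gradients come only from the tokens the model is expected to produce.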

Results — head-to-head on COBOLEval

All models evaluated on the same harness, greedy decoding, seed 0, GnuCOBOL compile+execute, official {NAME}.TXT scoring.

Ordered by pass@1. Skylar-980M-Cobol posts the highest compile-success rate of any deployable model here (80.1%, above even the 14B COBOL-Coder at 73.95%; only the non-deployable Claude reference is higher), and it leads every 7B general-purpose code model on pass@1. The specialized COBOL-Coder (a 14B Qwen2.5-Coder fine-tune, not from-scratch) leads pass@1 among deployable models, at ~14× the parameters and warm-started from a pretrained code model.

Model	Params	CSR (compile rate)	pass@1
Claude Opus 4.8 (reference ceiling, not deployable)	—	96.6%	81.5%
COBOL-Coder-14B (Qwen2.5-Coder fine-tune, not from-scratch)	14B	73.95%	49.33%
Skylar-980M-Cobol (this model)	980M	80.1%	7.5%
Qwen2.5-Coder-7B-Instruct	7B	6.2%	2.1%
CodeLlama-7B-Instruct	7B	6.8%	0.7%
StarCoder2-7B	7B	48.6%	0.0%

→ At ~7× fewer parameters than the 7B baselines (and 14× fewer than COBOL-Coder), fully local and from-scratch, it posts the highest compile rate of any deployable model and out-performs the mainstream 7B code models on pass@1.
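
For readers unfamiliar with the two columns, the sketch below shows how CSR and pass@1 could be aggregated from per-problem outcomes when there is one greedy sample per problem. The `ProblemResult` record and the demo data are hypothetical and made up for illustration; they are not the COBOLEval harness or its results.

```python
from dataclasses import dataclass

@dataclass
class ProblemResult:
    name: str
    compiled: bool   # GnuCOBOL accepted the generated program
    passed: bool     # program output matched the expected result

def score(results):
    """Return (compile-success rate, pass@1) as percentages.

    With one greedy sample per problem, pass@1 is simply the share of
    problems whose single sample compiles and passes.
    """
    if not results:
        raise ValueError("no results to score")
    n = len(results)
    csr = 100.0 * sum(r.compiled for r in results) / n
    pass_at_1 = 100.0 * sum(r.compiled and r.passed for r in results) / n
    return csr, pass_at_1

# Made-up outcomes for four problems, for illustration only.
demo = [
    ProblemResult("P1", compiled=True, passed=True),
    ProblemResult("P2", compiled=True, passed=False),
    ProblemResult("P3", compiled=False, passed=False),
    ProblemResult("P4", compiled=True, passed=False),
]
csr, pass_at_1 = score(demo)
print(f"CSR {csr:.1f}%  pass@1 {pass_at_1:.1f}%")
```

A model can therefore have a high CSR and a low pass@1 at the same time: compiling is necessary for passing, but not sufficient.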

Conversational usability

Hand-evaluated on real multi-turn dev tasks (identity, generation, multi-turn modification, code comprehension, honest refusal): usable — concise, in-character, writes correct COBOL, keeps context across turns, and stays honestly in-scope.

Usage

Use the skylar library — no custom modeling code lives in this repo; the architecture is provided by the package.

pip install skylar

import skylar

m = skylar.load("Sophia-AI/Skylar-980M-Cobol")

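# Prompt (Italian): "Write me a COBOL program that adds two PIC 9(4) numbers and prints the result."
# System message (Italian): "You are an expert COBOL programmer."
print(m.generate(
    "Scrivimi un programma COBOL che somma due numeri PIC 9(4) e stampa il risultato.",
    system="Sei un esperto programmatore COBOL.",
    max_new_tokens=400))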

Alternative — plain 🤗 transformers

import skylar registers the nano-transformer architecture, so AutoModelForCausalLM works with no trust_remote_code and no modeling files in the repo:

import skylar                          # registers the architecture
from transformers import AutoModelForCausalLM
from tokenizers import Tokenizer
from huggingface_hub import hf_hub_download
import torch

repo = "Sophia-AI/Skylar-980M-Cobol"
model = AutoModelForCausalLM.from_pretrained(repo).eval()
tok = Tokenizer.from_file(hf_hub_download(repo, "tokenizer.json"))
# (wrap the turn in ChatML <|im_start|>…<|im_end|> as the skylar lib does)
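
The ChatML wrapping mentioned in the last comment could look roughly like the sketch below. The helper name `build_chatml_prompt` is hypothetical, and the layout is the generic ChatML convention (a newline after each role name and after each `<|im_end|>`); the exact template used by the skylar library may differ.

```python
def build_chatml_prompt(user, system=None):
    """Wrap one user turn (and an optional system message) in ChatML.

    Assumption: the generic ChatML layout with a newline after each role
    name and after each <|im_end|>; the skylar library may differ in detail.
    """
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{user}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # left open for the model to complete
    return "".join(parts)

prompt = build_chatml_prompt(
    "Write a COBOL program that adds two PIC 9(4) numbers and prints the result.",
    system="You are an expert COBOL programmer.",
)
print(prompt)
```

The resulting string would then be encoded with the tokenizer loaded above (for example `tok.encode(prompt).ids`) and passed to the model as a tensor of input ids. Whether the tokenizer maps the `<|im_start|>` and `<|im_end|>` markers to single special tokens depends on its configuration, so that is worth checking before relying on this.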

Limitations

Multi-step arithmetic / abstract reasoning is unreliable (0.9B ceiling) — verify generated logic.
COBOL-only: it will refuse or re-frame non-COBOL requests.
pass@1 is modest in absolute terms (COBOL is a hard, low-resource domain; even GPT-4o is at 16.4%). The value lies in beating the mainstream 7B code models on pass@1 and posting the highest compile rate of any deployable model, at a fraction of the size and fully local.

License & attribution

Part of the Skylar collection: a from-scratch Italian SLM stack (base -> chat + dense retrieval). Apache-2.0, local. github.com/2sophia/skylar