Skylar-980M-Cobol
🔬 Published as a transparent research artifact for the Skylar framework, not as a general-purpose or SOTA model.
A 980M-parameter decoder-only COBOL specialist, trained 100% from scratch (random init, no warm-start, no external pretrained weights) and instruction-tuned to write, explain, modify and translate COBOL. Local, sovereign, and small enough to run on a modest GPU: built for legacy banking, insurance, and public-administration (PA) settings where code cannot leave the building.
Watch it grow: https://skyl4r.ai
Honest scope
- COBOL-only by design. It deliberately refuses other languages (in Italian: "Python non rientra nelle mie competenze: sono specializzato esclusivamente in COBOL", i.e. "Python is outside my expertise: I specialize exclusively in COBOL"). That focus is a feature, not a gap.
- Small & token-budget-limited (980M params, 20.37B pretrain tokens): strong on COBOL syntax, generation and multi-turn edits, weak on multi-step arithmetic and abstract reasoning (a 0.9B capacity ceiling; see results). Expect a competent, honest assistant, not a reasoning engine.
- A research artifact of the Skylar from-scratch stack, not a finished product.
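As a rough illustration of what "token-budget-limited" means here, the sketch below derives two ratios from figures stated in this card (980M parameters, 20.37B pretraining tokens, sequence length 8192). The variable names are illustrative and are not taken from the Skylar codebase.

```python
# Figures quoted in this model card.
PARAMS = 980e6            # model parameters
PRETRAIN_TOKENS = 20.37e9 # pretraining tokens
SEQ_LEN = 8192            # pretraining sequence length

# How many training tokens each parameter "saw" on average.
tokens_per_param = PRETRAIN_TOKENS / PARAMS

# How many full-length sequences the token budget corresponds to.
num_sequences = PRETRAIN_TOKENS / SEQ_LEN

print(f"tokens per parameter: {tokens_per_param:.1f}")
print(f"full-length sequences: {num_sequences / 1e6:.2f}M")
```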
What it does well (real, measured)
- Writes correct, compilable COBOL from a request, and modifies it across turns (e.g. "add an OVERFLOW check if the sum exceeds 9999" → correct IF … DISPLAY 'OVERFLOW' ELSE …).
- Explains COBOL constructs, knows its scope, and declines out-of-domain questions honestly instead of hallucinating.
- On the executable COBOL benchmark it beats every deployable 7B generalist (below).
Architecture
| Hyperparameter | Value |
|---|---|
| Parameters | ~980M |
| Layers | 36 |
| Model dim (d_model) | 1536 |
| Attention heads | 12 (GQA, 4 KV heads) |
| Head dim | 128 (decoupled) |
| FFN | SwiGLU, d_ff 4096 |
| Normalization | RMSNorm (fp32) + QK-Norm |
| Positional | RoPE (θ=1e6) |
| Context window | 8192 (trained) · 16384 max (config) |
| Vocabulary | 48128 (code-aware BPE, digit-split) |
| Embeddings | tied (input/output) |
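The table can be sanity-checked with a short calculation. The sketch below is an estimate under stated assumptions, not the Skylar implementation: it counts only the weight matrices (no biases, normalization parameters ignored), counts the tied embedding once, and assumes a 16-bit KV cache. All function and constant names are illustrative.

```python
# Hyperparameters copied from the architecture table.
N_LAYERS = 36
D_MODEL = 1536
N_HEADS = 12
N_KV_HEADS = 4
HEAD_DIM = 128
D_FF = 4096
VOCAB = 48128


def attention_params() -> int:
    """Q and O use all heads; K and V use only the KV heads (GQA)."""
    q = D_MODEL * N_HEADS * HEAD_DIM
    k = D_MODEL * N_KV_HEADS * HEAD_DIM
    v = D_MODEL * N_KV_HEADS * HEAD_DIM
    o = N_HEADS * HEAD_DIM * D_MODEL
    return q + k + v + o


def ffn_params() -> int:
    """SwiGLU has three projections: gate, up, and down."""
    return 3 * D_MODEL * D_FF


def total_params() -> int:
    """Weight matrices only; the tied embedding is counted once."""
    per_layer = attention_params() + ffn_params()
    return N_LAYERS * per_layer + VOCAB * D_MODEL


def kv_cache_bytes(seq_len: int, bytes_per_value: int = 2) -> int:
    """K and V caches for one sequence (assumption: 2 bytes per value)."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * seq_len * bytes_per_value


print(f"parameters: {total_params() / 1e6:.1f}M")
print(f"KV cache at 8192 tokens: {kv_cache_bytes(8192) / 2**20:.0f} MiB")
```

Under these assumptions the weight count comes out at roughly 980M, in line with the table, and the 4 KV heads keep the cache at one third of what 12 full KV heads would need.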
Training
- From scratch, random init: no warm-start, no third-party pretrained weights.
- Pretrain: 20.37B tokens, sequence length 8192.
- SFT: ChatML, assistant-only loss, on a curated COBOL instruction mix.
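"Assistant-only loss" is commonly implemented by masking every non-assistant token in the label sequence so that it contributes nothing to the cross-entropy. The sketch below shows that idea on a toy conversation. It is an assumption about the general technique, not a description of the Skylar SFT pipeline, and `build_labels` plus the token ids are made up for illustration.

```python
# -100 is the default ignore_index of torch.nn.CrossEntropyLoss.
IGNORE_INDEX = -100


def build_labels(segments):
    """Build (input_ids, labels) from (role, token_ids) pairs.

    Labels equal the input ids on assistant tokens and IGNORE_INDEX
    everywhere else, so only assistant tokens are trained on.
    """
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)
        else:
            labels.extend([IGNORE_INDEX] * len(ids))
    return input_ids, labels


# Toy conversation with made-up token ids.
example = [("system", [1, 2]), ("user", [3, 4, 5]), ("assistant", [6, 7])]
input_ids, labels = build_labels(example)
print(input_ids)
print(labels)
```

In a real trainer the labels are additionally shifted by one position relative to the logits; many frameworks do that inside the model's loss computation.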
Results: head-to-head on COBOLEval
All models were evaluated on the same harness: greedy decoding, seed 0, GnuCOBOL compile and execute, and official {NAME}.TXT scoring.
Ordered by pass@1. Skylar-980M-Cobol posts the highest compile-success rate of any deployable model here (80.1%, above even the 14B COBOL-Coder at 73.95%) and leads the 7B general-purpose code models on pass@1. The specialized COBOL-Coder (a 14B Qwen2.5-Coder fine-tune, not from-scratch) leads pass@1 among deployable models, at ~14× the parameters and warm-started from a pretrained code model.
| Model | Params | CSR (compile rate) | pass@1 |
|---|---|---|---|
| Claude Opus 4.8 (reference ceiling, not deployable) | n/a | 96.6% | 81.5% |
| COBOL-Coder-14B (Qwen2.5-Coder fine-tune, not from-scratch) | 14B | 73.95% | 49.33% |
| Skylar-980M-Cobol (this model) | 980M | 80.1% | 7.5% |
| Qwen2.5-Coder-7B-Instruct | 7B | 6.2% | 2.1% |
| CodeLlama-7B-Instruct | 7B | 6.8% | 0.7% |
| StarCoder2-7B | 7B | 48.6% | 0.0% |
At ~7× fewer parameters than the 7B baselines (and ~14× fewer than COBOL-Coder), fully local and from-scratch, it posts the highest compile rate of any deployable model and outperforms the mainstream 7B code models on pass@1.
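For readers unfamiliar with the two metrics in the table: with greedy decoding there is one completion per problem, so pass@1 is simply the fraction of problems whose single completion compiles and produces the expected output, and CSR is the fraction that compiles at all. The sketch below computes both from per-problem records. The `ProblemResult` layout and the problem names are hypothetical and do not reflect the actual harness.

```python
from dataclasses import dataclass


@dataclass
class ProblemResult:
    name: str       # problem identifier (hypothetical)
    compiled: bool  # did the generated program compile?
    passed: bool    # did its output match the expected output?


def summarize(results):
    """Return (compile success rate, pass@1) for one completion per problem."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    csr = sum(r.compiled for r in results) / n
    # A program that does not compile can never pass.
    pass_at_1 = sum(r.compiled and r.passed for r in results) / n
    return csr, pass_at_1


# Toy data: three of four programs compile, one produces the right output.
toy = [
    ProblemResult("PROBLEM-A", compiled=True, passed=True),
    ProblemResult("PROBLEM-B", compiled=True, passed=False),
    ProblemResult("PROBLEM-C", compiled=True, passed=False),
    ProblemResult("PROBLEM-D", compiled=False, passed=False),
]
csr, pass_at_1 = summarize(toy)
print(f"CSR: {csr:.1%}, pass@1: {pass_at_1:.1%}")
```

The gap between the two numbers is the share of programs that compile but compute the wrong result, which is why a high CSR can coexist with a modest pass@1.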
Conversational usability
Hand-evaluated on real multi-turn dev tasks (identity, generation, multi-turn modification, code comprehension, honest refusal): usable, concise, in-character, writes correct COBOL, keeps context across turns, and stays honestly in-scope.
Usage
Use the skylar library: no custom modeling code lives in this repo; the architecture is provided by the package.
pip install skylar
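import skylar

# Load the model by its repo id.
m = skylar.load("Sophia-AI/Skylar-980M-Cobol")

# The prompts are in Italian. User prompt: "Write me a COBOL program that adds
# two PIC 9(4) numbers and prints the result."
# System prompt: "You are an expert COBOL programmer."
print(m.generate(
    "Scrivimi un programma COBOL che somma due numeri PIC 9(4) e stampa il risultato.",
    system="Sei un esperto programmatore COBOL.",
    max_new_tokens=400))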
Alternative: plain 🤗 transformers
import skylar registers the nano-transformer architecture, so AutoModelForCausalLM works with
no trust_remote_code and no modeling files in the repo:
import skylar # registers the architecture
from transformers import AutoModelForCausalLM
from tokenizers import Tokenizer
from huggingface_hub import hf_hub_download
import torch
repo = "Sophia-AI/Skylar-980M-Cobol"
model = AutoModelForCausalLM.from_pretrained(repo).eval()
tok = Tokenizer.from_file(hf_hub_download(repo, "tokenizer.json"))
# (wrap the turn in ChatML <|im_start|>…<|im_end|> as the skylar lib does)
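The ChatML wrapping mentioned in the comment above can be sketched as follows. This uses the widely used ChatML layout (role name after `<|im_start|>`, turn closed by `<|im_end|>`, prompt left open on the assistant turn). The exact template inside the skylar library is not shown in this card, so treat `build_chatml` as a hypothetical helper and compare it against the library before relying on it.

```python
from typing import Optional


def build_chatml(user: str, system: Optional[str] = None) -> str:
    """Wrap a single user turn (and optional system turn) in ChatML."""
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{user}<|im_end|>\n")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml(
    "Write a COBOL program that adds two PIC 9(4) numbers.",
    system="You are an expert COBOL programmer.",
)
print(prompt)
```

The resulting string can then be tokenized with `tok.encode(prompt).ids` and passed to `model.generate` as a `torch.tensor([ids])`. Whether the tokenizer adds special tokens on its own is not stated here, so check that first.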
Limitations
- Multi-step arithmetic / abstract reasoning is unreliable (0.9B ceiling): verify generated logic.
- COBOL-only: it will refuse or re-frame non-COBOL requests.
- pass@1 is modest in absolute terms (COBOL is a hard, low-resource domain; even GPT-4o is at 16.4%). The value is outperforming the deployable 7B generalist models at a fraction of their size, fully local.
License & attribution
© A. Ivanovitch, Apache-2.0. Code & framework: github.com/2sophia/skylar
