Skylar-980M-Cobol

🔬 Published as a transparent research artifact for the Skylar framework, not as a general-purpose or SOTA model.

A 980M decoder-only COBOL specialist, trained 100% from scratch (random init, no warm-start, no external pretrained weights) and instruction-tuned to write, explain, modify and translate COBOL. Local, sovereign, and small enough to run on a modest GPU. Built for legacy banking, insurance, and public-administration (PA) settings where code cannot leave the building.

🔗 Watch it grow: https://skyl4r.ai

A from-scratch Skylar model generates COBOL; GnuCOBOL compiles and runs it

βš–οΈ Honest scope

  • COBOL-only by design. It deliberately refuses other languages; asked about Python, it replies in Italian: "Python non rientra nelle mie competenze: sono specializzato esclusivamente in COBOL" ("Python is outside my expertise: I specialize exclusively in COBOL"). That focus is a feature, not a gap.
  • Small & token-budget-limited (980M params, 20.37B pretrain tokens) → strong on COBOL syntax, generation and multi-turn edits, weak on multi-step arithmetic/abstract reasoning (a 0.9B capacity ceiling; see results). Expect a competent, honest assistant, not a reasoning engine.
  • A research artifact of the Skylar from-scratch stack, not a finished product.

What it does well (real, measured)

  • Writes correct, compilable COBOL from a request, and modifies it across turns (e.g. "add an OVERFLOW check if the sum exceeds 9999" → correct IF … DISPLAY 'OVERFLOW' ELSE …).
  • Explains COBOL constructs, knows its scope, and declines out-of-domain questions honestly instead of hallucinating.
  • On the executable COBOL benchmark it beats every deployable 7B generalist (below).

Architecture

| Component | Value |
|---|---|
| Parameters | ~980M |
| Layers | 36 |
| Model dim (d_model) | 1536 |
| Attention heads | 12 (GQA, 4 KV heads) |
| Head dim | 128 (decoupled) |
| FFN | SwiGLU, d_ff 4096 |
| Normalization | RMSNorm (fp32) + QK-Norm |
| Positional | RoPE (θ=1e6) |
| Context window | 8192 (trained) · 16384 max (config) |
| Vocabulary | 48128 (code-aware BPE, digit-split) |
| Embeddings | tied (input/output) |
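
The figures above are enough for a back-of-the-envelope check of the ~980M parameter count. The sketch below is an estimate, not the framework's own accounting: it assumes bias-free linear layers, ignores the small RMSNorm / QK-Norm weights, and counts the tied embedding matrix once. The function and variable names are illustrative.

```python
# Rough parameter count from the architecture table.
# Assumptions (not stated in the card): no bias terms, norm weights ignored.
D_MODEL = 1536
N_LAYERS = 36
N_HEADS = 12
N_KV_HEADS = 4      # GQA: keys/values use fewer heads than queries
HEAD_DIM = 128
D_FF = 4096
VOCAB = 48128


def count_params() -> dict:
    q_proj = D_MODEL * N_HEADS * HEAD_DIM           # query projection
    kv_proj = 2 * D_MODEL * N_KV_HEADS * HEAD_DIM   # key + value projections
    o_proj = N_HEADS * HEAD_DIM * D_MODEL           # output projection
    attn = q_proj + kv_proj + o_proj
    ffn = 3 * D_MODEL * D_FF                        # SwiGLU: gate, up, down
    embed = VOCAB * D_MODEL                         # tied input/output: counted once
    per_layer = attn + ffn
    return {
        "attn": attn,
        "ffn": ffn,
        "per_layer": per_layer,
        "embed": embed,
        "total": N_LAYERS * per_layer + embed,
    }


counts = count_params()
print(f"{counts['total'] / 1e6:.1f}M parameters")  # 979.9M parameters
```

Under these assumptions the estimate lands within rounding of the stated ~980M, with the 36 transformer blocks holding roughly 906M of the weights and the tied embedding about 74M.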

Training

  • From scratch, random init: no warm-start, no third-party pretrained weights.
  • Pretrain: 20.37B tokens, sequence length 8192.
  • SFT: ChatML, assistant-only loss, on a curated COBOL instruction mix.

Results: head-to-head on COBOLEval

All models evaluated on the same harness, greedy decoding, seed 0, GnuCOBOL compile+execute, official {NAME}.TXT scoring.

Ordered by pass@1. Skylar-980M-Cobol posts the highest compile-success rate of any deployable model here (80.1%, above even the 14B COBOL-Coder at 73.95%); only the non-deployable Claude Opus 4.8 reference compiles more often (96.6%). On pass@1 it leads all three general-purpose 7B code models (7.5% vs. 2.1% at best). The specialized COBOL-Coder (a 14B Qwen2.5-Coder fine-tune, not from-scratch) leads pass@1 among deployable models, at ~14× the parameters and warm-started from a pretrained code model.

| Model | Params | CSR (compile rate) | pass@1 |
|---|---|---|---|
| Claude Opus 4.8 (reference ceiling, not deployable) | n/a | 96.6% | 81.5% |
| COBOL-Coder-14B (Qwen2.5-Coder fine-tune, not from-scratch) | 14B | 73.95% | 49.33% |
| Skylar-980M-Cobol (this model) | 980M | 80.1% | 7.5% |
| Qwen2.5-Coder-7B-Instruct | 7B | 6.2% | 2.1% |
| CodeLlama-7B-Instruct | 7B | 6.8% | 0.7% |
| StarCoder2-7B | 7B | 48.6% | 0.0% |

COBOLEval: Skylar-980M-Cobol vs the deployable 7B code models, same harness

→ At ~7× fewer parameters than the 7B baselines (and ~14× fewer than COBOL-Coder), fully local and from-scratch, it posts the highest compile rate of any deployable model and outperforms the mainstream 7B code models on pass@1.
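
To make the two reported columns concrete: with greedy decoding there is one completion per problem, so pass@1 reduces to the share of problems whose program compiles and produces the expected output, and CSR to the share that compiles at all. The sketch below illustrates that aggregation on made-up data. `ProblemResult` and `summarize` are hypothetical names, and the harness's actual bookkeeping may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProblemResult:
    name: str        # benchmark problem identifier (illustrative)
    compiled: bool   # the compiler accepted the generated program
    passed: bool     # program output matched the expected output


def summarize(results: List[ProblemResult]) -> Dict[str, float]:
    """Return compile-success rate and pass@1 as percentages."""
    if not results:
        raise ValueError("no results to summarize")
    n = len(results)
    compiled = sum(r.compiled for r in results)
    # A program that does not compile cannot pass, whatever `passed` says.
    passed = sum(r.compiled and r.passed for r in results)
    return {"csr": 100 * compiled / n, "pass@1": 100 * passed / n}


# Toy data, not taken from the benchmark.
demo = [
    ProblemResult("P1", compiled=True, passed=True),
    ProblemResult("P2", compiled=True, passed=False),
    ProblemResult("P3", compiled=False, passed=False),
    ProblemResult("P4", compiled=True, passed=False),
]
scores = summarize(demo)
print(scores)  # {'csr': 75.0, 'pass@1': 25.0}
```

The toy case shows how a model can compile most of its programs while passing few of them, which is the pattern in the Skylar-980M-Cobol row.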

Conversational usability

Hand-evaluated on real multi-turn dev tasks (identity, generation, multi-turn modification, code comprehension, honest refusal): usable. It is concise, in-character, writes correct COBOL, keeps context across turns, and stays honestly in-scope.

Usage

Use the skylar library. No custom modeling code lives in this repo; the architecture is provided by the package.

pip install skylar

import skylar

m = skylar.load("Sophia-AI/Skylar-980M-Cobol")

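# Prompt (Italian): "Write me a COBOL program that adds two PIC 9(4) numbers and prints the result."
# System (Italian): "You are an expert COBOL programmer."
print(m.generate(
    "Scrivimi un programma COBOL che somma due numeri PIC 9(4) e stampa il risultato.",
    system="Sei un esperto programmatore COBOL.",
    max_new_tokens=400))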

Alternative: plain 🤗 transformers

Importing skylar registers the nano-transformer architecture, so AutoModelForCausalLM works without trust_remote_code and with no modeling files in the repo:

import skylar                          # registers the architecture
from transformers import AutoModelForCausalLM
from tokenizers import Tokenizer
from huggingface_hub import hf_hub_download
import torch

repo = "Sophia-AI/Skylar-980M-Cobol"
model = AutoModelForCausalLM.from_pretrained(repo).eval()
tok = Tokenizer.from_file(hf_hub_download(repo, "tokenizer.json"))
# (wrap the turn in ChatML <|im_start|>…<|im_end|> as the skylar lib does)
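
The last comment leaves the ChatML wrapping to the reader. A minimal sketch of the conventional ChatML layout follows; it assumes the standard `system` / `user` / `assistant` role names and newline placement, which may not match the skylar library's template byte for byte. `build_chatml_prompt` is a hypothetical helper, not part of the package.

```python
from typing import Optional


def build_chatml_prompt(user: str, system: Optional[str] = None) -> str:
    """Wrap one user turn in ChatML and open the assistant turn.

    Assumption: conventional ChatML role names and newlines; check the
    skylar library for the exact template the model was tuned on.
    """
    parts = []
    if system:
        parts.append(f"<|im_start|>system\n{system}<|im_end|>\n")
    parts.append(f"<|im_start|>user\n{user}<|im_end|>\n")
    # Leave the assistant turn open so the model continues from here.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt(
    "Scrivimi un programma COBOL che somma due numeri PIC 9(4).",
    system="Sei un esperto programmatore COBOL.",
)
print(prompt)
```

The resulting string would then be encoded with the tokenizer and passed to the model, with generation stopped at the next `<|im_end|>`.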

Limitations

  • Multi-step arithmetic / abstract reasoning is unreliable (0.9B ceiling); verify generated logic.
  • COBOL-only: it will refuse or re-frame non-COBOL requests.
  • pass@1 is modest in absolute terms (COBOL is a hard, low-resource domain; even GPT-4o is at 16.4%). The value is leading the deployable 7B generalists at a fraction of their size, fully local.
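
Since generated logic should be verified, a cheap first gate is to check that the output compiles at all. The sketch below shells out to GnuCOBOL's `cobc` compiler, using `-x` to build an executable and, optionally, `-free` for free-format source. The helper names are hypothetical, and the function returns None when `cobc` is not installed rather than guessing. A successful compile says nothing about correctness, so the program's output still needs checking against known cases.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import List, Optional


def cobc_command(src: str, out: str, free_format: bool = False) -> List[str]:
    """Build the cobc invocation that compiles `src` into executable `out`."""
    cmd = ["cobc", "-x"]            # -x: build an executable program
    if free_format:
        cmd.append("-free")         # -free: source is free-format
    cmd += ["-o", out, src]
    return cmd


def compiles(source: str, free_format: bool = False,
             timeout: float = 30.0) -> Optional[bool]:
    """Return True/False for compile success, or None if cobc is missing."""
    if shutil.which("cobc") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "prog.cob"
        src.write_text(source)
        proc = subprocess.run(
            cobc_command(str(src), str(Path(tmp) / "prog"), free_format),
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode == 0
```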

License & attribution

© A. Ivanovitch, Apache-2.0. Code & framework: github.com/2sophia/skylar
