Skylar-236M-Base

Skylar is a from-scratch decoder-only LLM (Qwen3-style: RMSNorm · RoPE · GQA · QK-Norm · SwiGLU · µP), pretrained on a single RTX 4090 with 1.12B tokens of Italian legal and regulatory text (EUR-Lex, Banca d'Italia, Gazzetta Ufficiale, Normattiva, TED). Code and full results: https://github.com/2sophia/skylar.

This is the base language model (next-token pretraining only — no instruction tuning).

Honest scope (read before using). This is a 236M-parameter domain model, intentionally small and undertrained by the Chinchilla heuristic: ~1.12B training tokens is roughly 4-5× fewer than the ~20 tokens per parameter that heuristic suggests as compute-optimal for this size. It is an Italian specialist (its English is not fluent) and it is not a factual oracle: it hallucinates open-domain facts. Its real, measured strength is grounded Italian tasks, where it answers, extracts, or classifies from context supplied in the prompt. It is published as a transparent reference for the Skylar framework, not as a general-purpose or SOTA model.
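The undertraining figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the commonly quoted Chinchilla rule of thumb of about 20 training tokens per parameter; that constant is an assumption brought in for illustration, not a number taken from this card.

```python
# Rough Chinchilla-style check. TOKENS_PER_PARAM_RULE is an assumed
# rule-of-thumb constant (about 20 tokens per parameter), not a value
# stated in this model card.
N_PARAMS = 236e6            # from the card: ~236M parameters
TRAIN_TOKENS = 1.12e9       # from the card: ~1.12B training tokens
TOKENS_PER_PARAM_RULE = 20  # assumption

optimal_tokens = N_PARAMS * TOKENS_PER_PARAM_RULE
actual_tokens_per_param = TRAIN_TOKENS / N_PARAMS
shortfall = optimal_tokens / TRAIN_TOKENS

print(f"compute-optimal tokens (rule of thumb): {optimal_tokens / 1e9:.2f}B")
print(f"actual tokens per parameter: {actual_tokens_per_param:.2f}")
print(f"shortfall factor: {shortfall:.1f}x")
```

Under that assumption the model saw a bit under 5 tokens per parameter, which is where the "several times below compute-optimal" statement comes from.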

Architecture

| Field | Value |
|---|---|
| Params | ~236M |
| Layers | 18 |
| d_model | 1024 |
| Heads (Q/KV) | 16 / 4 (GQA) |
| d_ff | 2816 (SwiGLU) |
| Context | 2048 tokens |
| Vocab | 32,768 (ByteLevel BPE) |
| Pos. enc. / norm | RoPE (θ=1e6) · QK-Norm · RMSNorm |
| License | Apache-2.0 |
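The ~236M figure can be roughly reproduced from the table. The sketch below is an estimate, not the framework's own counting code, and it relies on assumptions the card does not state: head_dim = d_model / n_q_heads = 64, no bias terms, per-head QK-Norm weights, and input/output embeddings that share one matrix. The function name `estimate_params` is hypothetical.

```python
def estimate_params(n_layers=18, d_model=1024, n_q_heads=16, n_kv_heads=4,
                    d_ff=2816, vocab=32768, tied_embeddings=True):
    """Estimate the parameter count of a GQA + SwiGLU decoder.

    Assumptions (not confirmed by the model card): no biases,
    head_dim = d_model // n_q_heads, QK-Norm has one weight vector of size
    head_dim for Q and one for K per layer, two RMSNorms per layer plus a
    final RMSNorm.
    """
    head_dim = d_model // n_q_heads
    kv_dim = n_kv_heads * head_dim

    # Attention: Q and output projections are d_model x d_model;
    # K and V are narrower because GQA shares KV heads across Q heads.
    attn = 2 * d_model * d_model + 2 * d_model * kv_dim
    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * d_model * d_ff
    # Two RMSNorm weights plus QK-Norm weights.
    norms = 2 * d_model + 2 * head_dim

    per_layer = attn + mlp + norms
    embeddings = vocab * d_model * (1 if tied_embeddings else 2)
    return n_layers * per_layer + embeddings + d_model  # + final RMSNorm


tied = estimate_params(tied_embeddings=True)
untied = estimate_params(tied_embeddings=False)
print(f"tied embeddings:   {tied / 1e6:.1f}M")
print(f"untied embeddings: {untied / 1e6:.1f}M")
```

With tied embeddings the estimate lands at about 236.5M, consistent with the table; with untied embeddings it would be about 270M, which suggests (but does not prove) that the embedding matrix is shared.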

Results (measured, reproducible)

Public Italian benchmarks (likelihood-based multiple-choice, best of acc / acc_norm):

| Task | Skylar-236M | Random baseline |
|---|---|---|
| XCOPA-it (causal commonsense) | 0.562 | 0.50 |
| HellaSwag-it | 0.292 | 0.25 |
| Belebele-it | 0.267 | 0.25 |
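For readers unfamiliar with likelihood-based multiple-choice scoring, here is a minimal sketch of how `acc` and `acc_norm` typically differ. It assumes that `acc` picks the choice with the highest total log-likelihood and `acc_norm` picks the highest log-likelihood divided by the choice's character length; the card does not say which harness or normalization was used, and the log-likelihood values below are made up for illustration.

```python
def pick_answer(loglikelihoods, choices, normalize=False):
    """Return the index of the predicted choice.

    loglikelihoods[i] is the summed log-probability the model assigns to
    choices[i] given the question. With normalize=True each score is
    divided by the choice's character length (an assumed convention),
    which removes the built-in penalty on longer answers.
    """
    scores = [
        ll / len(choice) if normalize else ll
        for ll, choice in zip(loglikelihoods, choices)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Illustrative, invented numbers: the long answer has a lower total
# log-likelihood simply because it contains more tokens.
choices = ["sì", "perché la strada era bagnata"]
loglikelihoods = [-4.0, -12.0]

print(pick_answer(loglikelihoods, choices))                  # acc-style pick
print(pick_answer(loglikelihoods, choices, normalize=True))  # acc_norm-style pick
```

The two rules can disagree, as they do here, which is why reporting the better of the two gives a slightly optimistic number.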

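Validation perplexity of the base model: 15.4. There is real signal on Italian causal commonsense (XCOPA-it, 0.562 vs. 0.50 chance), while HellaSwag-it and Belebele-it sit only a few points above chance. Near-chance scores on knowledge-heavy tasks are expected for a 236M model trained on a narrow legal corpus.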
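To put the perplexity figure in context, it can be converted to an average per-token loss. The sketch assumes the usual definition, perplexity = exp(mean cross-entropy in nats per token), and compares it with the loss of a uniform guess over the 32,768-token vocabulary.

```python
import math

VAL_PERPLEXITY = 15.4  # from the card
VOCAB_SIZE = 32768     # from the architecture table

# Perplexity is the exponential of the mean cross-entropy, so the log
# recovers the average loss per token.
nats_per_token = math.log(VAL_PERPLEXITY)
bits_per_token = math.log2(VAL_PERPLEXITY)

# Loss of a model that guesses uniformly over the vocabulary.
uniform_nats = math.log(VOCAB_SIZE)

print(f"validation loss: {nats_per_token:.3f} nats/token "
      f"({bits_per_token:.3f} bits/token)")
print(f"uniform-guess loss: {uniform_nats:.3f} nats/token")
```

Perplexity depends on the tokenizer and the validation set, so this number is only comparable with models evaluated on the same data with the same vocabulary.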

Usage

# pip install git+https://github.com/2sophia/skylar.git
import torch
from huggingface_hub import hf_hub_download
from tokenizers import Tokenizer
from models.decoder import NanoTransformer

# Load the weights and the tokenizer from the Hub.
model = NanoTransformer.from_pretrained("Sophia-AI/Skylar-236M-Base").eval()
tok = Tokenizer.from_file(hf_hub_download("Sophia-AI/Skylar-236M-Base", "tokenizer.json"))

# Prepend <bos> by hand; the prompt itself is encoded without special tokens.
prompt_ids = tok.encode("Il regolamento DORA", add_special_tokens=False).ids
ids = torch.tensor([[tok.token_to_id("<bos>")] + prompt_ids])

# Sample up to 40 new tokens; gradients are not needed for inference.
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=40, temperature=0.7, top_p=0.9, eos_token_id=[tok.token_to_id("<eos>")])
print(tok.decode(out[0].tolist()))
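The `temperature` and `top_p` arguments above control how the next token is sampled. The following is a minimal, dependency-free sketch of temperature scaling followed by nucleus (top-p) sampling over one logit vector. It illustrates the general technique and is not the `generate` implementation from the Skylar repository; `top_p_filter` and `sample` are hypothetical names.

```python
import math
import random


def top_p_filter(logits, temperature=0.7, top_p=0.9):
    """Return {token_index: probability}, renormalized over the nucleus."""
    # Temperature < 1 sharpens the distribution, > 1 flattens it.
    scaled = [logit / temperature for logit in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample(logits, temperature=0.7, top_p=0.9, rng=random):
    """Draw one token index from the filtered distribution."""
    nucleus = top_p_filter(logits, temperature, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy logits for a 4-token vocabulary.
nucleus = top_p_filter([2.0, 1.0, 0.0, -5.0])
print(sorted(nucleus))
print(sample([2.0, 1.0, 0.0, -5.0], rng=random.Random(0)))
```

Lower `temperature` or `top_p` makes the output more deterministic, which tends to suit extraction-style prompts; higher values give more varied continuations.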

For chat, use Skylar-236M-Chat; for retrieval, use Skylar-236M-Embed.

© A. Ivanovitch — Apache-2.0.
