ZygAI-OSS-138M 🇱🇹

A 138M-parameter Lithuanian language model, built from scratch and trained to answer questions.

ZygAI-OSS-138M is a 138.6-million-parameter Lithuanian Large Language Model built entirely from scratch on a custom Transformer architecture. It has undergone Supervised Fine-Tuning (SFT) to act as a conversational assistant that aims to answer questions truthfully in Lithuanian.

Note: This repository includes the SFT (Supervised Fine-Tuned) version of the model, which understands the `Question: [prompt]\nAnswer:` format and uses a custom `<EOS>` token to stop generating cleanly once the answer is complete.
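
A minimal sketch of how the prompt format and stop token described above could be handled in client code. The helper names `build_prompt` and `extract_answer` are hypothetical, and the sketch assumes that `\n` in the template is a literal newline and that `<EOS>` appears as plain text in the decoded output; neither detail is confirmed by this card.

```python
EOS_TOKEN = "<EOS>"  # stop marker named in the model card


def build_prompt(question: str) -> str:
    """Wrap a user question in the Question/Answer template (hypothetical helper)."""
    return f"Question: {question.strip()}\nAnswer:"


def extract_answer(generated: str) -> str:
    """Keep only the text before the first <EOS> marker (hypothetical helper)."""
    return generated.split(EOS_TOKEN, 1)[0].strip()


prompt = build_prompt("Kas yra Lietuvos sostinė?")

# Made-up continuation, used only to illustrate the truncation step.
raw_continuation = " Vilnius.<EOS>Question: ..."
answer = extract_answer(raw_continuation)
```

Cutting at the first `<EOS>` on the client side is a fallback in case the generation loop does not stop on the token by itself.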

🏗️ Architecture

A decoder-only Transformer, comparable in scale to GPT-2 Small (124M parameters), with the same model width and head count but 16 layers instead of 12.

Parameter	Value
Total Parameters	138.6M
Layers	16
Attention Heads	12
Model Dimensions (`d_model`)	768
Context Length	1024 tokens
Vocabulary Size	16,000 (Custom BPE Tokenizer)
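
As a rough sanity check, the figures in the table can be turned into a parameter estimate. The sketch below assumes details this card does not state: a GPT-2-style block with a 4x MLP expansion, no bias terms, one gain vector per normalization layer, learned positional embeddings, and an output head that is not tied to the token embedding. The function `estimate_params` is hypothetical and is not taken from the model's code.

```python
def estimate_params(n_layers=16, d_model=768, vocab=16_000, ctx=1024,
                    mlp_ratio=4, tied_embeddings=False, learned_positions=True):
    """Estimate the parameter count of a bias-free decoder-only Transformer."""
    attn = 4 * d_model * d_model             # Q, K, V and output projections
    mlp = 2 * mlp_ratio * d_model * d_model  # up- and down-projection
    norms = 2 * d_model                      # two norm gain vectors per block
    blocks = n_layers * (attn + mlp + norms)

    tok_emb = vocab * d_model
    pos_emb = ctx * d_model if learned_positions else 0
    head = 0 if tied_embeddings else vocab * d_model
    final_norm = d_model
    return blocks + tok_emb + pos_emb + head + final_norm


total = estimate_params()
tied_total = estimate_params(tied_embeddings=True)
print(f"untied head: {total / 1e6:.1f}M, tied head: {tied_total / 1e6:.1f}M")
```

Under these assumptions the estimate is about 138.6M with an untied head and about 126.3M with a tied one. The agreement with the reported total suggests an untied output projection but does not confirm it.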

⚡ Training

Trained on a single NVIDIA RTX A5000 (24GB VRAM) GPU on RunPod with the following PyTorch optimizations:

BFloat16 + TF32: mixed precision for speed and stability
FlashAttention: via F.scaled_dot_product_attention
torch.compile: kernel fusion and reduced Python overhead
Gradient Checkpointing: recomputes activations during the backward pass instead of storing them, trading extra compute for the lower memory use that lets a 138M model train within 24 GB of VRAM
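
The sketch below shows one way the four optimizations listed above can be combined in PyTorch. It is not the project's training code: `Block` and `TinyDecoder` are hypothetical stand-ins with only two layers, the loss is a dummy value, and the `torch.compile` call is left commented out because it needs a working compiler toolchain.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

# Allow TF32 matmuls and convolutions on GPUs that support it; no effect on CPU.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True


class Block(nn.Module):
    """Hypothetical pre-norm decoder block, for illustration only."""

    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.n_heads = n_heads
        self.ln1 = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model, bias=False)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model, bias=False),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model, bias=False),
        )

    def forward(self, x):
        b, t, c = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        # Reshape to (batch, heads, time, head_dim) for the fused attention call.
        q, k, v = (z.view(b, t, self.n_heads, c // self.n_heads).transpose(1, 2)
                   for z in (q, k, v))
        # Dispatches to a FlashAttention kernel when the backend provides one.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.proj(y.transpose(1, 2).reshape(b, t, c))
        return x + self.mlp(self.ln2(x))


class TinyDecoder(nn.Module):
    """Hypothetical two-layer stack; the real model has 16 layers."""

    def __init__(self, n_layers=2, d_model=768, n_heads=12):
        super().__init__()
        self.blocks = nn.ModuleList(Block(d_model, n_heads) for _ in range(n_layers))

    def forward(self, x):
        for block in self.blocks:
            if self.training:
                # Gradient checkpointing: recompute this block in the backward pass.
                x = checkpoint(block, x, use_reentrant=False)
            else:
                x = block(x)
        return x


device = "cuda" if torch.cuda.is_available() else "cpu"
model = TinyDecoder().to(device).train()
# model = torch.compile(model)  # optional; requires a compiler toolchain

x = torch.randn(2, 16, 768, device=device)
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    out = model(x)
    loss = out.float().pow(2).mean()  # dummy loss, only to drive backward()
loss.backward()
```

Checkpointing each block separately keeps only the block inputs alive during the forward pass, which is where most of the memory saving comes from.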

Detail	Value
Dataset	`lt_corpus.txt` (~94 MB — Lithuanian Wikipedia + other texts)
Training Duration	~6.5 hours (15,000 optimization steps) + SFT Phase
Best Validation Loss	~3.45
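
Two derived figures can help put the table above in context. The sketch assumes the validation loss is a mean per-token cross-entropy in nats (the PyTorch default), which this card does not state, and it covers only the 15,000-step phase, not the SFT phase.

```python
import math

val_loss = 3.45          # best validation loss from the table
train_hours = 6.5        # approximate duration of the 15,000-step phase
steps = 15_000

# Perplexity is exp(loss) when the loss is per-token cross-entropy in nats.
perplexity = math.exp(val_loss)

# Average wall-clock time per optimization step.
seconds_per_step = train_hours * 3600 / steps

print(f"perplexity ~ {perplexity:.1f}, {seconds_per_step:.2f} s/step")
```

With these inputs the perplexity comes out at roughly 31.5 and the average step time at about 1.56 seconds.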

⚠️ Known Limitations

Hallucinations: SFT has reduced the base model's rambling, but at ~138M parameters the model is still small and may hallucinate facts or struggle with complex reasoning.

Recommended generation settings: a temperature between 0.6 and 0.8, with top_k=50 enabled.
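
To illustrate what the recommended settings do, here is a small pure-Python sketch of temperature scaling combined with top-k filtering. The function `sample_top_k` is hypothetical and works on a plain list of logits; the model's own generation code may implement this step differently.

```python
import math
import random


def sample_top_k(logits, temperature=0.7, top_k=50, rng=random):
    """Sample one token id from raw logits using temperature and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep the indices of the top_k largest logits.
    k = min(top_k, len(logits))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature-scaled softmax weights, with the max subtracted for stability.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


rng = random.Random(0)
toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # toy values, not real model output
samples = [sample_top_k(toy_logits, temperature=0.7, top_k=2, rng=rng)
           for _ in range(200)]
```

A lower temperature sharpens the distribution toward the most likely token, while top-k removes the long tail of unlikely tokens before sampling.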

🔮 Roadmap

Add an English dataset → bilingual (LT + EN) model
Instruction Fine-Tuning → conversational assistant capability (SFT Complete!)

🙏 Special Thanks

A huge thank you to everyone who inspired, supported, and made this project possible:

Ruby2001 · 0daysophie · italian_tech_person · Julia's Tech Spot · RunPod

Built in Lithuania 🇱🇹 · ZygMediaGroup
