ZygAI-OSS-138M 🇱🇹

A 138M-parameter Lithuanian language model, built from scratch and trained to answer questions.

ZygAI-OSS-138M is a 138.6-million-parameter Lithuanian language model built entirely from scratch on a custom Transformer architecture. It has undergone supervised fine-tuning (SFT) so that it works as a conversational assistant that answers questions in Lithuanian.

Note: This repository contains the SFT (supervised fine-tuned) version of the model. It expects prompts in the format "Question: [prompt]\nAnswer:" (where \n is a newline) and emits a custom <EOS> token to stop generating once the answer is complete.
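A minimal sketch of the prompt format and the stop-on-<EOS> behaviour described above. It assumes a hypothetical next_token callable (one model step that returns a single token id) and a known eos_id for the <EOS> token; neither name comes from this repository, and the exact whitespace after "Answer:" is also an assumption.

```python
from typing import Callable, List


def build_prompt(question: str) -> str:
    # Assumed SFT layout: "Question: <prompt>", a newline, then "Answer:"
    return f"Question: {question}\nAnswer:"


def generate_until_eos(
    next_token: Callable[[List[int]], int],  # hypothetical: returns the next token id
    prompt_ids: List[int],
    eos_id: int,  # hypothetical id of the custom <EOS> token
    max_new_tokens: int = 256,
) -> List[int]:
    """Append tokens until the model emits <EOS> or the token budget runs out."""
    ids = list(prompt_ids)
    answer: List[int] = []
    for _ in range(max_new_tokens):
        tok = next_token(ids)
        if tok == eos_id:  # stop cleanly; <EOS> itself is not returned
            break
        ids.append(tok)
        answer.append(tok)
    return answer


# Toy stand-in for the model: emits 7, 8, 9, then <EOS> (id 0).
script = iter([7, 8, 9, 0, 5])
print(build_prompt("Kas yra Vilnius?"))
print(generate_until_eos(lambda ids: next(script), [1, 2, 3], eos_id=0))
```

In a real loop the ids passed to the model would also need cropping to the 1,024-token context window.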


๐Ÿ—๏ธ Architecture

A Decoder-only Transformer, comparable in scale to GPT-2 Small.

  • Total parameters: 138.6M
  • Layers: 16
  • Attention heads: 12
  • Model dimension (d_model): 768
  • Context length: 1,024 tokens
  • Vocabulary size: 16,000 (custom BPE tokenizer)
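As a rough consistency check on these figures, the sketch below counts weights for a GPT-2-style pre-norm decoder (fused QKV projection, 4x MLP, learned positional embeddings). The layout, the bias setting and the untied output head are assumptions for illustration, not taken from the repository's code. The number of attention heads does not change the count, because the heads only split d_model.

```python
def count_params(
    n_layer: int = 16,
    d_model: int = 768,
    vocab_size: int = 16_000,
    context_len: int = 1024,
    mlp_ratio: int = 4,
    bias: bool = False,       # assumed: no bias terms in linears or LayerNorms
    tied_head: bool = False,  # assumed: output head not shared with token embedding
) -> int:
    d, h = d_model, mlp_ratio * d_model
    b = 1 if bias else 0
    layer_norm = d + b * d                                 # scale (+ shift if bias)
    attention = (d * 3 * d + b * 3 * d) + (d * d + b * d)  # fused QKV + output projection
    mlp = (d * h + b * h) + (h * d + b * d)                # up- and down-projection
    per_block = 2 * layer_norm + attention + mlp           # two LayerNorms per block
    embeddings = vocab_size * d + context_len * d          # token + learned positional
    head = 0 if tied_head else vocab_size * d              # output projection, no bias
    return n_layer * per_block + embeddings + layer_norm + head  # + final LayerNorm


print(f"{count_params() / 1e6:.1f}M")  # 138.6M
```

Under these assumptions the total lands at about 138.6M; tying the output head to the token embedding would drop it to about 126.3M, so the reported figure is consistent with (but does not prove) an untied head. With GPT-2 Small's settings (12 layers, 50,257-token vocabulary, biases, tied head) the same function gives the familiar 124.4M.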

⚡ Training

Trained on a single NVIDIA RTX A5000 (24GB VRAM) GPU on RunPod with the following PyTorch optimizations:

  • BFloat16 + TF32: mixed precision for speed and numerical stability
  • FlashAttention: via F.scaled_dot_product_attention, which dispatches to a FlashAttention kernel when the GPU and inputs support it
  • torch.compile: graph compilation and kernel fusion
  • Gradient checkpointing: recomputes activations during the backward pass instead of storing them, cutting VRAM use enough to train the 138M model on a 24 GB GPU
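
A back-of-the-envelope sketch of why activations, rather than the weights, are what gradient checkpointing has to tame. It assumes AdamW with parameters and gradients kept in fp32 under bf16 autocast, i.e. 16 bytes per parameter; the optimizer is not stated on this card, so treat the numbers as an illustration.

```python
def static_training_memory_gb(n_params: int, bytes_per_param: int = 16) -> float:
    # Assumed: fp32 weights (4 B) + fp32 gradients (4 B) + AdamW moments m and v (4 B + 4 B)
    return n_params * bytes_per_param / 1e9


static = static_training_memory_gb(138_600_000)
print(f"weights + grads + optimizer: {static:.1f} GB of 24 GB")
```

That leaves roughly 21.8 GB for activations and CUDA workspace. Activation memory grows with batch size, sequence length (up to 1,024 tokens here) and depth (16 layers), and checkpointing keeps only part of it, recomputing the rest in the backward pass.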

Training details:
  • Dataset: lt_corpus.txt (~94 MB; Lithuanian Wikipedia and other texts)
  • Training duration: ~6.5 hours (15,000 optimization steps), plus the SFT phase
  • Best validation loss: ~3.45
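If the reported validation loss is the mean per-token cross-entropy in nats (the default for PyTorch's F.cross_entropy; an assumption here), it converts to perplexity as exp(loss):

```python
import math


def perplexity(mean_ce_loss_nats: float) -> float:
    # Perplexity = exp(mean per-token cross-entropy), natural-log units assumed
    return math.exp(mean_ce_loss_nats)


print(f"{perplexity(3.45):.1f}")  # 31.5
```

Perplexity depends on the tokenizer, so this figure is only comparable with models that use the same 16,000-token BPE vocabulary.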

โš ๏ธ Known Limitations

Hallucinations: SFT has substantially reduced the rambling typical of the base model, but at ~138M parameters the model is still small and may hallucinate facts or struggle with complex reasoning.

Recommended generation settings: a temperature between 0.6 and 0.8, with top-k sampling enabled (top_k=50).
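A minimal, dependency-free sketch of temperature plus top-k sampling over one step's logits, with the recommended settings as defaults. It illustrates the technique and is not the repository's own generation code; in practice the same steps are usually applied to a logits tensor.

```python
import math
import random
from typing import Optional, Sequence


def sample_top_k(
    logits: Sequence[float],
    temperature: float = 0.7,  # recommended range: 0.6 to 0.8
    top_k: int = 50,
    rng: Optional[random.Random] = None,
) -> int:
    """Return a token id drawn from the top_k highest-scoring tokens after temperature scaling."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng if rng is not None else random.Random()
    scaled = [x / temperature for x in logits]  # < 1 sharpens, > 1 flattens the distribution
    k = min(top_k, len(scaled))
    top_ids = sorted(range(len(scaled)), key=scaled.__getitem__, reverse=True)[:k]
    peak = max(scaled[i] for i in top_ids)
    weights = [math.exp(scaled[i] - peak) for i in top_ids]  # softmax numerators, overflow-safe
    return rng.choices(top_ids, weights=weights, k=1)[0]  # choices() normalizes the weights


rng = random.Random(0)
logits = [0.1, 2.5, 2.4, -1.0, 0.3]
print(sample_top_k(logits, top_k=2, rng=rng))  # always 1 or 2
```

Temperatures at the low end of the range make answers more conservative, while top_k=50 discards the long tail of unlikely tokens before sampling.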


🔮 Roadmap

  • Add an English dataset → bilingual (LT + EN) model
  • Instruction fine-tuning → conversational assistant capability (SFT complete!)

๐Ÿ™ Special Thanks

A huge thank you to everyone who inspired, supported, and made this project possible:

Ruby2001 · 0daysophie · italian_tech_person · Julia's Tech Spot · RunPod


Built in Lithuania 🇱🇹 · ZygMediaGroup

Safetensors · Model size: 0.2B params · Tensor type: BF16