Supra1.5-50M Base

Continued Pretraining • 50M Parameters • 5K Context

Supra-1.5-Base-EXP

Supra-1.5-50M-base-exp is a 50M-parameter Llama-style base model produced by continued pretraining of SupraLabs/Supra-50M-Base. The update extends the usable context window from 1,024 tokens to 5,120 tokens using RoPE scaling and full-weight continued pretraining.
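As a rough illustration of what extending the window from 1,024 to 5,120 tokens implies, the sketch below computes the scaling factor and applies linear position interpolation to the rotary angles. The card does not say which RoPE scaling variant was used, so linear interpolation and the `ROPE_THETA` base of 10,000 are assumptions, and the function names are hypothetical.

```python
ORIGINAL_CONTEXT = 1024   # context length of Supra-50M-Base
EXTENDED_CONTEXT = 5120   # context length after continued pretraining
HEAD_DIM = 512 // 8       # hidden size / attention heads = 64
ROPE_THETA = 10000.0      # ASSUMPTION: common Llama default, not stated in the card


def scaling_factor(original: int, extended: int) -> float:
    """Ratio by which positions are compressed under linear interpolation."""
    return extended / original


def rope_angles(position: int, factor: float = 1.0,
                head_dim: int = HEAD_DIM, theta: float = ROPE_THETA) -> list:
    """Rotary angles for one position, one angle per pair of head dimensions.

    ASSUMPTION: linear scaling, i.e. the position is divided by `factor`
    before the usual RoPE frequencies are applied.
    """
    scaled_position = position / factor
    return [scaled_position * theta ** (-2 * i / head_dim)
            for i in range(head_dim // 2)]


factor = scaling_factor(ORIGINAL_CONTEXT, EXTENDED_CONTEXT)
last_position_angles = rope_angles(EXTENDED_CONTEXT - 1, factor=factor)
```

Under this assumption the last position of the extended window is mapped to an effective position just below 1,024, so the rotary angles stay inside the range seen during the original pretraining.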

Architecture

The model keeps the original Supra-50M architecture and tokenizer:

| Specification | Value |
|---|---|
| Architecture | LlamaForCausalLM |
| Parameters | ~50M |
| Vocabulary Size | 32,000 |
| Hidden Size | 512 |
| Layers | 12 |
| Attention Heads | 8 |
| KV Heads | 4 |
| Context Length | 5,120 tokens |
| Tokenizer | Original Supra byte-level BPE tokenizer |
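The specification values above can be cross-checked with a back-of-the-envelope parameter count. The sketch below follows the standard Llama layout (grouped-query attention, gated MLP with three projections, RMSNorm weights, no biases). The card does not state the MLP intermediate size or whether input and output embeddings are tied, so `intermediate_size=1408` and `tie_embeddings=True` are guesses chosen only for illustration, and the function name is hypothetical.

```python
def llama_param_count(vocab_size: int, hidden_size: int, num_layers: int,
                      num_heads: int, num_kv_heads: int,
                      intermediate_size: int, tie_embeddings: bool) -> int:
    """Approximate parameter count for a Llama-style decoder without biases."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim                 # width of the K and V projections
    attention = 2 * hidden_size * hidden_size        # Q and O projections
    attention += 2 * hidden_size * kv_dim            # K and V projections (GQA)
    mlp = 3 * hidden_size * intermediate_size        # gate, up, and down projections
    norms = 2 * hidden_size                          # two RMSNorm weights per layer
    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return embeddings + num_layers * (attention + mlp + norms) + final_norm + lm_head


estimate = llama_param_count(
    vocab_size=32_000, hidden_size=512, num_layers=12,
    num_heads=8, num_kv_heads=4,
    intermediate_size=1408,   # ASSUMPTION: not stated in the card
    tie_embeddings=True,      # ASSUMPTION: not stated in the card
)
print(f"{estimate / 1e6:.1f}M parameters")
```

With these assumed values the estimate lands in the same range as the roughly 50M parameters listed in the table. A different intermediate size or untied embeddings would shift the total noticeably.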

Continued Pretraining Objective

This is continued pretraining (CPT), not instruction fine-tuning. Training uses packed raw text with the standard causal language-modeling loss:

  • labels = input_ids
  • all non-pad tokens are trained
  • no response-only masking
  • no system/user/assistant masking
  • no LoRA adapters in the default run
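The objective listed above can be sketched as a packing step that concatenates tokenized documents into fixed-length blocks and copies `input_ids` into `labels`. This is a minimal illustration, not the training code used for this run: the function name, the EOS separator between documents, and the `-100` ignore value for padded label positions (the convention PyTorch's cross-entropy uses by default) are assumptions.

```python
from typing import Dict, Iterable, List

IGNORE_INDEX = -100  # label value skipped by the loss (PyTorch cross-entropy default)


def pack_documents(docs: Iterable[List[int]], block_size: int,
                   eos_id: int, pad_id: int) -> List[Dict[str, List[int]]]:
    """Concatenate tokenized documents and cut the stream into fixed-size blocks.

    ASSUMPTION: documents are separated by an EOS token; only the final
    block can be short, and it is padded on the right.
    """
    stream: List[int] = []
    for doc in docs:
        stream.extend(doc)
        stream.append(eos_id)

    blocks = []
    for start in range(0, len(stream), block_size):
        chunk = stream[start:start + block_size]
        n_pad = block_size - len(chunk)
        blocks.append({
            "input_ids": chunk + [pad_id] * n_pad,
            # labels = input_ids: every real token is a training target,
            # with no response-only or role-based masking.
            "labels": chunk + [IGNORE_INDEX] * n_pad,
            "attention_mask": [1] * len(chunk) + [0] * n_pad,
        })
    return blocks


# Tiny example with made-up token ids and a block size of 4.
example = pack_documents([[1, 2, 3], [4, 5]], block_size=4, eos_id=0, pad_id=9)
```

In frameworks such as Hugging Face Transformers the one-token shift between inputs and targets is applied inside the model's loss computation, which is why the labels here are an unshifted copy of the inputs.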

Data Mix

The current local training mix prepared for this run is:

  • 3,000,000,062 CPT tokens
    • 30% Tool Calling
    • 30% ChatML Conversations
    • 25% Factual Text (articles, essays, blogs)
    • 15% Math & Logic Questions
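To make the percentages concrete, the following sketch converts the mix into approximate per-category token budgets. The category keys and the rounding rule (floor each share, then give the leftover tokens to the largest category) are assumptions made for illustration. The card does not say how the split was rounded in practice.

```python
TOTAL_TOKENS = 3_000_000_062

# Percentages from the data mix above; the key names are hypothetical labels.
MIX_PERCENT = {
    "tool_calling": 30,
    "chatml_conversations": 30,
    "factual_text": 25,
    "math_logic": 15,
}


def token_budgets(total: int, mix_percent: dict) -> dict:
    """Split `total` tokens by integer percentages without losing any tokens."""
    if sum(mix_percent.values()) != 100:
        raise ValueError("mix percentages must sum to 100")
    budgets = {name: total * pct // 100 for name, pct in mix_percent.items()}
    leftover = total - sum(budgets.values())
    # ASSUMPTION: rounding leftovers go to the largest category.
    largest = max(mix_percent, key=mix_percent.get)
    budgets[largest] += leftover
    return budgets


budgets = token_budgets(TOTAL_TOKENS, MIX_PERCENT)
for name, count in budgets.items():
    print(f"{name}: {count:,} tokens")
```

In round numbers this is about 900M tokens each for tool calling and ChatML conversations, 750M for factual text, and 450M for math and logic.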

Intended Use

The model is intended as a base for supervised fine-tuning (SFT) and reinforcement learning.

Model size: 51.8M params (Safetensors, tensor type F32)