Arrow V2: 0.92B Parameter LLM

Arrow V2 is a custom decoder-only language model with roughly 0.92B parameters, trained from scratch.

Architecture

Component        Detail
Parameters       0.915B
Layers           16
d_model          1024
Attention        Multi-Head Latent Attention (MLA) + YaRN RoPE
FFN              DeepSeekMoE, 64 experts, 4 active
Expert weights   BitNet 1.58-bit ternary {-1, 0, +1}
Norm             Hyper-Connection RMSNorm
Recursion        Mixture of Recursions (r_max=3)
MTP heads        2 (predict token+2 and token+3)
Vocab            256,000 tokens (GPT-2 base + custom special tokens)
Context length   512 tokens
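
To illustrate the "Expert weights" row, here is a minimal sketch of ternary quantization using the absmean scheme commonly associated with BitNet b1.58. Whether Arrow V2 uses exactly this rule is an assumption. The function name `ternarize` and the `eps` value are hypothetical and not taken from the Arrow V2 code.

```python
def ternarize(weights, eps=1e-8):
    """Quantize a list of floats to {-1, 0, +1} with an absmean scale.

    Assumption: absmean scaling as in BitNet b1.58; the real Arrow V2
    code may differ (per-tensor vs per-group scale, rounding mode).
    """
    if not weights:
        return [], 0.0
    # Scale is the mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights)
    quantized = []
    for w in weights:
        # Round the scaled weight, then clip into the ternary range.
        q = round(w / (scale + eps))
        quantized.append(max(-1, min(1, q)))
    return quantized, scale


q, s = ternarize([0.9, -0.05, -1.2, 0.4])
print(q, s)
```

Multiplying each ternary value by the returned scale gives a rough reconstruction of the original weight, which is how such layers are typically dequantized at matmul time.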

Training

  • Data: FineWeb-Edu (70%) + The Stack v2 (30%)
  • Optimiser: AdamW (fused), lr=4e-4, β=(0.9, 0.95), ε=1e-5
  • Schedule: Cosine decay with 1,000 warmup steps
  • Precision: FP16 mixed precision with dynamic loss scaling
  • Memory: Gradient checkpointing (fits on NVIDIA T4 16 GB)
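
The schedule listed above can be sketched as a plain function of the step number. The peak learning rate (4e-4) and the 1,000 warmup steps come from the list. The linear warmup shape, the total step count, and the floor `min_lr=0.0` are assumptions, since the card does not state them. The function name `lr_at` is hypothetical.

```python
import math


def lr_at(step, total_steps, peak_lr=4e-4, warmup_steps=1000, min_lr=0.0):
    """Learning rate at a given optimiser step.

    Assumptions: linear warmup, then cosine decay to min_lr at total_steps.
    """
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


# Example with a hypothetical 10,000-step run.
for s in (0, 999, 5500, 10000):
    print(s, lr_at(s, total_steps=10000))
```

A function of this shape can be wrapped in `torch.optim.lr_scheduler.LambdaLR` by returning the ratio to the peak rate instead of the absolute value.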

Loading

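import json

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

# Tokenizer files live in the "tokenizer" subfolder of the Hub repo.
tok = AutoTokenizer.from_pretrained("imsuprtwo2/ArrowAI-1B", subfolder="tokenizer")

# Read the model hyperparameters from a local copy of config.json.
with open("config.json") as f:
    cfg_dict = json.load(f)

# Reconstruct ArrowV2 from cfg_dict here so that `model` is defined, then:
weights = load_file("checkpoints/step_XXXX.safetensors")
model.load_state_dict(weights)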
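
If several checkpoints are present, a small helper can pick the one with the highest step number. This is a sketch that assumes the files follow the `step_<number>.safetensors` naming shown above. The helper name `latest_checkpoint` and the example file names are hypothetical.

```python
import re

# Matches names such as "checkpoints/step_12000.safetensors".
STEP_PATTERN = re.compile(r"step_(\d+)\.safetensors$")


def latest_checkpoint(filenames):
    """Return the filename with the largest step number, or None."""
    best_name, best_step = None, -1
    for name in filenames:
        match = STEP_PATTERN.search(str(name))
        if match is None:
            continue  # skip files that are not step checkpoints
        step = int(match.group(1))
        if step > best_step:
            best_name, best_step = name, step
    return best_name


# Hypothetical listing, for illustration only.
files = [
    "checkpoints/step_500.safetensors",
    "checkpoints/step_12000.safetensors",
    "notes.txt",
]
print(latest_checkpoint(files))
```

The returned path can be passed straight to `load_file`.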