Arrow V2 — 0.92B Parameter LLM

Arrow V2 is a custom decoder-only language model with roughly 0.92B parameters, trained from scratch.

Architecture

Component	Detail
Parameters	0.915B
Layers	16
d_model	1024
Attention	Multi-Head Latent Attention (MLA) + YaRN RoPE
FFN	DeepSeekMoE — 64 experts, 4 active
Expert weights	BitNet 1.58-bit ternary {-1, 0, +1}
Norm	Hyper-Connection RMSNorm
Recursion	Mixture of Recursions (r_max=3)
MTP heads	2 (predicts token+2 and token+3)
Vocab	256,000 tokens (GPT-2 base + custom special tokens)
Context length	512 tokens
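
The "Expert weights" row says only that the values are ternary. As a rough illustration of how full-precision weights can be mapped to {-1, 0, +1}, here is a minimal sketch that assumes the absmean scheme described for BitNet b1.58: divide by the mean absolute weight, round, then clip. Whether Arrow V2 uses exactly this quantizer is not stated in this card, and the function names `ternary_quantize` and `dequantize` are hypothetical.

```python
def ternary_quantize(weights, eps=1e-5):
    """Absmean ternary quantization sketch (assumed scheme, not confirmed
    to be Arrow V2's quantizer). Returns (codes, scale)."""
    # Per-tensor scale: the mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights)
    # Divide by the scale, round to an integer, then clip into {-1, 0, +1}.
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    codes = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    # Approximate reconstruction: each ternary code times the shared scale.
    return [c * scale for c in codes]


# Toy example with made-up weights.
weights = [0.9, -0.05, 0.4, -1.2, 0.02, -0.6]
codes, scale = ternary_quantize(weights)
print(codes, dequantize(codes, scale))
```

Small weights collapse to 0 and large ones saturate at ±1, so each weight carries log2(3) ≈ 1.58 bits, which is where the "1.58-bit" name comes from.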

Training

Data: FineWeb-Edu (70%) + The Stack v2 (30%)
Optimiser: AdamW (fused) — lr=4e-4, β=(0.9, 0.95), ε=1e-5
Schedule: Cosine decay with 1,000 warmup steps
Precision: FP16 mixed precision with dynamic loss scaling
Memory: Gradient checkpointing (fits on NVIDIA T4 16 GB)
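
The schedule listed above can be sketched as a plain function of the step index. This is a minimal sketch, not the training script: it assumes the warmup is linear, that the rate decays to a floor of `min_lr=0.0`, and that the run length `total_steps` is known in advance. None of those three details is stated in this card, and the value 50,000 below is purely illustrative.

```python
import math


def lr_at(step, total_steps, peak_lr=4e-4, warmup_steps=1000, min_lr=0.0):
    """Learning rate at a given step: warmup, then cosine decay.

    peak_lr and warmup_steps come from the card; the linear warmup shape,
    min_lr, and total_steps are assumptions made for illustration.
    """
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    # Cosine factor goes from 1 (start of decay) to 0 (end of training).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


TOTAL_STEPS = 50_000  # hypothetical run length
for s in (0, 999, 1000, 25_500, 50_000):
    print(s, lr_at(s, TOTAL_STEPS))
```

In a PyTorch training loop the same function could be wrapped in `torch.optim.lr_scheduler.LambdaLR` by returning `lr_at(step, total) / peak_lr` as the multiplier.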

Loading

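import json

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

# The tokenizer files live in the "tokenizer" subfolder of the Hub repo.
tok = AutoTokenizer.from_pretrained("imsuprtwo2/ArrowAI-1B", subfolder="tokenizer")

# Read the model hyperparameters from a local copy of config.json.
with open("config.json") as f:
    cfg_dict = json.load(f)

# Reconstruct the ArrowV2 model from cfg_dict and bind it to `model`
# (the model class is not shown in this snippet), then:
weights = load_file("checkpoints/step_XXXX.safetensors")  # returns a dict of tensors
model.load_state_dict(weights)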
