Arrow V2: 0.92B Parameter LLM
Custom ~1B parameter decoder-only language model trained from scratch.
Architecture
| Component | Detail |
|---|---|
| Parameters | 0.915B |
| Layers | 16 |
| d_model | 1024 |
| Attention | Multi-Head Latent Attention (MLA) + YaRN RoPE |
| FFN | DeepSeekMoE, 64 experts, 4 active |
| Expert weights | BitNet 1.58-bit ternary {-1, 0, +1} |
| Norm | Hyper-Connection RMSNorm |
| Recursion | Mixture of Recursions (r_max=3) |
| MTP heads | 2 (predicts token+2 and token+3) |
| Vocab | 256,000 tokens (GPT-2 base + custom special tokens) |
| Context length | 512 tokens |
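
The ternary expert weights in the table can be illustrated with the absmean quantization described for BitNet b1.58: divide by the mean absolute weight, round, then clip to {-1, 0, +1}. This is a minimal sketch in plain Python, assuming Arrow V2 follows that scheme; the function name `ternary_quantize` and the `eps` value are illustrative and not taken from the model's code.

```python
def ternary_quantize(weights, eps=1e-5):
    """Absmean quantization of a 2-D weight list to {-1, 0, +1}.

    Assumption: this mirrors the BitNet b1.58 recipe; Arrow V2's actual
    implementation may differ (e.g. per-group scales).
    """
    flat = [w for row in weights for w in row]
    scale = sum(abs(w) for w in flat) / len(flat)  # mean absolute value

    def quantize_one(w):
        rounded = round(w / (scale + eps))
        return max(-1, min(1, rounded))  # clip to the ternary range

    return [[quantize_one(w) for w in row] for row in weights], scale


# Hypothetical 2x3 weight matrix for illustration.
w = [[0.9, -0.05, -1.4], [0.02, 0.6, -0.7]]
q, scale = ternary_quantize(w)
print(q, scale)
```

At inference time the ternary matrix is multiplied by `scale` to approximate the original weights, so each expert stores one scalar plus roughly 1.58 bits per weight.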
Training
- Data: FineWeb-Edu (70%) + The Stack v2 (30%)
- Optimiser: AdamW (fused), lr=4e-4, β=(0.9, 0.95), ε=1e-5
- Schedule: Cosine decay with 1,000 warmup steps
- Precision: FP16 mixed precision with dynamic loss scaling
- Memory: Gradient checkpointing (fits on NVIDIA T4 16 GB)
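
The learning-rate schedule listed above (peak 4e-4, 1,000 warmup steps, cosine decay) can be sketched as a function of the step index. The linear warmup shape, the total step count, and the final learning rate are not stated in this card, so `total_steps` and `min_lr` below are assumptions chosen for illustration.

```python
import math


def lr_at(step, total_steps, peak_lr=4e-4, warmup_steps=1000, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay.

    Assumptions: linear warmup and decay to min_lr at total_steps;
    neither is confirmed by the model card.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)  # hold at min_lr past the end
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical run length of 10,000 steps.
for s in (0, 999, 1000, 5500, 10000):
    print(s, lr_at(s, total_steps=10_000))
```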
Loading
import json

import torch
from safetensors.torch import load_file
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("imsuprtwo2/ArrowAI-1B", subfolder="tokenizer")
with open("config.json") as f:
    cfg_dict = json.load(f)
# Reconstruct the ArrowV2 model from cfg_dict and bind it to `model`
# (the model class is not included in this snippet), then load the weights:
weights = load_file("checkpoints/step_XXXX.safetensors")  # XXXX is a placeholder for the checkpoint step
model.load_state_dict(weights)
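
Since the checkpoint path above contains a `step_XXXX` placeholder, a small helper can pick the most recent file in a local `checkpoints/` directory. This is a sketch that assumes `XXXX` stands for an integer training step; the helper name `latest_checkpoint` is hypothetical and not part of the repository.

```python
import re
from pathlib import Path


def latest_checkpoint(ckpt_dir="checkpoints"):
    """Return the step_<N>.safetensors path with the largest N, or None.

    Assumption: checkpoint files are named step_<integer>.safetensors.
    """
    pattern = re.compile(r"step_(\d+)\.safetensors$")
    best = None
    for path in Path(ckpt_dir).glob("step_*.safetensors"):
        match = pattern.match(path.name)
        if match is None:
            continue  # skip names such as step_final.safetensors
        step = int(match.group(1))
        if best is None or step > best[0]:
            best = (step, path)
    return best[1] if best else None
```

The returned path can be passed as `load_file(str(path))`. Comparing steps as integers avoids the lexicographic trap where `step_900` would sort after `step_2000`.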