PowerGQA 778M experimental checkpoint

Experimental causal language model checkpoint trained from scratch.

Checkpoint

File: ckpt_1500.pt
Step: 1500
Parameters: about 778.2M
Tokenizer: Qwen/Qwen2.5-0.5B tokenizer files included under okenizer/
Context length used in training: 1024
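
The metadata above can be checked against the file itself. The following is a minimal sketch, assuming ckpt_1500.pt is a standard PyTorch pickle holding either a bare state dict or a dict that wraps one. The wrapper key names tried here ("model", "state_dict", "model_state_dict") are guesses and not confirmed by this repo, and the helper names are hypothetical.

```python
from collections.abc import Mapping


def find_state_dict(ckpt):
    """Return the mapping that most likely holds the model weights."""
    if not isinstance(ckpt, Mapping):
        raise TypeError("expected the checkpoint to be a dict-like object")
    # Hypothetical wrapper keys; the real file may use a different one or none.
    for key in ("model", "state_dict", "model_state_dict"):
        if isinstance(ckpt.get(key), Mapping):
            return ckpt[key]
    return ckpt  # assume the file is a bare state dict


def count_parameters(state_dict):
    """Sum element counts over every tensor-like value (buffers included)."""
    return sum(v.numel() for v in state_dict.values() if hasattr(v, "numel"))


def inspect_checkpoint(path="ckpt_1500.pt"):
    import torch  # imported lazily so the helpers above work without PyTorch

    # weights_only=True refuses arbitrary pickled objects; a checkpoint that
    # stores custom Python objects would need a different loading strategy.
    ckpt = torch.load(path, map_location="cpu", weights_only=True)
    state_dict = find_state_dict(ckpt)
    print(f"top-level keys: {list(ckpt)[:10]}")
    print(f"parameters: {count_parameters(state_dict) / 1e6:.1f}M")
    return state_dict
```

If the layout matches these assumptions, calling inspect_checkpoint() should report a count close to the figure listed above. Non-trainable buffers stored in the state dict would inflate it slightly.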

Architecture

Custom PowerGQA block:

grouped-query attention
Q/K RMSNorm
RoPE
talking-head pre/post mixing
learnable per-head gates
depthwise local convolution residual
SwiGLU MLP
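
Three of the components listed above can be illustrated in a few lines of dependency-free Python. This is a conceptual sketch only, not the code used in this repo: the head counts in the usage note are hypothetical, the RMSNorm omits the learnable gain a real layer would normally carry, and the SwiGLU function shows only the gating step on already-projected vectors. The actual block lives in the included definitions file and may differ in every detail.

```python
import math


def kv_head_for(q_head, n_q_heads, n_kv_heads):
    """Grouped-query attention: map a query head to the K/V head it shares."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def rms_norm(x, eps=1e-6):
    """RMSNorm without a learnable gain: divide by the root mean square."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def swiglu_gate(gate, up):
    """SwiGLU gating step: SiLU(gate) multiplied elementwise with up."""
    return [silu(g) * u for g, u in zip(gate, up)]
```

With a hypothetical 8 query heads and 2 K/V heads, query heads 0 to 3 read K/V head 0 and heads 4 to 7 read K/V head 1. In a Q/K RMSNorm setup, rms_norm would be applied to each head's query and key vectors before the attention scores are computed.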

This is a raw research checkpoint, not a polished instruction model. Load with the included rain_powergqa_500m.py definitions.

Training notes

Phase 1 used filtered FineWeb-Edu. Local training switched to a no-code QA/reasoning curriculum only after this checkpoint was saved, so the ckpt_1500.pt in this repo snapshot predates that switch.
