Add fixed Raven package for Quasar hybrid generation

#3
by NiroQ - opened

Adds the repo-local Raven implementation expected by modeling_quasar_long.py.

Fixes the GQA gate-expansion bug in RavenAttention. K/V are repeated from 4 KV heads to 16 query heads, but the Mamba2 router/memory gates are already 16-head tensors; they were being repeated again, to 64 heads, and must instead be used as-is. For GLA decay, the gates are expanded before router masking so that f/s stay aligned head-for-head with q/k/v. A runtime head-count assert is also added to catch future mismatches.
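The head bookkeeping described above can be sketched in plain Python, with each tensor reduced to a list holding one entry per head. This is an illustrative sketch under stated assumptions, not the code in this PR: `repeat_kv`, `expand_gate`, `masked_decay` and the `fill` value for masked heads are hypothetical names chosen for the example, and only the 16 query heads / 4 KV heads come from the description. It shows that repeating an already-16-head gate by the GQA factor yields 64 heads, that gates should be expanded only when they are at KV-head resolution, and that expanding before masking keeps mask index `i` pointing at the same head as q/k/v index `i`.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

NUM_Q_HEADS = 16   # query heads (from the PR description)
NUM_KV_HEADS = 4   # KV heads (from the PR description)


def repeat_kv(heads: Sequence[T], n_rep: int) -> List[T]:
    """Hypothetical stand-in for GQA K/V repetition: each KV head is repeated
    n_rep times consecutively, so KV head h feeds query heads h*n_rep .. h*n_rep+n_rep-1."""
    return [h for h in heads for _ in range(n_rep)]


def expand_gate(gate: Sequence[T], num_q_heads: int, num_kv_heads: int) -> List[T]:
    """Bring a per-head gate to query-head resolution (hypothetical helper).

    Gates that already have one entry per query head (like the 16-head
    Mamba2 router/memory gates) pass through unchanged; only gates at
    KV-head resolution are repeated. Any other head count is an error."""
    if len(gate) == num_q_heads:
        return list(gate)
    if len(gate) == num_kv_heads:
        return repeat_kv(gate, num_q_heads // num_kv_heads)
    raise ValueError(
        f"gate has {len(gate)} heads; expected {num_q_heads} or {num_kv_heads}"
    )


def masked_decay(decay: Sequence[float], router_mask: Sequence[bool],
                 num_q_heads: int, num_kv_heads: int,
                 fill: float = 0.0) -> List[float]:
    """Expand the decay gate first, then apply a per-query-head router mask.

    `fill` is a hypothetical value for masked-out heads; the real masking
    semantics in RavenAttention may differ."""
    expanded = expand_gate(decay, num_q_heads, num_kv_heads)
    # Runtime head-count check: gate and mask must both be at query-head resolution.
    assert len(expanded) == len(router_mask) == num_q_heads, (
        f"head mismatch: gate={len(expanded)}, mask={len(router_mask)}, q={num_q_heads}"
    )
    return [d if keep else fill for d, keep in zip(expanded, router_mask)]


if __name__ == "__main__":
    router_gate = list(range(NUM_Q_HEADS))  # already one entry per query head
    buggy = repeat_kv(router_gate, NUM_Q_HEADS // NUM_KV_HEADS)
    fixed = expand_gate(router_gate, NUM_Q_HEADS, NUM_KV_HEADS)
    print(len(buggy), len(fixed))
```

Running the script prints `64 16`: blindly repeating the router gate reproduces the 64-head shape, while `expand_gate` leaves it at 16. The design choice is to key expansion on the gate's actual head count rather than always applying the GQA factor, and to fail loudly on any other count.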

NiroQ changed pull request status to closed
