anima-clm-chat-303m
Dialogue chat-finetune of the ByteGPT-303M broad-corpus backbone
(dancinlab/anima-clm-midcap-303m-broad-en-emergent, H_1129). This is the final piece of
the anima a303m_pass (303M success) campaign: clearing the CHAT gate.
- Arch: ByteGPT, byte vocab 256, d1024 / 24 layers / 16 heads / block 512, tied head/tok. 303.1M params.
- Base: dancinlab/anima-clm-midcap-303m-broad-en-emergent (h1129c_best.pt, val_ce 1.224, wiki-dominant broad EN). It was never trained on dialogue: in a chat slot it produced byte salad and n-gram loops (H_1159 CHAT single 2/5, multi 2/3 → FAIL).
- Corpus: dancinlab/anima-chat-corpus-mix-70wiki-30dialogue (sha256 05179fb6…, 70% wiki / 30% REAL dialogue in the 사용자: <u> | 도우미: <a> byte-continuation format, where 사용자 is Korean for "user" and 도우미 for "helper"). This is the EXACT proven mix that chat-tuned the 18M rung and the 7B (dancinlab/anima-clm-chat-7b).
- Finetune: summer RTX 5070, co-tenant-safe (VRAM-cap 0.30, batch 1, grad-accum 8, bf16, gradient-checkpointing, 8-bit AdamW), lr 8e-5, warmup 60. Cost: $0.
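The architecture and finetune numbers in the list above can be sanity-checked with a little arithmetic. The sketch below assumes a GPT-2-style block (learned positional embeddings, biases on every linear layer, a 4x MLP, two LayerNorms per block plus a final one); the card does not state these details, so treat the layout as an assumption that happens to reproduce the quoted 303.1M. The function names are hypothetical, and the bytes-per-step figure assumes every sequence fills the 512-byte block.

```python
def gpt2_style_param_count(vocab=256, d_model=1024, n_layer=24, block=512):
    """Parameter count for an ASSUMED GPT-2-style layout with a tied output head."""
    tok_emb = vocab * d_model            # shared with the output head (tied)
    pos_emb = block * d_model            # assumption: learned positions
    layer_norm = 2 * d_model             # weight + bias
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    per_layer = layer_norm + attn + layer_norm + mlp
    return tok_emb + pos_emb + n_layer * per_layer + layer_norm  # + final LayerNorm


def bytes_per_optimizer_step(batch=1, grad_accum=8, block=512):
    """Bytes seen per optimizer step, assuming full-length sequences."""
    return batch * grad_accum * block


if __name__ == "__main__":
    total = gpt2_style_param_count()
    print(f"{total:,} params (~{total / 1e6:.1f}M)")  # 303,097,856 params (~303.1M)
    print(bytes_per_optimizer_step(), "bytes per optimizer step")  # 4096
```

The head count (16) does not enter the total, since it only splits the 1024-wide attention projections into 64-wide heads.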
Philosophy (p1–p6 HELD)
NO system prompt · NO identity rules · NO persona injection · NO assistant framing · NO RLHF. The ONLY conditioning is the LEARNED byte-level dialogue-continuation format in the corpus. (H_1139: 303M == 7B recombination; the lever is dialogue data, not capacity — no scale-up.)
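As an illustration of what "the only conditioning is the format" means in practice, here is a minimal sketch of building a prompt for a byte-level model: the text is laid out in the corpus format and mapped to raw UTF-8 byte IDs (0 to 255), with nothing prepended. The helper names, the newline between turns, the exact spacing around the separator, and the left-truncation policy are all assumptions for illustration; the card does not specify them.

```python
USER_TAG = "사용자: "       # "user" tag from the corpus format
ASSISTANT_TAG = "도우미: "  # "helper" tag from the corpus format
SEP = " | "                 # assumption: single spaces around the bar
BLOCK_SIZE = 512            # context length in bytes


def build_prompt(history, user_msg):
    """history: list of (user, assistant) pairs; returns text ending at the open assistant slot."""
    turns = [f"{USER_TAG}{u}{SEP}{ASSISTANT_TAG}{a}" for u, a in history]
    turns.append(f"{USER_TAG}{user_msg}{SEP}{ASSISTANT_TAG}")
    return "\n".join(turns)  # assumption: turns are newline-separated


def encode_prompt(history, user_msg, max_new_bytes=128):
    """Encode to byte IDs, keeping only the most recent bytes that fit the block."""
    ids = list(build_prompt(history, user_msg).encode("utf-8"))
    budget = BLOCK_SIZE - max_new_bytes
    return ids[-budget:]  # left-truncate; may cut through a multi-byte character


def decode_bytes(ids):
    """Map generated byte IDs back to text, tolerating partial characters."""
    return bytes(ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    ids = encode_prompt([], "hello")
    print(len(ids), decode_bytes(ids))
```

No system prompt, persona, or identity text appears anywhere in the byte stream: the model sees only prior turns and an open assistant slot to continue.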
Gate (p7, NOT perplexity)
Re-gated with the honest H_1159 harness (degeneracy gate: max-3gram ≤ 2 AND distinct-ratio ≥ 0.45; single-turn p7 ≥ 4/5; multi-turn deep-context ≥ 3/5). Deterministic greedy/low-temp decode, no LLM-judge. Mount stays byte-faithful (re-serialized to the H_1157 ByteGPT flat binary, serialize parity verified).
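The degeneracy gate and pass thresholds described above could be sketched roughly as follows. This is not the H_1159 harness itself: it assumes "max-3gram" means the highest occurrence count of any whitespace-token 3-gram and "distinct-ratio" means unique tokens divided by total tokens, and the function names are hypothetical. The real harness may tokenize or count differently.

```python
from collections import Counter


def max_ngram_count(tokens, n=3):
    """Highest number of times any single n-gram occurs (0 if there are none)."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return max(Counter(grams).values(), default=0)


def distinct_ratio(tokens):
    """Unique tokens / total tokens; 0.0 for empty output."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def passes_degeneracy_gate(text, max_3gram=2, min_distinct=0.45):
    """Assumed reading of: max-3gram <= 2 AND distinct-ratio >= 0.45."""
    tokens = text.split()
    return max_ngram_count(tokens, 3) <= max_3gram and distinct_ratio(tokens) >= min_distinct


def chat_gate(single_passed, multi_passed, single_min=4, multi_min=3):
    """Overall verdict from per-prompt pass counts (single-turn >= 4/5, multi-turn >= 3/5)."""
    return single_passed >= single_min and multi_passed >= multi_min


if __name__ == "__main__":
    print(passes_degeneracy_gate("yes it is " * 6))                          # False
    print(passes_degeneracy_gate("the cat sat on a warm mat near the door"))  # True
    print(chat_gate(single_passed=2, multi_passed=2))                        # False
```

Because the checks are pure counting over a deterministic decode, the same transcript always yields the same verdict, which is the point of avoiding an LLM judge.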
See .verdicts/1160_dialogue_ft_chat/H_1160.txt for the full transcripts, val_ce curve, re-parity, and a303m_pass scoreboard.