Models: dm (collection) • Attention-only transformers, sweep over model dimension • 15 items • Updated Oct 18, 2024
Models: L (collection) • Attention-only transformers, sweep over number of layers • 7 items • Updated Oct 18, 2024
Models: H (collection) • Attention-only transformers, sweep over number of heads (for fixed head dimension) • 7 items • Updated Oct 18, 2024
Models: H-dh (collection) • Attention-only transformers, sweep over number of heads (variable head dimension) • 7 items • Updated Oct 18, 2024
Models: dh (collection) • Attention-only transformers, sweep over head dimension • 7 items • Updated Oct 18, 2024