9B - query_pre_attn_scalar = 256 not 224

#22

by danielhanchen - opened Jul 10, 2024

See https://github.com/google/gemma_pytorch/commit/03e657582d17cb5a8617ebf333c1c16f3694670e
Gemma 9B should use `query_pre_attn_scalar = 256`, not 224 (the value given by `self.config.hidden_size // self.config.num_attention_heads`).
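For context, `query_pre_attn_scalar` sets the attention scaling factor: the query/key logits are multiplied by `query_pre_attn_scalar ** -0.5` instead of the usual `head_dim ** -0.5`. For example, Hugging Face `transformers` computes `self.scaling = config.query_pre_attn_scalar**-0.5` in its Gemma 2 attention module. The sketch below compares the two candidate values. The config numbers (`HIDDEN_SIZE = 3584`, `NUM_ATTENTION_HEADS = 16`, `HEAD_DIM = 256`) are assumptions taken from the published Gemma 2 9B config and are not stated in this discussion. The helper `attention_scaling` is hypothetical and exists only for illustration.

```python
import math

# Assumed Gemma 2 9B config values (not stated in the discussion above).
HIDDEN_SIZE = 3584
NUM_ATTENTION_HEADS = 16
HEAD_DIM = 256


def attention_scaling(query_pre_attn_scalar: int) -> float:
    """Hypothetical helper: the factor applied to QK^T logits before softmax."""
    return query_pre_attn_scalar ** -0.5


# Old value: derived from hidden_size // num_attention_heads.
old_scalar = HIDDEN_SIZE // NUM_ATTENTION_HEADS
# Corrected value: 256, which equals the assumed head dimension.
new_scalar = HEAD_DIM

old_scale = attention_scaling(old_scalar)
new_scale = attention_scaling(new_scalar)

print(old_scalar, new_scalar)  # 224 256
print(f"old scale: {old_scale:.5f}, new scale: {new_scale:.5f}")
# Ratio of old to new scale, i.e. sqrt(256 / 224).
print(f"logits were scaled up by a factor of {old_scale / new_scale:.4f}")
```

Under these assumptions, using 224 makes every attention logit slightly larger than with 256 (a factor of `sqrt(256 / 224)`). This sharpens the softmax distribution compared with the intended setting.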

osanseviero changed pull request status to merged Jul 10, 2024