Is RoPE scaling correct?

#2
by Noeda - opened

Rope theta is 100k here: https://huggingface.co/keyfan/grok-1-hf/blob/main/config.json#L30 (unless I missed it being overridden anywhere in code).

It's 10k here: https://github.com/xai-org/grok-1/blob/main/model.py#L801
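
For a sense of how much the two values differ in practice, here is a minimal sketch of the standard RoPE inverse-frequency formula, `theta ** (-2i / head_dim)`. It is not taken from either repository: `rope_inv_freq` is a hypothetical helper and `head_dim = 128` is an illustrative assumption.

```python
def rope_inv_freq(head_dim, theta):
    """Standard RoPE inverse frequencies: theta ** (-2i / head_dim) for each dimension pair."""
    return [theta ** (-(2 * i) / head_dim) for i in range(head_dim // 2)]

head_dim = 128  # illustrative assumption, not read from either config

freq_10k = rope_inv_freq(head_dim, 10_000.0)    # base used in the original model.py
freq_100k = rope_inv_freq(head_dim, 100_000.0)  # base found in the HF config.json

# The first pair is identical (theta ** 0 == 1); the slowest-rotating pairs diverge the most.
ratio_last = freq_10k[-1] / freq_100k[-1]
print(f"first pair: {freq_10k[0]} vs {freq_100k[0]}")
print(f"lowest-frequency ratio (10k / 100k): {ratio_last:.2f}")
```

With the larger base the low-frequency dimensions rotate several times more slowly, so positions would be encoded differently from what the released weights were trained with.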

You're right, I forgot to correct that. Thank you for pointing this out.

Thanks :) Also thanks for the HF version. It's much easier to follow than the original JAX implementation.

Noeda changed discussion status to closed