Rope scaling implementation

by cvdbdo - opened

Do you plan on implementing rope scaling in the near future?

In transformers, for example:

model = AutoModelForCausalLM.from_pretrained(
    model_id,  # placeholder for the checkpoint name
    rope_scaling={"type": "dynamic", "factor": 2.0},
)
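For readers unfamiliar with the option, here is a minimal sketch of what the "dynamic" type is generally understood to do (dynamic NTK scaling): once the sequence grows past the trained context length, the RoPE base is enlarged so the rotary frequencies stretch to cover the longer range. The function names, the default base of 10000, and the example sizes are illustrative assumptions, not the transformers API or anything from Mistral's code.

```python
def dynamic_ntk_base(seq_len, max_position_embeddings, head_dim,
                     base=10000.0, factor=2.0):
    """Return the RoPE base to use for a sequence of length seq_len.

    Hypothetical helper: within the trained context the base is unchanged;
    beyond it, the base grows with the sequence length.
    """
    if seq_len <= max_position_embeddings:
        return base
    scale = (factor * seq_len / max_position_embeddings) - (factor - 1)
    return base * scale ** (head_dim / (head_dim - 2))


def inv_frequencies(base, head_dim):
    """Inverse rotary frequencies, one per pair of head dimensions."""
    return [1.0 / (base ** (i / head_dim)) for i in range(0, head_dim, 2)]


# Assumed sizes for illustration: trained context 4096, head dimension 128.
short_base = dynamic_ntk_base(4096, 4096, 128)   # inside the trained context
long_base = dynamic_ntk_base(8192, 4096, 128)    # twice the trained context
short_freqs = inv_frequencies(short_base, 128)
long_freqs = inv_frequencies(long_base, 128)
```

With a larger base, every frequency after the first gets smaller, so positions rotate more slowly and longer sequences stay within the range of angles seen during training.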


But don't they use a sliding window attention mechanism for larger contexts?

Mistral AI_ org

That's not planned in the near future!

lerela changed discussion status to closed
