Rope scaling implementation

#11
by cvdbdo - opened

Do you plan on implementing rope scaling in the near future?

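(In transformers, for example:

from transformers import AutoModelForCausalLM

# Request dynamic RoPE scaling with a factor of 2 when loading the model
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1",
    rope_scaling={"type": "dynamic", "factor": 2.0},
    device_map="auto",
)

)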
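
For anyone reading along who hasn't used the option: below is a minimal sketch of what a "dynamic" rope scaling setting is generally understood to compute (dynamic NTK scaling of the rotary base). The function name `dynamic_ntk_inv_freq` and the sample values (head dimension 128, trained length 4096, base 10000) are illustrative assumptions, not taken from the Mistral or transformers code.

```python
def dynamic_ntk_inv_freq(dim, seq_len, max_position_embeddings, factor, base=10000.0):
    """Return rotary inverse frequencies for one attention head.

    Hypothetical helper: the base is rescaled only when the current
    sequence is longer than the trained context length.
    """
    if seq_len > max_position_embeddings:
        # Grow the base with the sequence length (dynamic NTK rule)
        scale = factor * seq_len / max_position_embeddings - (factor - 1)
        base = base * scale ** (dim / (dim - 2))
    # One frequency per pair of head dimensions
    return [1.0 / base ** (i / dim) for i in range(0, dim, 2)]


# Within the trained length nothing changes; beyond it, low frequencies shrink
short = dynamic_ntk_inv_freq(dim=128, seq_len=4096, max_position_embeddings=4096, factor=2.0)
long = dynamic_ntk_inv_freq(dim=128, seq_len=8192, max_position_embeddings=4096, factor=2.0)
```

In this sketch the highest frequency stays at 1.0 while the lowest one is stretched, which is why the scaling only kicks in for inputs longer than the trained context.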

But don't they use a sliding window attention mechanism for larger contexts?

Mistral AI org

That's not planned in the near future!

lerela changed discussion status to closed
