Instruct-v0.1 or Instruct-v0.2 based?

#4
by zappa2005 - opened

I couldn't find this information on the model page, hence the question: did you base this model on v0.1 or v0.2?

I read somewhere that v0.1 uses sliding-window attention (SWA) as a context-extension method, which did not work well in some cases, so they changed that in v0.2.

I guess I read it here, but I'm not sure - it was late. Thank you!
https://www.reddit.com/r/LocalLLaMA/comments/18k0fek/psa_you_can_and_may_want_to_disable_mixtrals/
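
For anyone unfamiliar with the term, here is a minimal sketch of what a sliding-window causal attention mask looks like. It is only an illustration, not code from any Mistral or Mixtral release: the function name `sliding_window_causal_mask` is hypothetical, and the convention that each token sees itself plus the `window - 1` tokens before it is an assumption made for the example.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a boolean attention mask (hypothetical helper, for illustration only).

    mask[i][j] is True when query position i may attend to key position j.
    Assumed convention: a token attends to itself and to the (window - 1)
    tokens immediately before it, and never to future tokens.
    """
    if seq_len < 0:
        raise ValueError("seq_len must be non-negative")
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        # j <= i enforces causality; i - j < window enforces the sliding window.
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print the mask for 5 tokens with a window of 3 ("x" = may attend).
    for row in sliding_window_causal_mask(5, 3):
        print("".join("x" if allowed else "." for allowed in row))
```

With a window at least as long as the sequence, this reduces to an ordinary causal mask, which is roughly what disabling the sliding window amounts to.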

NeverSleep org

What? This is Mixtral-8x7B-Instruct-v0.1, not Mistral-7B-Instruct-v0.2. This is a Mixtral model, and there is no Instruct v0.2 for Mixtral.

IkariDev changed discussion status to closed
