No description provided.
Owner

Is there a reason it should be Mistral? I tried to stick to SOLAR models, which use "LlamaForCausalLM".
I'm still learning all these small things :3

It's for tokenizing and stuff. Since SOLAR is initialized from Mistral, it never hurts to be correct, as a mismatch might cause a bug later on (token trimming, counting, or other things).
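
To give a rough idea of the kind of bug meant here, below is a minimal sketch using two made-up toy tokenizers. `tokenize_a`, `tokenize_b`, and `trim_to_budget` are hypothetical names for illustration only, not the real Mistral or Llama tokenizers or any library API. It only shows that the same text can produce different token counts, so trimming to a fixed budget keeps different content depending on which tokenizer is assumed.

```python
# Toy stand-ins for illustration; NOT the real Mistral or Llama tokenizers.

def tokenize_a(text):
    # Hypothetical tokenizer A: one token per whitespace-separated word.
    return text.split()

def tokenize_b(text):
    # Hypothetical tokenizer B: each word is cut into chunks of up to 4 characters.
    return [word[i:i + 4] for word in text.split() for i in range(0, len(word), 4)]

def trim_to_budget(text, tokenize, budget):
    # Keep only the first `budget` tokens, as a context-trimming step might do.
    return tokenize(text)[:budget]

prompt = "tokenizers disagree about boundaries"

count_a = len(tokenize_a(prompt))
count_b = len(tokenize_b(prompt))
print(count_a, count_b)  # 4 10

# With a budget of 4 tokens, A keeps the whole prompt, B keeps barely more than one word.
print(trim_to_budget(prompt, tokenize_a, 4))
print(trim_to_budget(prompt, tokenize_b, 4))
```

If a tool counts tokens with the wrong one of these, its budget math is off even though nothing crashes, which is why a mismatch can go unnoticed until later.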

Owner

I guess being based on Mistral explains why it benchmarks higher than Llama 2 13B

saishf changed pull request status to merged
