How is this model different from Llama 2-7B?

by dheerajpai - opened

It's better :)

Also, it uses GQA (grouped-query attention), which other Llama 7B models don't have in their architecture. The architecture is very similar but not identical, and it's pretrained on different data.
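To illustrate what GQA changes, here is a minimal sketch in plain Python. It shows how several query heads share one key/value head and how that shrinks the KV cache. The configuration numbers are assumptions based on the commonly cited configs (32 layers, 32 query heads, head_dim 128 for both models; 32 KV heads for Llama 2 7B, 8 for Mistral 7B), and `kv_head_for_query_head` and `kv_cache_bytes` are hypothetical helpers, not part of any library.

```python
def kv_head_for_query_head(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Map a query head index to the KV head it shares under GQA."""
    assert n_heads % n_kv_heads == 0, "query heads must split evenly into KV groups"
    group_size = n_heads // n_kv_heads  # query heads per shared KV head
    return q_head // group_size


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size for one sequence; the leading 2 covers keys plus values.

    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


# Assumed configs: plain multi-head attention (32 KV heads) vs GQA (8 KV heads).
N_LAYERS, N_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 128, 4096
mha = kv_cache_bytes(N_LAYERS, 32, HEAD_DIM, SEQ_LEN)
gqa = kv_cache_bytes(N_LAYERS, 8, HEAD_DIM, SEQ_LEN)

if __name__ == "__main__":
    # With 8 KV heads, query heads 0-3 share KV head 0, heads 4-7 share KV head 1, ...
    print([kv_head_for_query_head(h, N_HEADS, 8) for h in range(8)])
    print(f"MHA cache: {mha / 2**30:.1f} GiB, GQA cache: {gqa / 2**30:.1f} GiB")
```

Under these assumptions the cache per 4096-token sequence drops from 2.0 GiB to 0.5 GiB, which is where the throughput gain comes from: less memory traffic per decoded token and room for larger batches.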

Mistral AI org

GQA and Sliding Window Attention are the visible differences; they should help increase inference throughput and context length.
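A rough sketch of how sliding window attention bounds per-token cost while still letting information travel far, written in plain Python. The window of 4096 and the 32 layers are assumptions taken from the commonly cited Mistral 7B config; the rolling-buffer indexing (`pos % window`) follows the idea described in the Mistral 7B paper, and all function names here are hypothetical.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True if query position i may attend to key position j.

    Causal (j <= i) and limited to the most recent `window` positions (i - j < window),
    so each token attends to at most `window` keys instead of all previous ones.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def receptive_field(n_layers: int, window: int) -> int:
    """Upper bound on how far back information can flow through stacked layers.

    Each layer looks back at most `window` tokens, so k layers reach k * window.
    """
    return n_layers * window


def cache_slot(pos: int, window: int) -> int:
    """Rolling-buffer cache index: the KV cache only ever holds `window` entries."""
    return pos % window


if __name__ == "__main__":
    for row in sliding_window_mask(6, 3):
        print("".join("x" if allowed else "." for allowed in row))
    print(receptive_field(32, 4096))  # assumed config: 32 layers, window 4096
    print(cache_slot(5000, 4096))     # position 5000 overwrites slot 904
```

The design trade-off: attention cost and cache size stay fixed at the window size, and longer-range context is reached indirectly through depth rather than directly in a single layer.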

timlacroix changed discussion status to closed

Is this model pre-trained from scratch? Just curious.

dheerajpai changed discussion status to open

Mistral AI org

Yes, it is pre-trained from scratch.

For me, it responds well in Chinese. With Llama 7B, whenever I ask in Chinese, it seems to understand my question but responds in English.
