Why is your 8B model faster than the original Llama-3-8B-Instruct?
#1 by HBD007 - opened
I understand that the faster inference of the 70B model comes from the speculative decoding mentioned in your blog post.
However, why is elyza/Llama-3-ELYZA-JP-8B faster than the original Llama-3-8B-Instruct?
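
For reference, this is roughly how I understand the speculative decoding you describe for the 70B model (a minimal greedy sketch only; `draft_next_token` and `target_predictions` are hypothetical placeholders, not your actual implementation):

```python
def draft_next_token(ctx):
    # Hypothetical small/fast draft model: cheaply guess the next token.
    return (sum(ctx) + 1) % 100  # placeholder logic


def target_predictions(prefix, proposal):
    # Hypothetical large target model: in one pass over prefix + proposal,
    # return its predicted token at each proposed position, i.e. prediction i
    # is conditioned on prefix + proposal[:i].
    preds = []
    for i in range(len(proposal)):
        ctx = prefix + proposal[:i]
        preds.append((sum(ctx) + 1) % 100)  # placeholder logic
    return preds


def speculative_step(tokens, k=4):
    # 1) The draft model proposes k tokens autoregressively (cheap).
    proposal = []
    ctx = list(tokens)
    for _ in range(k):
        t = draft_next_token(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2) The target model verifies all k proposed tokens in a single pass.
    verified = target_predictions(tokens, proposal)

    # 3) Accept the proposal up to the first disagreement, then take the
    #    target model's own token there, so output matches the target model.
    accepted = []
    for p, v in zip(proposal, verified):
        if p == v:
            accepted.append(p)
        else:
            accepted.append(v)
            break
    return tokens + accepted


if __name__ == "__main__":
    print(speculative_step([1, 2, 3]))
```

This explains the 70B speedup to me, but I don't see how it applies to the 8B model.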