Why is your 8B model faster than the original Llama-3-8B-Instruct?
#1 by HBD007 - opened
I understand that the faster inference of the 70B model comes from the speculative decoding mentioned in your blog post.
However, why is elyza/Llama-3-ELYZA-JP-8B faster than the original Llama-3-8B-Instruct?
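
For reference, this is roughly how I understand the speculative decoding you describe for the 70B model (a minimal greedy sketch only; `draft_next_token` and `target_predictions` are hypothetical placeholders, not your actual implementation):

```python
def draft_next_token(ctx):
    # Hypothetical small/fast draft model: cheaply guess the next token.
    return (sum(ctx) + 1) % 100  # placeholder logic


def target_predictions(prefix, proposal):
    # Hypothetical large target model: in one pass over prefix + proposal,
    # return its predicted token at each proposed position, i.e. prediction i
    # is conditioned on prefix + proposal[:i].
    preds = []
    for i in range(len(proposal)):
        ctx = prefix + proposal[:i]
        preds.append((sum(ctx) + 1) % 100)  # placeholder logic
    return preds


def speculative_step(tokens, k=4):
    # 1) The draft model proposes k tokens autoregressively (cheap).
    proposal = []
    ctx = list(tokens)
    for _ in range(k):
        t = draft_next_token(ctx)
        proposal.append(t)
        ctx.append(t)

    # 2) The target model verifies all k proposed tokens in a single pass.
    verified = target_predictions(tokens, proposal)

    # 3) Accept the proposal up to the first disagreement, then take the
    #    target model's own token there, so output matches the target model.
    accepted = []
    for p, v in zip(proposal, verified):
        if p == v:
            accepted.append(p)
        else:
            accepted.append(v)
            break
    return tokens + accepted


if __name__ == "__main__":
    print(speculative_step([1, 2, 3]))
```

This explains the 70B speedup to me, but I don't see how it applies to the 8B model.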