What is the difference between this and "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8"?

#1
by RonanMcGovern - opened

Does the "-dynamic" signify something?

Neural Magic org

Great question. Yes, "dynamic" means the activations are quantized dynamically: the activations going into each quantized layer are quantized at runtime, with the quantization range recomputed from the values actually observed so that it best fits their distribution. This adds a slight runtime overhead, depending on the scenario, but it is a more accurate scheme than the static alternative, which fixes the activation ranges ahead of time from a sampled distribution and quantizes every input against those fixed ranges.
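
To make the distinction concrete, here is a minimal sketch of static versus dynamic per-tensor scaling. It is not the code used to produce either checkpoint. It assumes a symmetric signed 8-bit integer grid as a stand-in for the FP8 format, a single scale per tensor, and synthetic Gaussian activations whose runtime spread is deliberately wider than the calibration sample. All names (`QMAX`, `scale_for`, `mean_abs_error`) are hypothetical.

```python
import random

QMAX = 127  # assumption: signed 8-bit integer grid, standing in for FP8


def quantize(values, scale):
    """Round each value to the nearest grid point and clamp to [-QMAX, QMAX]."""
    return [max(-QMAX, min(QMAX, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Map integer codes back to real values."""
    return [c * scale for c in codes]


def scale_for(values):
    """Per-tensor scale that maps the largest magnitude onto QMAX."""
    return max(abs(v) for v in values) / QMAX


def mean_abs_error(values, scale):
    """Average round-trip error when quantizing `values` with `scale`."""
    restored = dequantize(quantize(values, scale), scale)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


rng = random.Random(0)
calibration = [rng.gauss(0.0, 1.0) for _ in range(1024)]  # sampled ahead of time
runtime = [rng.gauss(0.0, 4.0) for _ in range(1024)]      # wider than calibration

# Static: the scale is fixed from the calibration sample and reused as-is.
static_scale = scale_for(calibration)
# Dynamic: the scale is recomputed from the activations seen at runtime,
# which costs an extra pass over the tensor on every call.
dynamic_scale = scale_for(runtime)

static_err = mean_abs_error(runtime, static_scale)
dynamic_err = mean_abs_error(runtime, dynamic_scale)
print(f"static error:  {static_err:.4f}")
print(f"dynamic error: {dynamic_err:.4f}")
```

In this exaggerated case the static scale clips every runtime value that falls outside the calibrated range, while the dynamic scale tracks the actual range at the cost of the extra max computation. When runtime activations closely match the calibration sample, the gap would be much smaller.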
