What is the difference between this and "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8"?
#1, opened by RonanMcGovern
Does the "-dynamic" signify something?
Great question. Yes: "dynamic" means the activations are quantized dynamically. The input activations to each quantized layer are quantized at runtime, and the quantization range is recomputed from the actual values so that it fits their current distribution. Computing those ranges on the fly adds a slight runtime overhead, which varies with the scenario. In exchange, the scheme is more accurate than static quantization, where the activation ranges are fixed ahead of time from a sampled calibration distribution and then reused for every input.
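Here is a minimal NumPy sketch of the difference, built on a few assumptions. It simulates FP8 E4M3 (largest finite value 448) by scaling, clipping, and rounding to a 3-bit mantissa. The dynamic case uses one scale per token (row). The helper names (`static_scale`, `dynamic_scale`, `fake_quant`, `round_to_e4m3`) are hypothetical. This thread does not state the FP8 variant or the scale granularity the actual checkpoints use.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed format)


def round_to_e4m3(x):
    """Round values already clipped to +/-448 to the nearest E4M3-representable number."""
    _, exp = np.frexp(x)  # x = m * 2**exp with 0.5 <= |m| < 1
    # 3 mantissa bits; below the min normal exponent (-6) the step is fixed at 2**-9
    step = np.exp2(np.maximum(exp - 1, -6) - 3)
    return np.round(x / step) * step


def fake_quant(x, scale):
    """Quantize x to simulated FP8 with the given scale, then dequantize back."""
    q = round_to_e4m3(np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX))
    return q * scale


def static_scale(calibration_batches):
    """Static: a single scale fixed offline from the max |activation| in calibration data."""
    amax = max(np.abs(b).max() for b in calibration_batches)
    return amax / FP8_E4M3_MAX


def dynamic_scale(x):
    """Dynamic (assumed per-token): scales recomputed from the live activations at runtime."""
    amax = np.abs(x).max(axis=-1, keepdims=True)  # extra reduction on every forward pass
    return np.maximum(amax, 1e-12) / FP8_E4M3_MAX


rng = np.random.default_rng(0)
calib = [rng.normal(0.0, 1.0, size=(4, 64)) for _ in range(8)]
s_static = static_scale(calib)

# At inference time, one token carries an outlier far outside the calibration range.
x = rng.normal(0.0, 1.0, size=(4, 64))
x[2, 5] = 40.0

x_static = fake_quant(x, s_static)
x_dynamic = fake_quant(x, dynamic_scale(x))
err_static = np.abs(x_static - x).mean()
err_dynamic = np.abs(x_dynamic - x).mean()
print(f"static  mean abs error: {err_static:.4f}")
print(f"dynamic mean abs error: {err_dynamic:.4f}")
```

The static scale clips the outlier at `x[2, 5]` to roughly the calibration maximum. The dynamic scale adapts to it and keeps the value. The `abs().max()` reduction inside `dynamic_scale` runs on every forward pass. That per-input work is the kind of cost behind the small runtime overhead described above.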