---
tags:
- fp8
- vllm
license: llama3
license_link: https://llama.meta.com/llama3/license/
language:
- en
---


# Meta-Llama-3-8B-Instruct-FP8

## Model Overview
- **Model Architecture:** Meta-Llama-3
  - **Input:** Text
  - **Output:** Text
- **Model Optimizations:**
  - **Weight quantization:** FP8
  - **Activation quantization:** FP8
  - **KV cache quantization:** FP8
- **Intended Use Cases:** Intended for commercial and research use in English. Similarly to [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct), this models is intended for assistant-like chat.
- **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
- **Release Date:** 6/8/2024
- **Version:** 1.0
- **License(s):** [Llama3](https://llama.meta.com/llama3/license/)
- **Model Developers:** Neural Magic

Quantized version of [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct).


```
lm_eval --model vllm --model_args pretrained=nm-testing/Meta-Llama-3-8B-Instruct-FP8-K-V,kv_cache_dtype=fp8,add_bos_token=True --tasks gsm8k --num_fewshot 5 --batch_size auto

vllm (pretrained=nm-testing/Meta-Llama-3-8B-Instruct-FP8-K-V,kv_cache_dtype=fp8,add_bos_token=True), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: auto
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7748|±  |0.0115|
|     |       |strict-match    |     5|exact_match|↑  |0.7763|±  |0.0115|
```