Quantizations of https://huggingface.co/meta-llama/Meta-Llama-3-8B

Update (May 1, 2024): re-uploaded the models after this merge: https://github.com/ggerganov/llama.cpp/pull/6920. The models now work correctly (tested with the prompts 7777+3333 and 3333+777 using Q8_0; both gave correct results).
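For reference, the same arithmetic sanity check can be reproduced locally with the llama-cpp-python bindings. A minimal sketch, assuming one of the Q8_0 GGUF files from this repo has been downloaded; the filename below is illustrative:

# Reproduces the arithmetic test from the update note above.
# Requires: pip install llama-cpp-python
from llama_cpp import Llama

# Assumption: a Q8_0 quant downloaded locally under this (illustrative) name
llm = Llama(model_path="Meta-Llama-3-8B.Q8_0.gguf", n_ctx=512, verbose=False)

for prompt in ("7777+3333=", "3333+777="):
    out = llm(prompt, max_tokens=8, temperature=0.0)  # temperature 0 = greedy decoding
    print(prompt, out["choices"][0]["text"].strip())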

From the original readme

How to use

This repository contains two versions of Meta-Llama-3-8B, for use with transformers and with the original llama3 codebase.

Use with transformers

See the snippet below for usage with Transformers:

>>> import transformers
>>> import torch

>>> model_id = "meta-llama/Meta-Llama-3-8B"

>>> pipeline = transformers.pipeline(
...     "text-generation", model=model_id, model_kwargs={"torch_dtype": torch.bfloat16}, device_map="auto"
... )
>>> pipeline("Hey how are you doing today?")

Use with llama3

Please follow the instructions in the llama3 repository: https://github.com/meta-llama/llama3

To download the original checkpoints, see the example command below leveraging huggingface-cli:

huggingface-cli download meta-llama/Meta-Llama-3-8B --include "original/*" --local-dir Meta-Llama-3-8B

For Hugging Face support, we recommend using transformers or TGI, but a similar command works.
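
A single GGUF file can also be fetched programmatically with huggingface_hub; a sketch, again using a placeholder repo id and filename:

from huggingface_hub import hf_hub_download

# Placeholders: substitute this repo's id and the quant file you want
path = hf_hub_download(
    repo_id="your-username/Meta-Llama-3-8B-GGUF",
    filename="Meta-Llama-3-8B.Q8_0.gguf",
)
print(path)  # prints the local cache path of the downloaded file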

Model details

Format: GGUF
Model size: 8.03B params
Architecture: llama

Available quantizations: 1-bit, 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit
