fsaudm/Meta-Llama-3.1-8B-Instruct-INT8

Tags: Text Generation · text-generation-inference · Inference Endpoints · 8-bit precision


Model Card for Meta-Llama-3.1-8B-Instruct-INT8

This is a quantized version of Llama 3.1 8B Instruct, quantized to 8-bit precision using bitsandbytes and accelerate.

Developed by: Farid Saud @ DSRS
License: llama3.1
Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct
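
The card does not include the script used to produce the quantized weights. As a rough illustration only, the sketch below shows one common way to load the base model in 8-bit with bitsandbytes through transformers. The helper name `load_base_model_in_8bit` and the output directory in the comments are hypothetical, and the choice of `device_map="auto"` (which relies on accelerate) is an assumption rather than the author's documented setup. Running it requires a CUDA GPU, the bitsandbytes package, and access to the gated base model.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"


def load_base_model_in_8bit(model_id: str = BASE_MODEL_ID):
    """Load a causal LM with its linear layers quantized to 8-bit via bitsandbytes.

    Assumption: this mirrors a typical bitsandbytes workflow, not necessarily
    the exact procedure used to build this repository.
    """
    # Imported lazily so that defining the helper does not need the GPU stack.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(load_in_8bit=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # lets accelerate place the weights on available devices
    )
    return model, tokenizer


# Possible usage (downloads the full base model, so it is left commented out):
# model, tokenizer = load_base_model_in_8bit()
# model.save_pretrained("Meta-Llama-3.1-8B-Instruct-INT8")      # hypothetical output dir
# tokenizer.save_pretrained("Meta-Llama-3.1-8B-Instruct-INT8")
```

Saving an 8-bit model this way depends on reasonably recent transformers and bitsandbytes releases, so treat the commented lines as a starting point to verify locally.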

Use this model

Use a pipeline as a high-level helper:

# Use a pipeline as a high-level helper
from transformers import pipeline

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe = pipeline("text-generation", model="fsaudm/Meta-Llama-3.1-8B-Instruct-INT8")
pipe(messages)
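
The pipeline call above returns a nested structure rather than a plain string. The sketch below shows one way to pull out the assistant's reply, assuming the output shape used by recent transformers releases for chat-style input (a list with one dict whose `generated_text` holds the full message list). The helper `last_assistant_reply` and the sample reply text are made up for illustration.

```python
from typing import Any


def last_assistant_reply(outputs: list[dict[str, Any]]) -> str:
    """Return the final assistant message from a text-generation pipeline result."""
    generated = outputs[0]["generated_text"]
    # With a plain string prompt the pipeline returns a string instead of messages.
    if isinstance(generated, str):
        return generated
    # With chat input, walk backwards to find the most recent assistant turn.
    for message in reversed(generated):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")


# Stand-in for what `pipe(messages)` is assumed to return; the reply is invented.
example_outputs = [
    {
        "generated_text": [
            {"role": "user", "content": "Who are you?"},
            {"role": "assistant", "content": "I am an AI assistant."},
        ]
    }
]
reply = last_assistant_reply(example_outputs)
print(reply)
```

With a real pipeline, `last_assistant_reply(pipe(messages))` would be the equivalent call, provided the installed transformers version uses the output shape assumed here.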

Load model directly

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("fsaudm/Meta-Llama-3.1-8B-Instruct-INT8")
model = AutoModelForCausalLM.from_pretrained("fsaudm/Meta-Llama-3.1-8B-Instruct-INT8")
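
Once the tokenizer and model are loaded as above, a chat reply can be generated by applying the tokenizer's chat template and decoding only the newly produced tokens. The following is a minimal sketch; the helper name `generate_reply` and the default of 64 new tokens are illustrative choices, not settings recommended by the card.

```python
def generate_reply(model, tokenizer, messages, max_new_tokens=64):
    """Generate one assistant reply for a list of chat messages."""
    # Format the conversation with the model's chat template and tokenize it.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,  # append the header that starts the assistant turn
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # generate() returns prompt + completion; keep only the completion tokens.
    prompt_length = inputs["input_ids"].shape[-1]
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Possible usage with the objects loaded above:
# print(generate_reply(model, tokenizer, [{"role": "user", "content": "Who are you?"}]))
```

Loading the 8-bit checkpoint is expected to need bitsandbytes and a CUDA device; passing `device_map="auto"` to `from_pretrained` is a common addition, though the card does not state it.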

Information about the base model can be found in the original model card: meta-llama/Meta-Llama-3.1-8B-Instruct


Safetensors

Model size: 8.03B params
Tensor types: F32 · FP16 · I8
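
To put the parameter count and tensor types in perspective, the sketch below estimates how much memory the weights alone would occupy if every parameter were stored in a single type. This is back-of-the-envelope arithmetic: the real checkpoint mixes F32, FP16, and I8 tensors, and the estimate ignores activations, the KV cache, and framework overhead. The function name `weight_memory_gib` is illustrative.

```python
GIB = 1024 ** 3

# Storage cost per parameter for the tensor types listed on the card.
BYTES_PER_PARAM = {"F32": 4, "FP16": 2, "I8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate weight storage in GiB if all parameters used one dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


NUM_PARAMS = 8.03e9  # "8.03B params" as reported on the card

for dtype in ("F32", "FP16", "I8"):
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

The I8 figure is half the FP16 figure, which is the main practical benefit of 8-bit quantization for fitting the model on smaller GPUs.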


Model tree for fsaudm/Meta-Llama-3.1-8B-Instruct-INT8

Base model: meta-llama/Llama-3.1-8B
Finetuned: meta-llama/Llama-3.1-8B-Instruct
Quantized (302): this model

Collection including fsaudm/Meta-Llama-3.1-8B-Instruct-INT8

Meta-Llama-3.1-Quantized

Collection of quantized Llama 3.1 models (8B & 70B versions for now), using bitsandbytes. • 4 items • Updated Aug 28
