RedHatAI/Meta-Llama-3.1-405B-Instruct-quantized.w4a16

Tags: Text Generation · Safetensors · llama · int4 · vllm · conversational · compressed-tensors
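
The int4, vllm, and compressed-tensors tags indicate this checkpoint is packaged for serving with vLLM. Below is a minimal sketch of loading it offline with vLLM; the tensor-parallel size and context length are illustrative assumptions, not values documented on this page.

# Minimal sketch: loading this w4a16 checkpoint with vLLM.
# tensor_parallel_size and max_model_len are assumptions for illustration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/Meta-Llama-3.1-405B-Instruct-quantized.w4a16",
    tensor_parallel_size=8,  # assumption: a 405B model still needs multi-GPU tensor parallelism at INT4
    max_model_len=4096,      # assumption: cap the context to keep the KV cache manageable
)

sampling = SamplingParams(temperature=0.6, max_tokens=128)
outputs = llm.generate(["Explain what w4a16 quantization means."], sampling)
print(outputs[0].outputs[0].text)
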
Community: 4 discussions

How many resources were used to quantize this model?
1 comment · #4 opened 8 months ago by fengyang1995

Unable to use FP8 KV cache with Neural Magic quants on Ampere
#3 opened 8 months ago by ndurkee

Storage format differs from other w4a16 models
2 comments · #2 opened 8 months ago by timdettmers

Weights do not exist when trying to deploy in a SageMaker endpoint
1 comment · #1 opened 9 months ago by LorenzoCevolaniAXA