Quantized Llama 3 8B Instruct to Q40 format supported by Distributed Llama.

License

Before downloading this repository, please accept the Llama 3 Community License.

How to run

  1. Clone this repository.
  2. Clone Distributed Llama:
git clone https://github.com/b4rtaz/distributed-llama.git
  3. Build Distributed Llama:
make main
  4. Run Distributed Llama:
./main inference --prompt "Hello world" --steps 128 --weights-float-type q40 --buffer-float-type q80 --nthreads 4 --model path/to/dllama_meta-llama-3-8b_q40.bin --tokenizer path/to/dllama_meta-llama3-tokenizer.t

Chat Template

Please keep in mind that this model expects prompts formatted with the Llama 3 chat template.
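As a reference, the Llama 3 instruct template wraps each message in special header tokens. Below is a minimal sketch of rendering a conversation into that format; the helper name `format_llama3_prompt` and the example messages are illustrative, not part of this repository, while the special tokens themselves are the documented Llama 3 ones.

```python
# Minimal sketch of the Llama 3 instruct prompt format.
# Special tokens (<|begin_of_text|>, <|start_header_id|>, etc.) are the
# documented Llama 3 tokens; the helper itself is hypothetical.

def format_llama3_prompt(messages):
    """Render a list of {"role": ..., "content": ...} dicts as a Llama 3 prompt."""
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += msg["content"] + "<|eot_id|>"
    # End with an assistant header to cue the model's reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello world"},
]
print(format_llama3_prompt(messages))
```

A plain prompt passed via `--prompt` is fed to the model as-is, so wrapping it in this template is what makes the instruct model behave conversationally.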
