Request: meta-llama/Meta-Llama-3-8B-Instruct

#64
by saishf - opened

Model name: meta-llama/Meta-Llama-3-8B-Instruct

[Required] Model link: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

[Required] Brief description: Solely for testing purposes: I recently ran MMLU-Pro on mradermacher's Llama3-8B-Instruct imatrix and non-imatrix quants and found that the imatrix quants scored 5.7% worse in the test. I'd like another quant to compare against to make sure it isn't just an anomaly.
Discussion & Results: https://huggingface.co/mradermacher/Meta-Llama-3-8B-Instruct-i1-GGUF/discussions/1

[Required] An image/direct image link to represent the model (square shaped):
If I must 😶‍🌫️
background_transparent-9c586d08f02bbbd962b74fe14b7acd8d220aab12f6969fe-1020x1024.png

[Optional] Additional quants (if you want any):

I only plan to use a Q5_K_M 😸

Lewdiculous changed discussion title from meta-llama/Meta-Llama-3-8B-Instruct to Request: meta-llama/Meta-Llama-3-8B-Instruct

Sure thing.

I'll upload the Q5_K_M as soon as it's ready for your testing.


Conversion:

  1. HF Model in BF16 =(convert_hf_to_gguf.py)=> BF16-GGUF
  2. HF Model in BF16 =(convert_hf_to_gguf.py)=> FP16-GGUF
  3. Generate imatrix.dat (llama-imatrix.exe) from FP16-GGUF (source data is the imatrix-with-rp-ex.txt)
  4. Quantize (llama-quantize.exe) BF16-GGUF with the imatrix.dat to the other, smaller sizes (rough command sketch below)
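
For reference, a minimal command sketch of the pipeline above, assuming a recent llama.cpp build (Linux-style binary names shown; the .exe variants named above behave the same). The local model directory, output file names, and calibration file path are illustrative, not necessarily the exact ones used for this upload:

```sh
# 1/2. Convert the BF16 HF model to BF16 and FP16 GGUFs
python convert_hf_to_gguf.py ./Meta-Llama-3-8B-Instruct \
    --outtype bf16 --outfile Meta-Llama-3-8B-Instruct-BF16.gguf
python convert_hf_to_gguf.py ./Meta-Llama-3-8B-Instruct \
    --outtype f16 --outfile Meta-Llama-3-8B-Instruct-FP16.gguf

# 3. Generate the importance matrix from the FP16 GGUF using the calibration text
./llama-imatrix -m Meta-Llama-3-8B-Instruct-FP16.gguf \
    -f imatrix-with-rp-ex.txt -o imatrix.dat

# 4. Quantize the BF16 GGUF down to the requested size, guided by the imatrix
./llama-quantize --imatrix imatrix.dat \
    Meta-Llama-3-8B-Instruct-BF16.gguf Meta-Llama-3-8B-Instruct-Q5_K_M.gguf Q5_K_M
```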

Update:
File is up.

Lewdiculous changed discussion status to closed

Will add the Q4 quants as well in case you need to test those; the differences between non-imatrix and imatrix quants should be more pronounced there.
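
If it helps for that comparison, the quantize step can be run once with and once without the --imatrix flag from the same BF16 GGUF, so the importance matrix is the only variable between the two files. A minimal sketch, with assumed file names:

```sh
# Q4_K_M guided by the importance matrix
./llama-quantize --imatrix imatrix.dat \
    Meta-Llama-3-8B-Instruct-BF16.gguf Meta-Llama-3-8B-Instruct-imat-Q4_K_M.gguf Q4_K_M

# Q4_K_M without it, from the same source, for an apples-to-apples test
./llama-quantize \
    Meta-Llama-3-8B-Instruct-BF16.gguf Meta-Llama-3-8B-Instruct-Q4_K_M.gguf Q4_K_M
```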
