Request: meta-llama/Meta-Llama-3-8B-Instruct

#64
by saishf - opened

Model name: meta-llama/Meta-Llama-3-8B-Instruct

[Required] Model link: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct

[Required] Brief description: Solely for testing purposes: I recently ran MMLU-Pro on mradermacher's Llama3-8B-Instruct imatrix and non-imatrix quants and found that the imatrix quants scored 5.7% worse in the test. I'd like another quant to compare against to make sure it isn't just an anomaly.
Discussion & Results: https://huggingface.co/mradermacher/Meta-Llama-3-8B-Instruct-i1-GGUF/discussions/1

[Required] An image/direct image link to represent the model (square shaped):
If I must 😶‍🌫️
background_transparent-9c586d08f02bbbd962b74fe14b7acd8d220aab12f6969fe-1020x1024.png

[Optional] Additional quants (if you want any):

I only plan to use a Q5_K_M 😸

Lewdiculous changed discussion title from meta-llama/Meta-Llama-3-8B-Instruct to Request: meta-llama/Meta-Llama-3-8B-Instruct

Sure thing.

I'll upload the Q5_K_M as soon as it's ready for your testing.


Conversion:

  1. HF Model in BF16 =(convert_hf_to_gguf.py)=> BF16-GGUF
  2. HF Model in BF16 =(convert_hf_to_gguf.py)=> FP16-GGUF
  3. Generate imatrix.dat (llama-imatrix.exe) from FP16-GGUF (source data is the imatrix-with-rp-ex.txt)
  4. Quantize (llama-quantize.exe) BF16-GGUF with the imatrix.dat to the other, smaller sizes (rough command sketch below)
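
For reference, a minimal command sketch of the pipeline above, assuming a recent llama.cpp build (Linux-style binary names shown; the .exe variants named above behave the same). The local model directory, output file names, and calibration file path are illustrative, not necessarily the exact ones used for this upload:

```sh
# 1/2. Convert the BF16 HF model to BF16 and FP16 GGUFs
python convert_hf_to_gguf.py ./Meta-Llama-3-8B-Instruct \
    --outtype bf16 --outfile Meta-Llama-3-8B-Instruct-BF16.gguf
python convert_hf_to_gguf.py ./Meta-Llama-3-8B-Instruct \
    --outtype f16 --outfile Meta-Llama-3-8B-Instruct-FP16.gguf

# 3. Generate the importance matrix from the FP16 GGUF using the calibration text
./llama-imatrix -m Meta-Llama-3-8B-Instruct-FP16.gguf \
    -f imatrix-with-rp-ex.txt -o imatrix.dat

# 4. Quantize the BF16 GGUF down to the requested size, guided by the imatrix
./llama-quantize --imatrix imatrix.dat \
    Meta-Llama-3-8B-Instruct-BF16.gguf Meta-Llama-3-8B-Instruct-Q5_K_M.gguf Q5_K_M
```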

Update:
File is up.

Lewdiculous changed discussion status to closed

Will add the Q4 quants as well in case you need to test those; the differences between non-imatrix and imatrix quants should be more pronounced there.
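
If it helps for that comparison, the quantize step can be run once with and once without the --imatrix flag from the same BF16 GGUF, so the importance matrix is the only variable between the two files. A minimal sketch, with assumed file names:

```sh
# Q4_K_M guided by the importance matrix
./llama-quantize --imatrix imatrix.dat \
    Meta-Llama-3-8B-Instruct-BF16.gguf Meta-Llama-3-8B-Instruct-imat-Q4_K_M.gguf Q4_K_M

# Q4_K_M without it, from the same source, for an apples-to-apples test
./llama-quantize \
    Meta-Llama-3-8B-Instruct-BF16.gguf Meta-Llama-3-8B-Instruct-Q4_K_M.gguf Q4_K_M
```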
