8-bit version
#3 opened by bullerwins
Hi!
In my benchmarks it seems like smaller models suffer more from quantization. Would it be possible to upload the 8-bit version as well, since GPTQ also supports it? (See the sketch after the tables below for roughly what I mean.)
These are the results (accuracy, %) from my MMLU-Pro tests:
Llama3.1-8B-Instruct
| overall | biology | business | chemistry | computer science | economics | engineering | health | history | law | math | philosophy | physics | psychology | other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48.28 | 65.41 | 56.27 | 40.37 | 49.51 | 59.72 | 32.61 | 56.60 | 43.83 | 33.79 | 50.11 | 44.49 | 42.88 | 62.66 | 49.57 |
Llama3.1-8B-Instruct-GPTQ-INT4
| overall | biology | business | chemistry | computer science | economics | engineering | health | history | law | math | philosophy | physics | psychology | other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 39.52 | 57.74 | 42.08 | 31.36 | 42.20 | 49.53 | 25.39 | 47.43 | 36.22 | 28.25 | 37.68 | 36.67 | 34.64 | 55.01 | 43.18 |
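For reference, something along these lines should produce an INT8 quant with AutoGPTQ (a minimal, untested sketch; the output directory name is a placeholder, and the single calibration sentence would need to be replaced with a proper calibration set, e.g. a few hundred C4 samples):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
out_dir = "Llama-3.1-8B-Instruct-GPTQ-INT8"  # placeholder output name

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# Toy calibration data: a real quant should use a few hundred
# representative samples (e.g. from C4) instead of one sentence.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

quantize_config = BaseQuantizeConfig(
    bits=8,         # INT8 instead of the current INT4
    group_size=128,
    desc_act=True,  # activation-order quantization, usually slightly more accurate
)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)      # run GPTQ layer by layer over the calibration data
model.save_quantized(out_dir)
tokenizer.save_pretrained(out_dir)
```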
Hi @bullerwins,
Could you please share how you ran these benchmarks? Is there publicly available code for running the MMLU/MMLU-Pro benchmarks? I would like to test a quantized version of Llama-3.1-8B that I created. Thank you for your time!
I'm using https://github.com/chigkim/Ollama-MMLU-Pro, pointing it at the OpenAI-compatible API endpoint served by vLLM.
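Roughly: serve the model with vLLM's OpenAI-compatible server, point the benchmark's config at that base URL and model name (check the repo's README for the exact config keys), and run it. As a minimal sanity check that the endpoint answers (the model id below is a placeholder for whatever vLLM is serving):

```python
# Assumes something like this is already running:
#   python -m vllm.entrypoints.openai.api_server --model <model-id>
# which exposes an OpenAI-compatible API at http://localhost:8000/v1.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default endpoint
    api_key="EMPTY",                      # vLLM accepts any key unless one is configured
)

resp = client.chat.completions.create(
    model="<model-id>",  # placeholder: the model id vLLM was started with
    messages=[{"role": "user", "content": "Reply with a single letter. 2+2=? (A) 3 (B) 4"}],
    temperature=0.0,
)
print(resp.choices[0].message.content)
```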
Thank you for the information!