8-bit version
#3 opened by bullerwins
Hi!
In my benchmarks it seems like smaller models suffer more from quantization. Would it be possible to upload the 8-bit version as well, since GPTQ also supports it? (See the sketch after the tables below for roughly what I mean.)
These are the results (accuracy, %) from my MMLU-Pro tests:
Llama3.1-8B-Instruct
| overall | biology | business | chemistry | computer science | economics | engineering | health | history | law | math | philosophy | physics | psychology | other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 48.28 | 65.41 | 56.27 | 40.37 | 49.51 | 59.72 | 32.61 | 56.60 | 43.83 | 33.79 | 50.11 | 44.49 | 42.88 | 62.66 | 49.57 |
Llama3.1-8B-Instruct-GPTQ-INT4
| overall | biology | business | chemistry | computer science | economics | engineering | health | history | law | math | philosophy | physics | psychology | other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 39.52 | 57.74 | 42.08 | 31.36 | 42.20 | 49.53 | 25.39 | 47.43 | 36.22 | 28.25 | 37.68 | 36.67 | 34.64 | 55.01 | 43.18 |
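For reference, something along these lines should produce an INT8 quant with AutoGPTQ (a minimal, untested sketch; the output directory name is a placeholder, and the single calibration sentence would need to be replaced with a proper calibration set, e.g. a few hundred C4 samples):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
out_dir = "Llama-3.1-8B-Instruct-GPTQ-INT8"  # placeholder output name

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

# Toy calibration data: a real quant should use a few hundred
# representative samples (e.g. from C4) instead of one sentence.
examples = [tokenizer("The quick brown fox jumps over the lazy dog.")]

quantize_config = BaseQuantizeConfig(
    bits=8,         # INT8 instead of the current INT4
    group_size=128,
    desc_act=True,  # activation-order quantization, usually slightly more accurate
)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)
model.quantize(examples)      # run GPTQ layer by layer over the calibration data
model.save_quantized(out_dir)
tokenizer.save_pretrained(out_dir)
```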
Hi @bullerwins,
Could you please share how you ran these benchmarks? Is there publicly available code for running the MMLU/MMLU-Pro benchmarks? I would like to test a quantized version of Llama-3.1-8B that I created. Thank you for your time!
I'm using https://github.com/chigkim/Ollama-MMLU-Pro, pointing it at the OpenAI-compatible API endpoint served by vLLM.
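Roughly: serve the model with vLLM's OpenAI-compatible server, point the benchmark's config at that base URL and model name (check the repo's README for the exact config keys), and run it. As a minimal sanity check that the endpoint answers (the model id below is a placeholder for whatever vLLM is serving):

```python
# Assumes something like this is already running:
#   python -m vllm.entrypoints.openai.api_server --model <model-id>
# which exposes an OpenAI-compatible API at http://localhost:8000/v1.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default endpoint
    api_key="EMPTY",                      # vLLM accepts any key unless one is configured
)

resp = client.chat.completions.create(
    model="<model-id>",  # placeholder: the model id vLLM was started with
    messages=[{"role": "user", "content": "Reply with a single letter. 2+2=? (A) 3 (B) 4"}],
    temperature=0.0,
)
print(resp.choices[0].message.content)
```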
Thank you for the information!