---
license: llama2
model-index:
- name: neural-chat-7b-v3-1-dare-0.85
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 61.95
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/neural-chat-7b-v3-1-dare-0.85
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 83.84
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/neural-chat-7b-v3-1-dare-0.85
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 64.43
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/neural-chat-7b-v3-1-dare-0.85
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 44.9
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/neural-chat-7b-v3-1-dare-0.85
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 79.16
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/neural-chat-7b-v3-1-dare-0.85
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 42.15
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=uukuguy/neural-chat-7b-v3-1-dare-0.85
      name: Open LLM Leaderboard
---
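This is an experiment with DARE (Drop And REscale), which builds on the observation that most delta parameters (the differences between fine-tuned and base weights) can be set directly to zero without affecting the capabilities of supervised fine-tuned (SFT) LMs, and that larger models can tolerate a higher proportion of discarded parameters.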
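Written out, the drop-and-rescale step looks roughly like this. It is a summary of the method as described in the DARE paper (*Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch*), with symbols chosen for illustration: $\theta_{\text{PRE}}$ are the base weights, $\theta_{\text{SFT}}$ the fine-tuned weights, $p$ the drop rate, and $\odot$ element-wise multiplication.

$$
\delta = \theta_{\text{SFT}} - \theta_{\text{PRE}}, \qquad
m_i \sim \operatorname{Bernoulli}(p), \qquad
\hat{\delta} = \frac{(1 - m) \odot \delta}{1 - p}, \qquad
\theta_{\text{DARE}} = \theta_{\text{PRE}} + \hat{\delta}
$$

With $p = 0.85$, about 85% of the delta entries are zeroed and each survivor is multiplied by $1/(1 - 0.85) \approx 6.67$, so the expected value of every delta entry, $(1 - p)\,\delta_i / (1 - p) = \delta_i$, is unchanged.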

Settings: `weight_mask_rate: 0.85` / `use_weight_rescale: True` / `mask_strategy: random` / `scaling_coefficient: 1.0`
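A minimal pure-Python sketch of how the four settings above could map onto a single flattened weight tensor. It is not the code used to produce this model: the function name `dare_merge` is hypothetical, only the `random` mask strategy is implemented, and applying `scaling_coefficient` as a multiplier on the rescaled delta (as in task-arithmetic merging) is an assumption.

```python
import random
from typing import List, Optional


def dare_merge(
    base: List[float],
    finetuned: List[float],
    weight_mask_rate: float = 0.85,
    use_weight_rescale: bool = True,
    mask_strategy: str = "random",
    scaling_coefficient: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Hypothetical DARE helper for one flattened weight tensor (sketch only)."""
    if mask_strategy != "random":
        # Only random dropping is sketched; other strategies are out of scope.
        raise ValueError(f"unsupported mask_strategy: {mask_strategy!r}")
    if not 0.0 <= weight_mask_rate < 1.0:
        raise ValueError("weight_mask_rate must be in [0, 1)")
    if len(base) != len(finetuned):
        raise ValueError("base and finetuned must have the same length")
    rng = rng if rng is not None else random.Random()
    # Dividing survivors by (1 - p) keeps each delta's expected value unchanged.
    rescale = 1.0 / (1.0 - weight_mask_rate) if use_weight_rescale else 1.0
    merged = []
    for b, f in zip(base, finetuned):
        delta = f - b  # delta parameter: fine-tuned weight minus base weight
        if rng.random() < weight_mask_rate:
            delta = 0.0  # drop this delta entirely
        else:
            delta *= rescale  # keep it, rescaled by 1 / (1 - p)
        # Assumption: scaling_coefficient multiplies the rescaled delta.
        merged.append(b + scaling_coefficient * delta)
    return merged


if __name__ == "__main__":
    # Synthetic example: every base weight is 0.0, every fine-tuned weight 1.0.
    rng = random.Random(0)
    n = 100_000
    merged = dare_merge([0.0] * n, [1.0] * n, rng=rng)
    kept = sum(1 for w in merged if w != 0.0)
    print(f"kept fraction: {kept / n:.3f}")
    print(f"mean merged weight: {sum(merged) / n:.3f}")
```

On this synthetic input, roughly 15% of the deltas survive, each rescaled to about 6.67, so the mean merged weight stays close to the original delta of 1.0. A real merge would apply the same step to every weight tensor, for example with `torch.bernoulli` on GPU tensors instead of a Python loop.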

| Model | Average | ARC | HellaSwag | MMLU | TruthfulQA | Winogrande | GSM8K | DROP |
| ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ | ------ |
| Intel/neural-chat-7b-v3-1 | 59.06 | 66.21 | 83.64 | 62.37 | 59.65 | 78.14 | 19.56 | 43.84 |
| migtissera/SynthIA-7B-v1.3 | 57.11 | 62.12 | 83.45 | 62.65 | 51.37 | 78.85 | 17.59 | 43.76 |
| bhenrym14/mistral-7b-platypus-fp16 | 56.89 | 63.05 | 84.15 | 64.11 | 45.07 | 78.53 | 17.36 | 45.92 |
| jondurbin/airoboros-m-7b-3.1.2 | 56.24 | 61.86 | 83.51 | 61.91 | 53.75 | 77.58 | 13.87 | 41.2 |
| uukuguy/speechless-code-mistral-orca-7b-v1.0 | 55.33 | 59.64 | 82.25 | 61.33 | 48.45 | 77.51 | 8.26 | 49.89 |
| teknium/CollectiveCognition-v1.1-Mistral-7B | 53.87 | 62.12 | 84.17 | 62.35 | 57.62 | 75.37 | 15.62 | 19.85 |
| Open-Orca/Mistral-7B-SlimOrca | 53.34 | 62.54 | 83.86 | 62.77 | 54.23 | 77.43 | 21.38 | 11.2 |
| uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b | 53.34 | 64.33 | 84.4 | 63.72 | 52.52 | 78.37 | 21.38 | 8.66 |
| ehartford/dolphin-2.2.1-mistral-7b | 53.06 | 63.48 | 83.86 | 63.28 | 53.17 | 78.37 | 21.08 | 8.19 |
| teknium/CollectiveCognition-v1-Mistral-7B | 52.55 | 62.37 | 85.5 | 62.76 | 54.48 | 77.58 | 17.89 | 7.22 |
| HuggingFaceH4/zephyr-7b-alpha | 52.4 | 61.01 | 84.04 | 61.39 | 57.9 | 78.61 | 14.03 | 9.82 |
| ehartford/samantha-1.2-mistral-7b | 52.16 | 64.08 | 85.08 | 63.91 | 50.4 | 78.53 | 16.98 | 6.13 |

# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_uukuguy__neural-chat-7b-v3-1-dare-0.85).

| Metric                            | Value |
|-----------------------------------|------:|
| Avg.                              | 62.74 |
| AI2 Reasoning Challenge (25-Shot) | 61.95 |
| HellaSwag (10-Shot)               | 83.84 |
| MMLU (5-Shot)                     | 64.43 |
| TruthfulQA (0-shot)               | 44.90 |
| Winogrande (5-shot)               | 79.16 |
| GSM8k (5-shot)                    | 42.15 |
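Assuming the Avg. row is the plain unweighted mean of the six benchmark scores (the figures in the table agree with that reading), it can be reproduced with a few lines; the helper name `leaderboard_average` is illustrative only.

```python
# Scores copied from the Open LLM Leaderboard table for this model.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 61.95,
    "HellaSwag (10-Shot)": 83.84,
    "MMLU (5-Shot)": 64.43,
    "TruthfulQA (0-shot)": 44.90,
    "Winogrande (5-shot)": 79.16,
    "GSM8k (5-shot)": 42.15,
}


def leaderboard_average(values):
    """Unweighted mean of benchmark scores, rounded to two decimals."""
    values = list(values)
    return round(sum(values) / len(values), 2)


print(leaderboard_average(scores.values()))  # prints 62.74
```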