	---
	language:
	- en
	license: apache-2.0
	library_name: transformers
	tags:
	- transformers
	datasets:
	- mwitiderrick/OpenPlatypus
	base_model: vihangd/shearedplats-2.7b-v2
	inference: true
	model_type: llama
	prompt_template: '### Instruction:\n

	  {prompt}

	  ### Response:

	  '
	created_by: mwitiderrick
	pipeline_tag: text-generation
	model-index:
	- name: mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
	  results:
	  - task:
	      type: text-generation
	    dataset:
	      name: hellaswag
	      type: hellaswag
	    metrics:
	    - type: hellaswag (0-Shot)
	      value: 0.5283
	      name: hellaswag(0-Shot)
	  - task:
	      type: text-generation
	    dataset:
	      name: winogrande
	      type: winogrande
	    metrics:
	    - type: winogrande (0-Shot)
	      value: 0.6464
	      name: winogrande(0-Shot)
	  - task:
	      type: text-generation
	    dataset:
	      name: arc_challenge
	      type: arc_challenge
	    metrics:
	    - type: arc_challenge (0-Shot)
	      value: 0.3652
	      name: arc_challenge(0-Shot)
	    source:
	      url: https://huggingface.co/mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
	      name: shearedplats-2.7b-v2-instruct-v0.1 model card
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 40.19
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 70.08
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 28.12
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 41.23
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 65.04
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 2.12
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1
	      name: Open LLM Leaderboard
	---
	# ShearedPlats-2.7b Instruct

	This is a [ShearedPlats-2.7b model](https://huggingface.co/vihangd/shearedplats-2.7b-v2) that has been fine-tuned for 2 epochs on the
	[Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus) dataset.

	The modified version of the dataset can be found [here](mwitiderrick/Open-Platypus).
	## Prompt Template
	```
	### Instruction:

	{query}

	### Response:
	<Leave new line for model to respond>
	```
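
The template above can be filled programmatically. The following is a minimal sketch, not part of the model's tooling: `PROMPT_TEMPLATE` and `build_prompt` are hypothetical names introduced here, and the blank lines follow the template exactly as written in this section.

```python
# Hypothetical helper: fills the prompt template shown above.
# The trailing newline leaves a fresh line for the model's response.
PROMPT_TEMPLATE = "### Instruction:\n\n{query}\n\n### Response:\n"


def build_prompt(query: str) -> str:
    """Return the instruction prompt for a single user query."""
    # Strip stray whitespace so the template's own blank lines stay intact.
    return PROMPT_TEMPLATE.format(query=query.strip())


if __name__ == "__main__":
    print(build_prompt("Name three primary colors."))
```

Keeping the template in one constant makes it easier to send the same format at inference time that was used during fine-tuning.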
	## Usage
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

	# Load the fine-tuned checkpoint and its tokenizer from the Hugging Face Hub
	tokenizer = AutoTokenizer.from_pretrained("mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1")
	model = AutoModelForCausalLM.from_pretrained("mwitiderrick/shearedplats-2.7b-v2-instruct-v0.1")
	query = "Provide step-by-step instructions for making a sweet chicken bugger"
	# max_length caps the total length: prompt tokens plus generated tokens
	text_gen = pipeline(task="text-generation", model=model, tokenizer=tokenizer, max_length=350)
	output = text_gen(f"### Instruction:\n{query}\n### Response:\n")
	# generated_text holds the prompt followed by the model's reply (sample output below)
	print(output[0]['generated_text'])
	"""
	### Instruction:
	Provide step-by-step instructions for making a sweet chicken bugger
	### Response:
	Step 1: Prepare the ingredients

	You will need a mixture of ground chicken, breadcrumbs, butter, Worcestershire sauce, garlic powder, onion powder, salt, and pepper.

	Step 2: Form the bugger

	Take a piece of chicken breast meat and use a sharp knife to cut it into small cubes. Place the cubes in a bowl and add the remaining ingredients: breadcrumbs, butter, Worcestershire sauce, garlic powder, onion powder, salt, and pepper. Mix the ingredients together until they are well combined.

	Step 3: Shape the bugger

	Take a piece of the bugger mixture and form it into a ball. Place the ball on a plate or in a bag and refrigerate it for 30 minutes.

	Step 4: Cook the bugger

	Heat a grill pan or grill to medium-high heat. Take the bugger out of the refrigerator and place it on the grill. Cook the bugger for 5-7 minutes on each side, or until it is cooked through.

	Step 5: Serve and enjoy!

	Once the bugger is cooked, serve it hot and enjoy!

	Note: You can also use a sweet chicken bugger mix to make sweet chicken buggers. Simply follow the instructions above, but use the sweet chicken bugger mix instead of the ground chicken.

	Enjoy your sweet chicken buggers!
	"""
	```
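
As the sample output shows, `generated_text` repeats the prompt before the reply. If only the reply is needed, it can be split off at the response marker. This is a small sketch under that assumption; `extract_response` is a hypothetical helper, not part of `transformers`.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(generated_text: str) -> str:
    """Return the text after the first response marker (hypothetical helper)."""
    # partition() yields (before, marker, after); marker is "" when not found.
    _, marker, after = generated_text.partition(RESPONSE_MARKER)
    # Fall back to the full text if the marker is missing.
    return after.strip() if marker else generated_text.strip()


if __name__ == "__main__":
    sample = "### Instruction:\nSay hi\n### Response:\nHi there!\n"
    print(extract_response(sample))
```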
	## Evals
	```
	|  Tasks  |Version|Filter|n-shot| Metric |Value |   |Stderr|
	|---------|-------|------|-----:|--------|-----:|---|-----:|
	|hellaswag|Yaml   |none  |     0|acc     |0.5283|±  |0.0050|
	|         |       |none  |     0|acc_norm|0.7068|±  |0.0045|

	|  Groups  |Version|Filter|n-shot|  Metric   | Value |   |Stderr|
	|----------|-------|------|-----:|-----------|------:|---|-----:|
	|truthfulqa|N/A    |none  |     0|acc        | 0.3411|±  |0.0016|
	|          |       |none  |     0|bleu_max   |19.4174|±  |0.6888|
	|          |       |none  |     0|bleu_acc   | 0.3378|±  |0.0166|
	|          |       |none  |     0|bleu_diff  |-4.4165|±  |0.6611|
	|          |       |none  |     0|rouge1_max |43.6923|±  |0.8239|
	|          |       |none  |     0|rouge1_acc | 0.3305|±  |0.0165|
	|          |       |none  |     0|rouge1_diff|-6.4023|±  |0.7680|
	|          |       |none  |     0|rouge2_max |28.4074|±  |0.8883|
	|          |       |none  |     0|rouge2_acc | 0.2827|±  |0.0158|
	|          |       |none  |     0|rouge2_diff|-6.7716|±  |0.8844|
	|          |       |none  |     0|rougeL_max |40.2657|±  |0.8218|
	|          |       |none  |     0|rougeL_acc | 0.3023|±  |0.0161|
	|          |       |none  |     0|rougeL_diff|-6.5447|±  |0.7706|

	|  Tasks   |Version|Filter|n-shot|Metric|Value |   |Stderr|
	|----------|-------|------|-----:|------|-----:|---|-----:|
	|winogrande|Yaml   |none  |     0|acc   |0.6464|±  |0.0134|

	|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
	|-------------|-------|------|-----:|--------|-----:|---|-----:|
	|arc_challenge|Yaml   |none  |     0|acc     |0.3652|±  |0.0141|
	|             |       |none  |     0|acc_norm|0.3908|±  |0.0143|
	```
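
The `Stderr` column can be turned into an approximate interval around each score. The sketch below assumes a normal approximation (value ± 1.96 × stderr for roughly 95% coverage); that assumption and the helper name `confidence_interval` are illustrative and not stated in the evaluation output itself.

```python
Z_95 = 1.96  # normal-approximation multiplier for ~95% coverage (assumption)


def confidence_interval(value: float, stderr: float, z: float = Z_95) -> tuple[float, float]:
    """Return (low, high) bounds for value ± z * stderr."""
    half_width = z * stderr
    return value - half_width, value + half_width


# Zero-shot accuracy and standard error, copied from the tables above.
zero_shot = {
    "hellaswag acc": (0.5283, 0.0050),
    "winogrande acc": (0.6464, 0.0134),
    "arc_challenge acc": (0.3652, 0.0141),
}

for name, (value, stderr) in zero_shot.items():
    low, high = confidence_interval(value, stderr)
    print(f"{name}: {value:.4f} (approx. 95% CI {low:.4f} to {high:.4f})")
```

Wider intervals on winogrande and arc_challenge reflect their larger standard errors relative to hellaswag.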
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_mwitiderrick__shearedplats-2.7b-v2-instruct-v0.1)

	|             Metric              |Value|
	|---------------------------------|----:|
	|Avg.                             |41.13|
	|AI2 Reasoning Challenge (25-Shot)|40.19|
	|HellaSwag (10-Shot)              |70.08|
	|MMLU (5-Shot)                    |28.12|
	|TruthfulQA (0-shot)              |41.23|
	|Winogrande (5-shot)              |65.04|
	|GSM8k (5-shot)                   | 2.12|
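
The `Avg.` row is consistent with an unweighted mean of the six benchmark scores. A short check, using only the values from the table above (the variable names are illustrative):

```python
# Leaderboard scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 40.19,
    "HellaSwag (10-Shot)": 70.08,
    "MMLU (5-Shot)": 28.12,
    "TruthfulQA (0-shot)": 41.23,
    "Winogrande (5-shot)": 65.04,
    "GSM8k (5-shot)": 2.12,
}

# Unweighted mean over all six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 41.13
```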