	---
	language:
	- en
	- ko
	license: cc-by-sa-4.0
	tags:
	- not-for-all-audiences
	datasets:
	- maywell/ko_wikidata_QA
	- kyujinpy/OpenOrca-KO
	- Anthropic/hh-rlhf
	pipeline_tag: text-generation
	model-index:
	- name: PiVoT-0.1-Evil-a
	  results:
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: AI2 Reasoning Challenge (25-Shot)
	      type: ai2_arc
	      config: ARC-Challenge
	      split: test
	      args:
	        num_few_shot: 25
	    metrics:
	    - type: acc_norm
	      value: 59.64
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/PiVoT-0.1-Evil-a
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: HellaSwag (10-Shot)
	      type: hellaswag
	      split: validation
	      args:
	        num_few_shot: 10
	    metrics:
	    - type: acc_norm
	      value: 81.48
	      name: normalized accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/PiVoT-0.1-Evil-a
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: MMLU (5-Shot)
	      type: cais/mmlu
	      config: all
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 58.94
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/PiVoT-0.1-Evil-a
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: TruthfulQA (0-shot)
	      type: truthful_qa
	      config: multiple_choice
	      split: validation
	      args:
	        num_few_shot: 0
	    metrics:
	    - type: mc2
	      value: 39.23
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/PiVoT-0.1-Evil-a
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: Winogrande (5-shot)
	      type: winogrande
	      config: winogrande_xl
	      split: validation
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 75.3
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/PiVoT-0.1-Evil-a
	      name: Open LLM Leaderboard
	  - task:
	      type: text-generation
	      name: Text Generation
	    dataset:
	      name: GSM8k (5-shot)
	      type: gsm8k
	      config: main
	      split: test
	      args:
	        num_few_shot: 5
	    metrics:
	    - type: acc
	      value: 40.41
	      name: accuracy
	    source:
	      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=maywell/PiVoT-0.1-Evil-a
	      name: Open LLM Leaderboard
	---

	# PiVoT-0.1-early

	![image/png](./PiVoT.png)

	# Model Details

	### Description
	PiVoT is a fine-tuned model based on Mistral 7B. It is a variation of Synatra v0.3 RP, which has shown decent performance.

	PiVoT-0.1-Evil-a is an evil-tuned version of PiVoT. It was fine-tuned using the method shown below.

	PiVoT-0.1-Evil-b was tuned with noisy embeddings, so it should produce more varied results.

	![image/png](./eviltune.png)


	<!-- prompt-template start -->
	## Prompt template: Alpaca-InstructOnly2

	```
	### Instruction:
	{prompt}

	### Response:

	```

	<!-- prompt-template end -->
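
As a minimal sketch of how the template above can be filled in from code, the helper below substitutes a single instruction into the Alpaca-style layout. The name `build_prompt` and its signature are hypothetical and not part of this model card; the trailing newline after `### Response:` mirrors the blank line in the template.

```python
# Hypothetical helper: the names here are illustrative assumptions,
# only the template text itself comes from the model card.
PROMPT_TEMPLATE = "### Instruction:\n{prompt}\n\n### Response:\n"


def build_prompt(instruction: str) -> str:
    """Fill the Alpaca-InstructOnly2 template with a single instruction."""
    # Strip surrounding whitespace so the layout stays exactly as documented.
    return PROMPT_TEMPLATE.format(prompt=instruction.strip())


if __name__ == "__main__":
    print(build_prompt("Summarize the plot of Hamlet in one sentence."))
```

The returned string can be passed as the text input to whichever generation stack is in use; that wiring is left out here because the card does not specify one.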


	### Disclaimer
	The AI model provided herein is intended for experimental purposes only. The creator of this model makes no representations or warranties of any kind, either express or implied, as to the model's accuracy, reliability, or suitability for any particular purpose. The creator shall not be held liable for any outcomes, decisions, or actions taken on the basis of the information generated by this model. Users of this model assume full responsibility for any consequences resulting from its use.

	The OpenOrca dataset was used to fine-tune the PiVoT variation. The base model was trained on Arcalive Ai Chat Chan log 7k, [ko_wikidata_QA](https://huggingface.co/datasets/maywell/ko_wikidata_QA), [kyujinpy/OpenOrca-KO](https://huggingface.co/datasets/kyujinpy/OpenOrca-KO), and other datasets.

	Follow me on twitter: https://twitter.com/stablefluffy

	Consider supporting me, as I make these models alone: https://www.buymeacoffee.com/mwell or with a Runpod credit gift 💕

	Contact me on Telegram: https://t.me/AlzarTakkarsen

	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_maywell__PiVoT-0.1-Evil-a).

	| Metric                            | Value |
	|-----------------------------------|------:|
	| Avg.                              | 59.16 |
	| AI2 Reasoning Challenge (25-Shot) | 59.64 |
	| HellaSwag (10-Shot)               | 81.48 |
	| MMLU (5-Shot)                     | 58.94 |
	| TruthfulQA (0-shot)               | 39.23 |
	| Winogrande (5-shot)               | 75.30 |
	| GSM8k (5-shot)                    | 40.41 |
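
The `Avg.` row appears to be the unweighted mean of the six benchmark scores. The sketch below reproduces that arithmetic from the rounded values in the table only. It gives roughly 59.17, slightly above the reported 59.16; the gap presumably comes from how the leaderboard rounds or truncates its unrounded scores, which is an assumption and not something stated on this card.

```python
# Per-task scores copied from the table above (already rounded to two decimals).
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 59.64,
    "HellaSwag (10-Shot)": 81.48,
    "MMLU (5-Shot)": 58.94,
    "TruthfulQA (0-shot)": 39.23,
    "Winogrande (5-shot)": 75.30,
    "GSM8k (5-shot)": 40.41,
}

# Unweighted mean across tasks; assumed to be how the Avg. row is derived.
mean_score = sum(scores.values()) / len(scores)

if __name__ == "__main__":
    print(f"Mean of rounded task scores: {mean_score:.4f}")
```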