Fixing some errors of the leaderboard evaluation results in the ModelCard yaml (#5)

3b26b76 verified 7 months ago

5.21 kB

	---
	language:
	- en
	- pt
	datasets:
	- cnmoro/WizardVicuna-PTBR-Instruct-Clean
	model-index:
	- name: Mistral-7B-Portuguese
	results:
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: ENEM Challenge (No Images)
	type: eduagarcia/enem_challenge
	split: train
	args:
	num_few_shot: 3
	metrics:
	- type: acc
	value: 58.08
	name: accuracy
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: BLUEX (No Images)
	type: eduagarcia-temp/BLUEX_without_images
	split: train
	args:
	num_few_shot: 3
	metrics:
	- type: acc
	value: 48.68
	name: accuracy
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: OAB Exams
	type: eduagarcia/oab_exams
	split: train
	args:
	num_few_shot: 3
	metrics:
	- type: acc
	value: 37.08
	name: accuracy
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: Assin2 RTE
	type: assin2
	split: test
	args:
	num_few_shot: 15
	metrics:
	- type: f1_macro
	value: 90.31
	name: f1-macro
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: Assin2 STS
	type: eduagarcia/portuguese_benchmark
	split: test
	args:
	num_few_shot: 15
	metrics:
	- type: pearson
	value: 76.55
	name: pearson
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: FaQuAD NLI
	type: ruanchaves/faquad-nli
	split: test
	args:
	num_few_shot: 15
	metrics:
	- type: f1_macro
	value: 58.84
	name: f1-macro
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: HateBR Binary
	type: ruanchaves/hatebr
	split: test
	args:
	num_few_shot: 25
	metrics:
	- type: f1_macro
	value: 79.21
	name: f1-macro
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: PT Hate Speech Binary
	type: hate_speech_portuguese
	split: test
	args:
	num_few_shot: 25
	metrics:
	- type: f1_macro
	value: 68.87
	name: f1-macro
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
	name: Open Portuguese LLM Leaderboard
	- task:
	type: text-generation
	name: Text Generation
	dataset:
	name: tweetSentBR
	type: eduagarcia-temp/tweetsentbr
	split: test
	args:
	num_few_shot: 25
	metrics:
	- type: f1_macro
	value: 64.71
	name: f1-macro
	source:
	url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=cnmoro/Mistral-7B-Portuguese
	name: Open Portuguese LLM Leaderboard
	---

	This is a finetuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) using [unsloth](https://github.com/unslothai/unsloth) on a instruct portuguese dataset, as an attempt to improve the performance of the model on the language.

	No benchmarks have been executed yet.

	The original prompt format was used:

	```plaintext
	<s>[INST] {Prompt goes here} [/INST]
	```
	# [Open Portuguese LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/cnmoro/Mistral-7B-Portuguese)

	\| Metric \| Value \|
	\|--------------------------\|--------\|
	\|Average \|64.7\|
	\|ENEM Challenge (No Images)\| 58.08\|
	\|BLUEX (No Images) \| 48.68\|
	\|OAB Exams \| 37.08\|
	\|Assin2 RTE \| 90.31\|
	\|Assin2 STS \| 76.55\|
	\|FaQuAD NLI \| 58.84\|
	\|HateBR Binary \| 79.21\|
	\|PT Hate Speech Binary \| 68.87\|
	\|tweetSentBR \| 64.71\|