euclaise
/

Ferret_7B

Text Generation

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

Ferret_7B / README.md

euclaise's picture

Update README.md

72ab9f1 8 months ago

|

No virus

1.75 kB

	---
	license: other
	datasets:
	- euclaise/MiniCoT
	- euclaise/SciCoT
	- euclaise/symtune_mini
	- euclaise/gsm8k_self_correct
	- euclaise/mathoverflow-accepted
	- euirim/goodwiki
	---

	A pre-finetuning finetuned version of Mistral 7B 0.1, focused on CoT reasoning tasks.

	Probably decent at reasoning, but also probably not great as a chat assistant- it's designed to be finetuned further to give it a friendlier style. As such, it is intentionally somewhat undertrained.

	Current benchmarks aren't great for instruct models, so I've temporarily omitted them. I'm working on a benchmark suite for instruct models though, and will update this with scores when that is released.

	Uses ChatML prompt formatting.

	I reserve no rights to the model. To the extent possible under law, I release it as public domain. However, the datasets used have various licenses that may impact how the model may be used in your jurisdiction.
	# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
	Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_euclaise__Ferret-7B)

	\| Metric \| Value \|
	\|-----------------------\|---------------------------\|
	\| Avg. \| 47.81 \|
	\| ARC (25-shot) \| 62.2 \|
	\| HellaSwag (10-shot) \| 81.75 \|
	\| MMLU (5-shot) \| 60.82 \|
	\| TruthfulQA (0-shot) \| 40.94 \|
	\| Winogrande (5-shot) \| 77.35 \|
	\| GSM8K (5-shot) \| 5.76 \|
	\| DROP (3-shot) \| 5.87 \|

	I'm not sure what's going on with GSM8K. Since GSK8K (train split) data was included in the Ferret dataset, I suspect that either it is over-correcting itself or the eval is broken.