ShinojiResearch
/

Senku-70B-Full

Generated from Trainer

Model card Files Files and versions Community

Senku-70B-Full / README.md

alicecomfy's picture

Update README.md

06128af verified 9 months ago

|

946 Bytes

	---
	license: cc-by-2.0
	---
	Finetune of miqu-70b-sf dequant of miqudev's leak of Mistral-70B (allegedly an early mistral medium). My diffs are available under CC-0 (That is the Senku-70B repo, full includes the merge), this is a merge with the leaked model, you can use the other repository to save bandwidth.

	EQ-Bench: 84.89
	GSM8k: 77.18 (71.04 when using ChatML)
	Hellaswag: 87.67

	Edit: Upon further testing a score of 85.09 was achieved using ChatML instead of Mistral's prompt.

	I recommend using the ChatML format instead, I will run more benchmarks. This also fixes the bug with Miqu dequant failing to provide a stop.
	<\|im_start\|>system
	Provide some context and/or instructions to the model.
	<\|im_end\|>
	<\|im_start\|>user
	The user’s message goes here
	<\|im_end\|>
	<\|im_start\|>assistant <\|im_end\|>

	Credit to https://twitter.com/hu_yifei for providing GSM & Hellaswag. It is the first open weight model to dethrone GPT-4 on EQ bench,