Evaluation settings: `limit: None, provide_description: False, num_fewshot: 5, batch_size: None`

| Task | Version | Metric | Value | Stderr |
|------|---------|--------|-------|--------|
| hendrycksTest-college_chemistry | 1 | acc | 0.4600 | ± 0.0501 |
| hendrycksTest-college_chemistry | 1 | acc_norm | 0.4600 | ± 0.0501 |
| hendrycksTest-high_school_chemistry | 1 | acc | 0.5222 | ± 0.0351 |
| hendrycksTest-high_school_chemistry | 1 | acc_norm | 0.5222 | ± 0.0351 |
| hendrycksTest-college_biology | 1 | acc | 0.7222 | ± 0.0375 |
| hendrycksTest-college_biology | 1 | acc_norm | 0.7222 | ± 0.0375 |
| hendrycksTest-high_school_biology | 1 | acc | 0.7355 | ± 0.0251 |
| hendrycksTest-high_school_biology | 1 | acc_norm | 0.7355 | ± 0.0251 |
| winogrande | 0 | acc | 0.7758 | ± 0.0117 |

This model was fine-tuned from Mistral-7B-Instruct-v0.2 on 710 examples, 200 of which come from the camel-ai/biology dataset. The rest were scraped personally and consist of very long scientific articles and textbooks.

It beats Mistral-7B-Instruct-v0.2 on the MMLU chemistry and biology tasks. It should be able to generate mostly factual, basic, lengthy scientific text. I guess it could be "we have Cosmopedia at home" for people who want to create cheap pretraining datasets from scratch.

Template:

```
[Context]
You are a helpful assistant. Read the instruction and write a response accordingly.

[User]
{prompt}

[Assistant]
```
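
A minimal usage sketch with the `transformers` library, assuming the checkpoint loads as a standard Mistral-style causal LM; the prompt string simply fills in the template above, and the example question and generation parameters are illustrative, not prescribed by the card:

```python
# Minimal sketch, assuming a standard Mistral-style causal LM checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Ba2han/BioMistral-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Fill in the prompt template from the model card.
prompt = "Explain how enzymes lower the activation energy of a reaction."  # example question
text = (
    "[Context]\n"
    "You are a helpful assistant. Read the instruction and write a response accordingly.\n\n"
    "[User]\n"
    f"{prompt}\n\n"
    "[Assistant]\n"
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```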
