Evaluation settings: `limit: None, provide_description: False, num_fewshot: 5, batch_size: None`

| Task | Version | Metric | Value | Stderr |
|------|---------|--------|-------|--------|
| hendrycksTest-college_chemistry | 1 | acc | 0.4600 | ± 0.0501 |
| hendrycksTest-college_chemistry | 1 | acc_norm | 0.4600 | ± 0.0501 |
| hendrycksTest-high_school_chemistry | 1 | acc | 0.5222 | ± 0.0351 |
| hendrycksTest-high_school_chemistry | 1 | acc_norm | 0.5222 | ± 0.0351 |
| hendrycksTest-college_biology | 1 | acc | 0.7222 | ± 0.0375 |
| hendrycksTest-college_biology | 1 | acc_norm | 0.7222 | ± 0.0375 |
| hendrycksTest-high_school_biology | 1 | acc | 0.7355 | ± 0.0251 |
| hendrycksTest-high_school_biology | 1 | acc_norm | 0.7355 | ± 0.0251 |
| winogrande | 0 | acc | 0.7758 | ± 0.0117 |

This model was fine-tuned from Mistral-7B-Instruct-v0.2 on 710 examples, 200 of which come from the camel-ai/biology dataset. The rest were scraped personally and consist of very long scientific articles and textbooks.

It beats Mistral-7B-Instruct-v0.2 on the MMLU chemistry and biology tasks. It should be able to generate mostly factual, basic, lengthy scientific text. I guess it could be "we have Cosmopedia at home" for people who want to create cheap pretraining datasets from scratch.

Template:

```
[Context]
You are a helpful assistant. Read the instruction and write a response accordingly.

[User]
{prompt}

[Assistant]
```
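
A minimal usage sketch with the `transformers` library, assuming the checkpoint loads as a standard Mistral-style causal LM; the prompt string simply fills in the template above, and the example question and generation parameters are illustrative, not prescribed by the card:

```python
# Minimal sketch, assuming a standard Mistral-style causal LM checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Ba2han/BioMistral-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Fill in the prompt template from the model card.
prompt = "Explain how enzymes lower the activation energy of a reaction."  # example question
text = (
    "[Context]\n"
    "You are a helpful assistant. Read the instruction and write a response accordingly.\n\n"
    "[User]\n"
    f"{prompt}\n\n"
    "[Assistant]\n"
)

inputs = tokenizer(text, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```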
