---
license: cc-by-4.0
datasets:
- camel-ai/biology
---

Evaluation settings: limit: None, provide_description: False, num_fewshot: 5, batch_size: None
|               Task                |Version| Metric |Value |   |Stderr|
|-----------------------------------|------:|--------|-----:|---|-----:|
|hendrycksTest-college_chemistry    |      1|acc     |0.4600|±  |0.0501|
|                                   |       |acc_norm|**0.4600**|±  |0.0501|
|hendrycksTest-high_school_chemistry|      1|acc     |0.5222|±  |0.0351|
|                                   |       |acc_norm|**0.5222**|±  |0.0351|
|hendrycksTest-college_biology      |      1|acc     |0.7222|±  |0.0375|
|                                   |       |acc_norm|**0.7222**|±  |0.0375|
|hendrycksTest-high_school_biology  |      1|acc     |0.7355|±  |0.0251|
|                                   |       |acc_norm|**0.7355**|±  |0.0251|
|winogrande                         |      0|acc     |**0.7758**|±  |0.0117|
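
For reference, here is a sketch of how scores like these could be re-run with EleutherAI's lm-evaluation-harness. The table above uses the older `hendrycksTest-*` task names; the sketch assumes a recent `lm_eval` release (0.4+), where the corresponding tasks are named `mmlu_*`, and uses a placeholder repository id, so treat it as an approximation of the original setup rather than an exact reproduction.

    # Assumes lm-evaluation-harness >= 0.4; the model id is a placeholder.
    from lm_eval import evaluator

    results = evaluator.simple_evaluate(
        model="hf",                                         # transformers backend
        model_args="pretrained=your-username/this-model",   # placeholder repo id
        tasks=[
            "mmlu_college_chemistry",
            "mmlu_high_school_chemistry",
            "mmlu_college_biology",
            "mmlu_high_school_biology",
            "winogrande",
        ],
        num_fewshot=5,
    )

    # Print per-task metrics (acc, acc_norm, stderr, ...).
    for task, metrics in results["results"].items():
        print(task, metrics)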

This model was trained from the base Mistral-7B-Instruct-v0.2 on 710 examples, 200 of which come from the camel-ai/biology set. The rest were scraped personally and consist of very long scientific articles and textbooks.

It beats Mistral-7B-Instruct-v0.2 on MMLU chemistry and biology. It should be able to generate mostly factual, basic, and lengthy scientific text. I guess it could be "we have Cosmopedia at home" for people who want to create cheap pretraining datasets from scratch.

Template:

    [Context]
    You are a helpful assistant. Read the instruction and write a response accordingly.

    [User]
    {prompt}

    [Assistant]
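
Below is a minimal inference sketch that applies this template with the `transformers` library. The repository id is a placeholder (this card does not state the final repo name), and the sampling settings are just reasonable defaults rather than values tied to the training run.

    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_ID = "your-username/this-model"  # placeholder; replace with the actual repo id

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

    def build_prompt(instruction: str) -> str:
        # Reproduce the template above verbatim.
        return (
            "[Context]\n"
            "You are a helpful assistant. Read the instruction and write a response accordingly.\n"
            "\n"
            "[User]\n"
            f"{instruction}\n"
            "\n"
            "[Assistant]\n"
        )

    prompt = build_prompt("Explain how ribosomes translate mRNA into protein.")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))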



![image/png](https://cdn-uploads.huggingface.co/production/uploads/6324eabf05bd8a54c6eb1650/ywxKzcQra_1g8EWtMeZ8Q.png)