Locutusque committed
Commit cf2582b
1 Parent(s): 3bc693f

Update README.md

Files changed (1): README.md (+46, -0)
README.md CHANGED
---
license: apache-2.0
language:
- en
- code
datasets:
- open-phi/programming_books_llama
- open-phi/textbooks
tags:
- merge
- computer science
inference:
  parameters:
    do_sample: true
    temperature: 0.7
    top_p: 0.2
    top_k: 14
    max_new_tokens: 250
    repetition_penalty: 1.16
---
# TinyMistral-248M-v2.5

This model was created by merging TinyMistral-248M-v1 and v2, then further pretraining the merge on synthetic textbooks. Based on my own evaluation, the resulting model outperforms both of its parents.

During training, this model reached an average perplexity score of 4, nearly 7x lower than v1's (roughly 28) and almost 4x lower than v2's (roughly 16).

You can use the following mergekit config to reproduce the merged model:

```yaml
base_model: Locutusque/TinyMistral-248M-v2
dtype: float16
merge_method: ties
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 12]
    model: Locutusque/TinyMistral-248M
    parameters:
      density: [1.0, 0.7, 0.1]
      weight: 1.0
  - layer_range: [0, 12]
    model: Locutusque/TinyMistral-248M-v2
    parameters:
      density: 0.5
      weight: [0.0, 0.3, 0.7, 1.0]
```
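
To reproduce the merge, save the config above as `config.yml`; assuming the mergekit package is installed, running `mergekit-yaml config.yml ./TinyMistral-248M-v2.5` (the output directory name is arbitrary) writes the merged weights that were then further pretrained.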

This model can also answer basic questions without needing any fine-tuning. Go ahead and try it in the Inference API.
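
If you'd rather test locally, here is a minimal sketch using `transformers`, reusing the sampling parameters from the card header; the repository id and the prompt are assumptions for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, matching the card title.
repo_id = "Locutusque/TinyMistral-248M-v2.5"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Example prompt; any short question or textbook-style passage works.
prompt = "What is a binary search tree?"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling settings mirror the `inference.parameters` block above.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    top_p=0.2,
    top_k=14,
    max_new_tokens=250,
    repetition_penalty=1.16,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```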