Locutusque committed
Commit cf2582b
1 Parent(s): 3bc693f

Update README.md

Files changed (1): README.md (+46, -0)
README.md CHANGED
---
license: apache-2.0
language:
- en
- code
datasets:
- open-phi/programming_books_llama
- open-phi/textbooks
tags:
- merge
- computer science
inference:
  parameters:
    do_sample: true
    temperature: 0.7
    top_p: 0.2
    top_k: 14
    max_new_tokens: 250
    repetition_penalty: 1.16
---
# TinyMistral-248M-v2.5

This model was created by merging TinyMistral-248M-v1 and v2, then further pretraining the merge on synthetic textbooks. Based on my own evaluation, the resulting model outperforms both of its parents.

During training, this model reached an average perplexity score of 4, nearly 7x lower than v1's (roughly 28) and almost 4x lower than v2's (roughly 16).

You can use the following mergekit config to reproduce the merged model:

```yaml
base_model: Locutusque/TinyMistral-248M-v2
dtype: float16
merge_method: ties
parameters:
  int8_mask: 1.0
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 12]
    model: Locutusque/TinyMistral-248M
    parameters:
      density: [1.0, 0.7, 0.1]
      weight: 1.0
  - layer_range: [0, 12]
    model: Locutusque/TinyMistral-248M-v2
    parameters:
      density: 0.5
      weight: [0.0, 0.3, 0.7, 1.0]
```
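
To reproduce the merge, save the config above as `config.yml`; assuming the mergekit package is installed, running `mergekit-yaml config.yml ./TinyMistral-248M-v2.5` (the output directory name is arbitrary) writes the merged weights that were then further pretrained.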

This model can also answer basic questions without needing any fine-tuning. Go ahead and try it in the Inference API.
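
If you'd rather test locally, here is a minimal sketch using `transformers`, reusing the sampling parameters from the card header; the repository id and the prompt are assumptions for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repository id, matching the card title.
repo_id = "Locutusque/TinyMistral-248M-v2.5"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Example prompt; any short question or textbook-style passage works.
prompt = "What is a binary search tree?"
inputs = tokenizer(prompt, return_tensors="pt")

# Sampling settings mirror the `inference.parameters` block above.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.7,
    top_p=0.2,
    top_k=14,
    max_new_tokens=250,
    repetition_penalty=1.16,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```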