Triangle104 committed on
Commit 2daae61 · verified · 1 Parent(s): b028a9d

Update README.md

Files changed (1)
  1. README.md +49 -0
README.md CHANGED
@@ -11,6 +11,55 @@ tags:
  This model was converted to GGUF format from [`arcee-ai/Arcee-Blitz`](https://huggingface.co/arcee-ai/Arcee-Blitz) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/arcee-ai/Arcee-Blitz) for more details on the model.
 
+ ---
+ Arcee-Blitz (24B) is a Mistral-based model distilled from DeepSeek-V3, designed to be both fast and efficient. We view it as a practical “workhorse” model that can tackle a range of tasks without the overhead of larger architectures.
+
+ Model Details
+ -
+ Architecture Base: Mistral-Small-24B-Instruct-2501
+ Parameter Count: 24B
+ Distillation Data:
+ Merged Virtuoso pipeline with Mistral architecture, hot-starting the
+ training with over 3B tokens of pretraining distillation from
+ DeepSeek-V3 logits.
+
+ Fine-Tuning and Post-Training:
+ After capturing core logits, we performed additional fine-tuning and distillation steps to enhance overall performance.
+
+ License: Apache-2.0
+
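The Distillation Data entry above mentions pretraining distillation from DeepSeek-V3 logits, but the model card does not say which loss was used. As a rough illustration only, the sketch below assumes a common choice: a temperature-softened KL divergence between the teacher's and the student's token distributions. The function names, the temperature value, and the toy logits are all hypothetical and are not taken from the Arcee pipeline.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits into probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over one token's vocabulary distribution.

    Assumption: a temperature-scaled KL divergence; the model card does not
    specify the actual objective.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy 4-token vocabulary (made-up numbers, for illustration only).
teacher = [4.0, 1.0, 0.5, -2.0]
student = [2.5, 1.5, 0.0, -1.0]
print(distillation_loss(teacher, student))
```

In practice a loss like this would be averaged over every token position in a batch and is often mixed with the ordinary next-token cross-entropy; whether Arcee-Blitz did so is not stated here.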
+ Improving World Knowledge
+ -
+ Arcee-Blitz shows large improvements on MMLU-Pro
+ versus the original Mistral-Small-3, reflecting a dramatic increase in
+ world knowledge.
+
+ Data Contamination Checking
+ -
+ We carefully examined our training data and pipeline to avoid contamination. While we’re confident in the validity of these gains, we remain open to further community validation and testing (one of the key reasons we release these models as open source).
+
+ Limitations
+ -
+ Context Length: 32k tokens (may vary depending on the final tokenizer settings and system resources).
+ Knowledge Cut-off: Training data may not reflect the latest events or developments beyond June 2024.
+
+ Ethical Considerations
+ -
+ Content Generation Risks: Like any language model, Arcee-Blitz can generate potentially harmful or biased content if prompted in certain ways.
+
+ License
+ -
+ Arcee-Blitz (24B) is released under the Apache-2.0 License.
+ You are free to use, modify, and distribute this model in both
+ commercial and non-commercial applications, subject to the terms and
+ conditions of the license.
+
+
+ If you have questions or would like to share your experiences using
+ Arcee-Blitz (24B), please connect with us on social media. We’re excited
+ to see what you build and how this model helps you innovate!
+
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)