Update README.md

d212598 verified 2 days ago

3.71 kB

	---
	base_model: arcee-ai/Arcee-Blitz
	library_name: transformers
	license: apache-2.0
	tags:
	- llama-cpp
	- gguf-my-repo
	---

	# Triangle104/Arcee-Blitz-Q3_K_L-GGUF
	This model was converted to GGUF format from [`arcee-ai/Arcee-Blitz`](https://huggingface.co/arcee-ai/Arcee-Blitz) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
	Refer to the [original model card](https://huggingface.co/arcee-ai/Arcee-Blitz) for more details on the model.

	---
	Arcee-Blitz (24B) is a new Mistral-based 24B model distilled from DeepSeek, designed to be both fast and efficient. We view it as a practical “workhorse” model that can tackle a range of tasks without the overhead of larger architectures.

	Model Details
	-
	Architecture Base: Mistral-Small-24B-Instruct-2501
	Parameter Count: 24B
	Distillation Data:
	Merged Virtuoso pipeline with Mistral architecture, hotstarting the
	training with over 3B tokens of pretraining distillation from
	DeepSeek-V3 logits

	Fine-Tuning and Post-Training:
	After capturing core logits, we performed additional fine-tuning and distillation steps to enhance overall performance.

	License: Apache-2.0

	Improving World Knowledge
	-
	Arcee-Blitz shows large improvements to performance on MMLU-Pro
	versus the original Mistral-Small-3, reflecting a dramatic increase in
	world knowledge.

	Data contamination checking
	-
	We carefully examined our training data and pipeline to avoid contamination. While we’re confident in the validity of these gains, we remain open to further community validation and testing (one of the key reasons we release these models as open-source).

	Limitations
	-
	Context Length: 32k Tokens (may vary depending on the final tokenizer settings and system resources).
	Knowledge Cut-off: Training data may not reflect the latest events or developments beyond June 2024.

	Ethical Considerations
	-
	Content Generation Risks: Like any language model, Arcee-Blitz can generate potentially harmful or biased content if prompted in certain ways.

	License
	-
	Arcee-Blitz (24B) is released under the Apache-2.0 License.
	You are free to use, modify, and distribute this model in both
	commercial and non-commercial applications, subject to the terms and
	conditions of the license.


	If you have questions or would like to share your experiences using
	Arcee-Blitz (24B), please connect with us on social media. We’re excited
	to see what you build—and how this model helps you innovate!

	---
	## Use with llama.cpp
	Install llama.cpp through brew (works on Mac and Linux)

	```bash
	brew install llama.cpp

	```
	Invoke the llama.cpp server or the CLI.

	### CLI:
	```bash
	llama-cli --hf-repo Triangle104/Arcee-Blitz-Q3_K_L-GGUF --hf-file arcee-blitz-q3_k_l.gguf -p "The meaning to life and the universe is"
	```

	### Server:
	```bash
	llama-server --hf-repo Triangle104/Arcee-Blitz-Q3_K_L-GGUF --hf-file arcee-blitz-q3_k_l.gguf -c 2048
	```

	Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.

	Step 1: Clone llama.cpp from GitHub.
	```
	git clone https://github.com/ggerganov/llama.cpp
	```

	Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
	```
	cd llama.cpp && LLAMA_CURL=1 make
	```

	Step 3: Run inference through the main binary.
	```
	./llama-cli --hf-repo Triangle104/Arcee-Blitz-Q3_K_L-GGUF --hf-file arcee-blitz-q3_k_l.gguf -p "The meaning to life and the universe is"
	```
	or
	```
	./llama-server --hf-repo Triangle104/Arcee-Blitz-Q3_K_L-GGUF --hf-file arcee-blitz-q3_k_l.gguf -c 2048
	```