invisietch/MiS-Firefly-v0.2-22B-Q6_K-GGUF

@@ -4,54 +4,86 @@ tags:
 - not-for-all-audiences
 - axolotl
 - qlora
-- llama-cpp
-- gguf-my-repo
 language:
 - en
 license: other
-base_model: invisietch/MiS-Firefly-v0.2-22B
 ---
-# invisietch/MiS-Firefly-v0.2-22B-Q6_K-GGUF
-This model was converted to GGUF format from [`invisietch/MiS-Firefly-v0.2-22B`](https://huggingface.co/invisietch/MiS-Firefly-v0.2-22B) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
-Refer to the [original model card](https://huggingface.co/invisietch/MiS-Firefly-v0.2-22B) for more details on the model.
-## Use with llama.cpp
-Install llama.cpp through brew (works on Mac and Linux)
-```bash
-brew install llama.cpp
-```
-Invoke the llama.cpp server or the CLI.
-### CLI:
-```bash
-llama-cli --hf-repo invisietch/MiS-Firefly-v0.2-22B-Q6_K-GGUF --hf-file mis-firefly-v0.2-22b-q6_k.gguf -p "The meaning to life and the universe is"
-```
-### Server:
-```bash
-llama-server --hf-repo invisietch/MiS-Firefly-v0.2-22B-Q6_K-GGUF --hf-file mis-firefly-v0.2-22b-q6_k.gguf -c 2048
-```
-Note: You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
-Step 1: Clone llama.cpp from GitHub.
-```
-git clone https://github.com/ggerganov/llama.cpp
-```
-Step 2: Move into the llama.cpp folder and build it with `LLAMA_CURL=1` flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).
-```
-cd llama.cpp && LLAMA_CURL=1 make
-```
-Step 3: Run inference through the main binary.
-```
-./llama-cli --hf-repo invisietch/MiS-Firefly-v0.2-22B-Q6_K-GGUF --hf-file mis-firefly-v0.2-22b-q6_k.gguf -p "The meaning to life and the universe is"
-```
-or
 ```
-./llama-server --hf-repo invisietch/MiS-Firefly-v0.2-22B-Q6_K-GGUF --hf-file mis-firefly-v0.2-22b-q6_k.gguf -c 2048
 ```

 - not-for-all-audiences
 - axolotl
 - qlora
 language:
 - en
 license: other
 ---
+<div align="center">
+  <b style="font-size: 36px;">MiS-Firefly-v0.2-22B (Q6_K)</b>
+  <img src="https://huggingface.co/invisietch/MiS-Firefly-v0.2-22B/resolve/main/header.png" style="width:60%">
+<b>HF</b> :
+<a href="https://huggingface.co/invisietch/MiS-Firefly-v0.2-22B">FP16</a>
+&vert;
+<b>GGUF</b> :
+<a href="https://huggingface.co/invisietch/MiS-Firefly-v0.2-22B-Q6_K-GGUF">Q6_K</a> &middot;
+<a href="https://huggingface.co/invisietch/MiS-Firefly-v0.2-22B-Q4_K_M-GGUF">Q4_K_M</a>
+</div>
+# Model Details
+**This is a fix for the quantization issue in Firefly v0.1.**
+Firefly is a Mistral Small 22B finetune designed for creative writing and roleplay. The model is largely uncensored and should support
+context up to 32,768 tokens.
+The model has been tested in various roleplay scenarios up to 16k context, as well as in an assistant role. It shows broad
+competence and coherence across these scenarios.
+Special thanks to <a href="https://huggingface.co/SicariusSicariiStuff">SicariusSicariiStuff</a> for bouncing ideas back &amp; forth on
+training, and <a href="https://huggingface.co/SytanSD">SytanSD</a> for quants.
+# Feedback
+I appreciate all feedback on any of my models. You can use:
+* [My Discord server](https://discord.gg/AJwZuu7Ncx) - requires Discord.
+* [The Community tab](https://huggingface.co/invisietch/MiS-Firefly-v0.2-22B/discussions) - requires HF login.
+* Discord DMs to **invisietch**.
+Your feedback is how I improve these models for future versions.
+# Disclaimer
+This model is extensively uncensored. It can generate explicit, disturbing or offensive responses. Use responsibly. I am not responsible for
+your use of this model.
+This model is a finetune of Mistral Small 22B (2409) and usage must follow the terms of Mistral's license. By downloading this model, you
+agree not to use it for commercial purposes unless you have a valid Mistral commercial license. See [the base model card](https://huggingface.co/mistralai/Mistral-Small-Instruct-2409)
+for more details.
+# Prompting Format
+I'd recommend the Mistral v2v3 prompting format:
 ```
+<s>[INST] User message here.[/INST] Bot response here</s>[INST] User message 2 here.
 ```
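As a minimal sketch, the template above can be assembled from a chat history like this. The function name `build_prompt` and its arguments are hypothetical, and appending a closing `[/INST]` after the final user message is an assumption (the template above stops before it); most frontends apply this template for you.

```python
def build_prompt(history, user_message):
    """Assemble a prompt in the format shown above.

    history: list of (user_text, bot_text) pairs for completed turns.
    user_message: the newest user message, awaiting a response.
    """
    prompt = "<s>"
    for user_text, bot_text in history:
        # Each completed turn: [INST] user[/INST] bot</s>
        prompt += f"[INST] {user_text}[/INST] {bot_text}</s>"
    # Assumption: the final user turn is closed with [/INST] so the
    # model generates the bot response next.
    prompt += f"[INST] {user_message}[/INST]"
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        [("User message here.", "Bot response here")],
        "User message 2 here.",
    ))
```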
+# Sampler Settings
+I'm running the following sampler settings, but this is an RC and they may not be optimal.
+- **Temperature:** Dynamic 0.7-1.1
+- **Min-P:** 0.07
+- **Rep Pen:** 1.08
+- **Rep Pen Range:** 1536
+- **XTC:** 0.1/0.15
+If you get completely incoherent responses, feel free to use these as a starting point.
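For readers unfamiliar with Min-P: it keeps only tokens whose probability is at least `min_p` times the probability of the most likely token, then renormalizes. The sketch below illustrates that idea on a plain list of probabilities using the 0.07 value from the list above. It is an illustration only, not the implementation used by any particular backend, and the function name `min_p_filter` is hypothetical.

```python
def min_p_filter(probs, min_p=0.07):
    """Zero out tokens below min_p * max(probs), then renormalize."""
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # always > 0: the most likely token survives
    return [p / total for p in kept]


if __name__ == "__main__":
    # Toy distribution over five tokens; threshold is 0.07 * 0.5 = 0.035,
    # so the last two tokens are removed before sampling.
    print(min_p_filter([0.5, 0.3, 0.15, 0.03, 0.02]))
```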
+# Training Strategy
+I started with a finetune of Mistral Small 22B which had been trained on the Gutenberg dataset: [nbeerbower/Mistral-Small-Gutenberg-Doppel-22B](https://huggingface.co/nbeerbower/Mistral-Small-Gutenberg-Doppel-22B).
+The first stage of my training was a single epoch at low LR over a 474 million token text completion dataset.
+I followed this up with a coherence, decensorship & roleplay finetune: two epochs over a 172 million token instruct dataset.
+I did a slerp merge of epoch 1 into epoch 2 at a light weight, which resolved the name-spelling issues on quantized versions of Firefly v0.1.
+Total training time was about 32 hours on 4x Nvidia A100 80GB.
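As a rough back-of-the-envelope check on the figures in this section, the snippet below totals the tokens seen and the GPU-hours spent. It assumes each epoch passes over its full dataset once and that the 32 hours cover both training stages; neither is stated explicitly above, so treat the results as estimates.

```python
# Figures quoted in this section.
completion_tokens_m = 474   # stage 1 dataset size, millions of tokens
instruct_tokens_m = 172     # stage 2 dataset size, millions of tokens
instruct_epochs = 2
wall_clock_hours = 32       # "about 32", so approximate
gpu_count = 4

# Assumption: every epoch sees the whole dataset exactly once.
total_tokens_m = completion_tokens_m + instruct_tokens_m * instruct_epochs
gpu_hours = wall_clock_hours * gpu_count

print(f"Tokens seen: ~{total_tokens_m}M")
print(f"GPU-hours:   ~{gpu_hours}")
```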
+<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>