Triangle104 committed on
Commit 57a0bb2
1 Parent(s): 65fabf2

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +2 -69
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-license: llama3.1
+license: llama3
 license_name: llama3
 license_link: LICENSE
 library_name: transformers
@@ -22,73 +22,6 @@ base_model: crestf411/L3.1-8B-Slush-v1.1
 This model was converted to GGUF format from [`crestf411/L3.1-8B-Slush-v1.1`](https://huggingface.co/crestf411/L3.1-8B-Slush-v1.1) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
 Refer to the [original model card](https://huggingface.co/crestf411/L3.1-8B-Slush-v1.1) for more details on the model.
 
----
-Model details:
--
-Slush is a two-stage model trained with high LoRA dropout, where stage 1 is a pretraining continuation on the base model, aimed at boosting the model's creativity and writing capabilities. This is then merged into the instruction tune model, and stage 2 is a fine tuning step on top of this to further enhance its roleplaying capabilities and/or to repair any damage caused in the stage 1 merge.
-
-This is an initial experiment done on the at-this-point-infamous Llama 3.1 8B model, in an attempt to retain its smartness while addressing its abysmal lack of imagination/creativity. As always, feedback is welcome, and begone if you demand perfection.
-
-The second stage, like the Sunfall series, follows the Silly Tavern preset, so ymmv in particular if you use some other tool and/or preset.
-
-This update (v1.1) addresses some of the feedback from the first iteration by ramping down the training parameters, and also introduces a custom merge using mergekit.
-
-Parameter suggestions:
--
-I did all my testing with temp 1, min-p 0.1, DRY 0.8. I enabled XTC at higher contexts.
-
-Training details:
--
-Stage 1 (continued pretraining)
-Target: meta-llama/Llama-3.1-8B (resulting LoRA merged into meta-llama/Llama-3.1-8B-Instruct)
-LoRA dropout 0.5 (motivation)
-LoRA rank 64, alpha 128 (motivation)
-LR cosine 4e-6
-LoRA+ with LR Ratio: 15
-Context size: 16384
-Gradient accumulation steps: 4
-Epochs: 1
-Stage 2 (fine tune)
-Target: Stage 1 model
-LoRA dropout 0.5
-LoRA rank 32, alpha 64
-LR cosine 5e-6 (min 5e-7)
-LoRA+ with LR Ratio: 15
-Context size: 16384
-Gradient accumulation steps: 4
-Epochs: 2
-
-Merge Method
--
-This model was merged using the TIES merge method using meta-llama/Llama-3.1-8B as a base.
-Configuration
-
-The following YAML configuration was used to produce this model:
-
-models:
-  - model: stage1-on-instruct
-    parameters:
-      weight: 1.5
-      density: 1
-  - model: stage2-on-stage1
-    parameters:
-      weight: 1.5
-      density: 1
-  - model: meta-llama/Llama-3.1-8B-Instruct
-    parameters:
-      weight: 1
-      density: 1
-merge_method: ties
-base_model: meta-llama/Llama-3.1-8B
-parameters:
-  weight: 1
-  density: 1
-  normalize: true
-  int8_mask: true
-tokenizer_source: meta-llama/Llama-3.1-8B-Instruct
-dtype: bfloat16
-
----
 ## Use with llama.cpp
 Install llama.cpp through brew (works on Mac and Linux)
 
@@ -127,4 +60,4 @@ Step 3: Run inference through the main binary.
 or
 ```
 ./llama-server --hf-repo Triangle104/L3.1-8B-Slush-v1.1-Q4_K_S-GGUF --hf-file l3.1-8b-slush-v1.1-q4_k_s.gguf -c 2048
-```
+```
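For context on the TIES merge referenced in the removed model card, here is a rough NumPy sketch of the idea (trim each weighted task vector, elect a per-parameter sign, then average only the agreeing contributions). This is an illustrative toy, not mergekit's implementation; `ties_merge` and its arguments are invented for this sketch.

```python
import numpy as np

def ties_merge(base, tuned, weights, density=1.0):
    """Toy TIES merge: base array + sign-filtered average of weighted task vectors."""
    deltas = []
    for t, w in zip(tuned, weights):
        d = (t - base) * w
        if density < 1.0:
            # Trim: keep only the top-`density` fraction of entries by magnitude.
            k = max(int(round(density * d.size)), 1)
            thresh = np.sort(np.abs(d), axis=None)[-k]
            d = np.where(np.abs(d) >= thresh, d, 0.0)
        deltas.append(d)
    deltas = np.stack(deltas)
    # Elect a sign per parameter from the summed deltas, then merge only
    # the contributions whose sign agrees with the elected one.
    sign = np.sign(deltas.sum(axis=0))
    agree = (np.sign(deltas) == sign) & (sign != 0)
    num = np.where(agree, deltas, 0.0).sum(axis=0)
    den = np.maximum(agree.sum(axis=0), 1)
    return base + num / den
```

With `density: 1`, as in the YAML above, nothing is trimmed and the merge reduces to a sign-filtered average of the weighted task vectors added back onto the base weights.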