Triangle104 committed on
Commit df1a35d
Parent: cf6db69

Upload README.md with huggingface_hub

Files changed (1)
README.md +0 -67
README.md CHANGED
@@ -22,73 +22,6 @@ base_model: crestf411/L3.1-8B-Slush-v1.1
  This model was converted to GGUF format from [`crestf411/L3.1-8B-Slush-v1.1`](https://huggingface.co/crestf411/L3.1-8B-Slush-v1.1) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/crestf411/L3.1-8B-Slush-v1.1) for more details on the model.
 
- ---
- Model details:
- -
- Slush is a two-stage model trained with high LoRA dropout. Stage 1 is a continued pretraining run on the base model, aimed at boosting the model's creativity and writing capabilities. This is then merged into the instruction-tuned model, and stage 2 is a fine-tuning step on top of that merge, to further enhance its roleplaying capabilities and/or to repair any damage caused by the stage 1 merge.
-
- This is an initial experiment on the at-this-point-infamous Llama 3.1 8B model, in an attempt to retain its smartness while addressing its abysmal lack of imagination/creativity. As always, feedback is welcome, and begone if you demand perfection.
-
- The second stage, like the Sunfall series, follows the SillyTavern preset, so ymmv, in particular if you use some other tool and/or preset.
-
- This update (v1.1) addresses some of the feedback on the first iteration by ramping down the training parameters, and it also introduces a custom merge using mergekit.
-
- Parameter suggestions:
- -
- I did all my testing with temp 1, min-p 0.1, DRY 0.8. I enabled XTC at higher contexts.
-
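For reference, a minimal sketch of those sampler settings as llama.cpp flags, assuming a build recent enough to ship the DRY and XTC samplers; the GGUF filename, prompt, and XTC values are placeholders, not part of the card:

```bash
# Sketch: the suggested samplers mapped to llama-cli flags.
# Filename, prompt, and XTC values are illustrative assumptions.
llama-cli -m l3.1-8b-slush-v1.1-q4_k_m.gguf \
  -c 16384 \
  --temp 1.0 \
  --min-p 0.1 \
  --dry-multiplier 0.8 \
  --xtc-probability 0.5 --xtc-threshold 0.1 \
  -p "Write the opening scene of a noir story."
```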
- Training details:
- -
- Stage 1 (continued pretraining)
-   Target: meta-llama/Llama-3.1-8B (resulting LoRA merged into meta-llama/Llama-3.1-8B-Instruct)
-   LoRA dropout 0.5 (motivation)
-   LoRA rank 64, alpha 128 (motivation)
-   LR cosine 4e-6
-   LoRA+ with LR Ratio: 15
-   Context size: 16384
-   Gradient accumulation steps: 4
-   Epochs: 1
- Stage 2 (fine-tune)
-   Target: Stage 1 model
-   LoRA dropout 0.5
-   LoRA rank 32, alpha 64
-   LR cosine 5e-6 (min 5e-7)
-   LoRA+ with LR Ratio: 15
-   Context size: 16384
-   Gradient accumulation steps: 4
-   Epochs: 2
-
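As an illustration only (this is not the author's training code), the stage 1 adapter settings map onto a Hugging Face PEFT config roughly as follows; the target modules are an assumption, and the cosine schedule and LoRA+ LR ratio of 15 belong to the trainer setup rather than to `LoraConfig`:

```python
# Illustrative sketch of the stage 1 LoRA hyperparameters with HF PEFT.
# NOT the author's script: target_modules is assumed, not stated in the card.
from peft import LoraConfig

stage1_lora = LoraConfig(
    r=64,               # LoRA rank
    lora_alpha=128,     # alpha = 2 x rank
    lora_dropout=0.5,   # deliberately high dropout, per the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
# Stage 2 would analogously use r=32, lora_alpha=64 with the same dropout.
```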
- Merge Method
- -
- This model was merged using the TIES merge method, with meta-llama/Llama-3.1-8B as the base.
- Configuration
-
- The following YAML configuration was used to produce this model:
-
- models:
-   - model: stage1-on-instruct
-     parameters:
-       weight: 1.5
-       density: 1
-   - model: stage2-on-stage1
-     parameters:
-       weight: 1.5
-       density: 1
-   - model: meta-llama/Llama-3.1-8B-Instruct
-     parameters:
-       weight: 1
-       density: 1
- merge_method: ties
- base_model: meta-llama/Llama-3.1-8B
- parameters:
-   weight: 1
-   density: 1
-   normalize: true
-   int8_mask: true
- tokenizer_source: meta-llama/Llama-3.1-8B-Instruct
- dtype: bfloat16
-
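For completeness, a merge from a config like this is typically run with the mergekit CLI; a minimal sketch, assuming the YAML above is saved locally and the stage1/stage2 checkpoints exist on disk (they are not published), with placeholder paths:

```bash
# Sketch: running the merge with mergekit's CLI.
# slush-merge.yml and ./merged-model are placeholder paths.
pip install mergekit
mergekit-yaml slush-merge.yml ./merged-model --cuda
```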
- ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)