Update README.md
README.md CHANGED
@@ -12,8 +12,16 @@ An experimental model, fine-tuned using the ["multiplicative-LoRA" method](#the-

Other experimental models, based on `creative-writer-v0.1-alfa-35b`, that attempt to encourage more diverse/creative text generation:

- [creative-writer-v0.1-bravo-35b](https://huggingface.co/jukofyork/creative-writer-v0.1-bravo-35b) - Scaled the pre-softmax logits by `1.1` during training (and then reset them after training).
- **[CURRENTLY UPLOADING...]** [creative-writer-v0.1-charlie-35b](https://huggingface.co/jukofyork/creative-writer-v0.1-charlie-35b) - Scaled the pre-softmax logits by `0.9` during training (and didn't reset them after training).
- **[CURRENTLY TRAINING...]** [creative-writer-v0.1-delta-35b](https://huggingface.co/jukofyork/creative-writer-v0.1-delta-35b) - Trained using [Focal Loss](https://arxiv.org/abs/1708.02002) with `gamma=2` (instead of the stock [Cross Entropy Loss](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html)); see the sketch below.
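For concreteness, the sketch below reconstructs the two loss-side tweaks described in this list: scaling the pre-softmax logits by a constant (`1.1` for bravo, `0.9` for charlie) and Focal Loss with `gamma=2` for delta. It is an illustration inferred from the descriptions above, not the actual training code, and it omits details such as label masking:

```python
import torch
import torch.nn.functional as F

def scaled_ce_loss(logits, labels, scale=1.1):
    # bravo/charlie: multiply the pre-softmax logits by a constant before the
    # usual cross-entropy loss (1.1 for bravo, 0.9 for charlie).
    return F.cross_entropy(scale * logits.view(-1, logits.size(-1)), labels.view(-1))

def focal_loss(logits, labels, gamma=2.0):
    # delta: Focal Loss (Lin et al., 2017) down-weights tokens the model is
    # already confident about by a factor of (1 - p_t)^gamma.
    log_probs = F.log_softmax(logits.view(-1, logits.size(-1)), dim=-1)
    log_pt = log_probs.gather(-1, labels.view(-1, 1)).squeeze(-1)
    pt = log_pt.exp()
    return -((1.0 - pt) ** gamma * log_pt).mean()
```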
---

# Usage

- Use the normal `command-r` chat template: `'<|START_OF_TURN_TOKEN|><|USER_TOKEN|>prompt<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>reply...'`.
- I suggest using **no system prompt** with this (and all other `Cohere` models!), as it writes *much* better without one IMO...
- You **must use some small value of min-p** with this (and the original `c4ai-command-r-v01` model!), or the model will output gibberish!
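As a concrete illustration of these usage notes, here is a minimal `transformers` sketch. The repo id, `min_p` value, and sampling settings are illustrative assumptions rather than recommendations from this card, and passing `min_p` to `generate()` requires a reasonably recent `transformers` release:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id: substitute whichever creative-writer checkpoint you are using.
model_id = "jukofyork/creative-writer-v0.1-bravo-35b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# No system prompt: a single user turn, formatted with the stock command-r chat template.
messages = [{"role": "user", "content": "Write the opening paragraph of a gothic mystery."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# A small min-p keeps sampling diverse while cutting off the gibberish tail.
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, min_p=0.05)
print(tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True))
```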
---
@@ -27,7 +35,7 @@ instead of the normal "additive-LoRA" method of:

`h = (tensor + lora_B @ lora_A) @ x = tensor @ x + lora_B @ lora_A @ x`
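To make the decomposition concrete, the snippet below numerically checks the additive-LoRA identity above. The multiplicative variant at the end is only an illustrative assumption (applying the low-rank update to the layer's output rather than to its weight); the exact definition used for these models is given earlier in the card, outside this excerpt:

```python
import torch

torch.manual_seed(0)
d_out, d_in, r = 8, 16, 2

tensor = torch.randn(d_out, d_in)      # frozen base weight
lora_A = torch.randn(r, d_in) * 0.01   # low-rank factors
lora_B = torch.randn(d_out, r) * 0.01
x = torch.randn(d_in)

# Additive-LoRA: adding a low-rank delta to the weight is exactly the same as
# adding a low-rank correction to the layer's output.
h_fused = (tensor + lora_B @ lora_A) @ x
h_split = tensor @ x + lora_B @ (lora_A @ x)
assert torch.allclose(h_fused, h_split, atol=1e-5)

# One possible multiplicative-LoRA form (an assumption, for illustration only):
# transform the output h = tensor @ x instead of the weight, so the A factor
# maps from d_out rather than d_in.
lora_A_mult = torch.randn(r, d_out) * 0.01
h_mult = tensor @ x + lora_B @ (lora_A_mult @ (tensor @ x))
```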
I only apply the multiplicative-LoRA to the `down_proj` matrices, and skip the last layer's `down_proj` matrix in the same way as [creative-writing-control-vectors-v3.0](https://huggingface.co/jukofyork/creative-writing-control-vectors-v3.0).

This currently requires hacking [PEFT's layer.py](https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/layer.py) like so:
@@ -63,6 +71,7 @@ tensor = tensor.to(old_type)

- Training took just under 4 days on dual A6000 GPUs connected via NVLink, using [qlora-pipe](https://github.com/tdrussell/qlora-pipe).
- The dataset consisted of approximately 1000 pre-2012 books converted to Markdown (~180M tokens), using the same `dataset_combination_mode = 'concatenate'` as [Llama-3-70B-Instruct-Storywriter](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter).
- I used the same `sequence_len = 8192` and `batch_size_tokens = 8192` as [Llama-3-70B-Instruct-Storywriter](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter).

## `config_creative_writer.toml`
@@ -128,4 +137,11 @@ eval_size = 0.01

  "gradient_clipping": 1.0,
  "steps_per_print": 1
}
```

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65995c45539c808e84c38bf1/DcGilkmIa7wBQJIhCWbHP.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65995c45539c808e84c38bf1/TnsnTqtAd9S3JE8VacxN6.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65995c45539c808e84c38bf1/Ly3Y4TK1S2TsTCLEslzZ2.png)