jukofyork committed
Commit 11394b4 (1 parent: 0649029)

Update README.md

Files changed (1): README.md (+4 −1)
README.md CHANGED
@@ -114,7 +114,10 @@ but whereas `lora_A` looks at the ***input*** to the transformation for "additiv
 
 - Took just over 4 days using dual-A6000 GPUs connected via NVLink, using [qlora-pipe](https://github.com/tdrussell/qlora-pipe).
 - The dataset consisted of approximately 1000 pre-2012 books converted to Markdown (~180M tokens) using the same `dataset_combination_mode = 'concatenate'` as [Llama-3-70B-Instruct-Storywriter](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter).
- - I used the same `sequence_len = 8192` and `batch_size_tokens = 8192` as [Llama-3-70B-Instruct-Storywriter](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter).
+ - I used the same `sequence_len = 8192` and `batch_size_tokens = 8192` as [Llama-3-70B-Instruct-Storywriter](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter), but since I only target `down_proj` in a very specific way, I doubt this will affect the usable context length of the model.
+ - I used `pipeline_stages = 2` and `"gradient_accumulation_steps": 16` to roughly match the "tokens-per-step" used by [Llama-3-70B-Instruct-Storywriter](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter). (A sketch of where these settings live is given after the diff.)
+ - I used a much lower learning rate of `5e-6`, as the `5e-5` value used by [Llama-3-70B-Instruct-Storywriter](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter) dropped the evaluation loss far too quickly, meaning that 90%+ of the samples weren't going to be used properly.
+ - I set `lora_dropout = 0.0`, as dropout doesn't really make sense with `epochs = 1`.
 
 ## `config_creative_writer.toml`
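
For reference, the settings discussed in the added bullets would sit in qlora-pipe's two config files roughly as follows. This is a minimal sketch: the parameter names and values come from this commit, but the file layout and the `[optimizer]` grouping are assumptions about qlora-pipe's config format, not a copy of the actual `config_creative_writer.toml`.

```toml
# config_creative_writer.toml (sketch -- only the values named in the bullets
# above come from this commit; the surrounding layout is assumed)
sequence_len = 8192          # same as Llama-3-70B-Instruct-Storywriter
batch_size_tokens = 8192     # same as Llama-3-70B-Instruct-Storywriter
pipeline_stages = 2          # split the model across both A6000 GPUs
epochs = 1
lora_dropout = 0.0           # dropout is pointless with a single epoch

[optimizer]
lr = 5e-6                    # 10x lower than Storywriter's 5e-5
```

`gradient_accumulation_steps` presumably lives in the separate DeepSpeed JSON config rather than the TOML, hence the JSON-style quoting in the bullet above:

```json
{
  "gradient_accumulation_steps": 16
}
```

Back-of-the-envelope: assuming both GPUs are consumed by the two pipeline stages (so a single data-parallel replica), each optimizer step sees roughly 8192 × 16 ≈ 131k tokens, which is how the "tokens-per-step" match to Llama-3-70B-Instruct-Storywriter works out.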