Update README.md
README.md CHANGED

@@ -112,7 +112,7 @@ but whereas `lora_A` looks at the ***input*** to the transformation for "additiv
 
 # Training
 
-- Took just
+- Took just over 4 days using dual-A6000 GPUs connected via NVLink, using [qlora-pipe](https://github.com/tdrussell/qlora-pipe).
 - The dataset consisted of approximately 1000 pre-2012 books converted to Markdown (~180M tokens) using the same `dataset_combination_mode = 'concatenate'` as [Llama-3-70B-Instruct-Storywriter](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter).
 - I used the same `sequence_len = 8192` and `batch_size_tokens = 8192` as [Llama-3-70B-Instruct-Storywriter](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter).
 
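For reference, the three settings quoted in the diff are TOML-style keys from a qlora-pipe training config. Below is a minimal sketch of what such a config could look like; only `sequence_len`, `batch_size_tokens`, and `dataset_combination_mode` (with their values) come from the diff above, while every other key, value, and path is an illustrative assumption that may not match qlora-pipe's actual schema:

```toml
# Hypothetical qlora-pipe config sketch. Only sequence_len, batch_size_tokens,
# and dataset_combination_mode are taken from the README diff above; all other
# keys, values, and paths are illustrative assumptions.

model = '/models/base-model'         # assumed placeholder path to the base model
output_dir = '/training/runs/books'  # assumed placeholder for checkpoint output

# Values quoted in the README (same as Llama-3-70B-Instruct-Storywriter):
sequence_len = 8192        # training sequence length, in tokens
batch_size_tokens = 8192   # batch size measured in tokens rather than sequences

# The ~1000 pre-2012 books (~180M tokens), pre-converted to Markdown,
# joined end-to-end into one stream before being sliced into sequences:
dataset_combination_mode = 'concatenate'
```

Concatenation mode, as the name suggests, joins documents end-to-end and cuts the resulting token stream into fixed `sequence_len` windows regardless of book boundaries, which avoids padding and per-document truncation.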