```
  e88 88e                               d8
 d888 888b   8888 8888  ,"Y88b  888 8e  d88
C8888 8888D  8888 8888 "8" 888  888 88b d88888
 Y888 888P   Y888 888P ,ee 888  888 888  888
  "88 88"     "88 88"  "88 888  888 888  888
      b
      8b,

  e88'Y88                   d8           888
 d888  'Y  ,"Y88b 888,8,   d88    ,e e,  888
C8888     "8" 888 888 "   d88888 d88 88b 888
 Y888  ,d  ,ee 888 888     888   888   , 888
  "88,d88  "88 888 888     888    "YeeP" 888

              PROUDLY PRESENTS
```
# Llama-3-70B-Instruct-Storywriter-exl2-rpcal

Quantized to ExLlamaV2 (exl2) format using 200 calibration samples of 8192 tokens each from the RP-oriented [PIPPA](https://huggingface.co/datasets/royallab/PIPPA-cleaned) dataset.

Branches (see the download sketch below the list):
- `main` -- `measurement.json`
- `2.25b6h` -- 2.25bpw, 6bit lm_head
- `3.5b6h` -- 3.5bpw, 6bit lm_head
- `3.75b6h` -- 3.75bpw, 6bit lm_head
- `4.5b6h` -- 4.5bpw, 6bit lm_head
- `4.65b6h` -- 4.65bpw, 6bit lm_head
- `6b6h` -- 6bpw, 6bit lm_head
- `8b8h` -- 8bpw, 8bit lm_head

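If you only need one of these quants, a minimal download sketch using `huggingface_hub` is shown below; the repo id is assumed from this card's title, and `revision` selects the branch.

```python
# Sketch: download a single exl2 quant branch.
# Assumptions: repo id inferred from this card's title; adjust revision/local_dir as needed.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="rAIfle/Llama-3-70B-Instruct-Storywriter-exl2-rpcal",
    revision="4.5b6h",  # one of the branches listed above
    local_dir="Llama-3-70B-Instruct-Storywriter-exl2-rpcal-4.5b6h",
)
```
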
Original model link: [tdrussell/Llama-3-70B-Instruct-Storywriter](https://huggingface.co/tdrussell/Llama-3-70B-Instruct-Storywriter)

Original model README below.

-----

# Llama 3 70B Instruct Storywriter

Llama 3 70B Instruct, further finetuned on a dataset consisting of books in the fiction genre.

This was just an experiment, but it turned out well enough that I'm sharing it. The finetuning has caused a significant shift in the model's writing style and seems to have made it more creative. There may be a slight decrease in overall intelligence.

Because this was trained on Instruct, you can use the normal Instruct chat formatting. It may also work well in raw completion mode.
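
For reference, the stock Llama 3 Instruct prompt template looks roughly like this (most front-ends, or `tokenizer.apply_chat_template` in `transformers`, assemble it for you):

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```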

## Training details

Trained on 4x RTX 4090s using [qlora-pipe](https://github.com/tdrussell/qlora-pipe).
The dataset consists of about 800 books in the fiction genre, totaling 570 MB of raw text.
A rank-64 QLoRA was trained at a sequence length of 8192 tokens.
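
The qlora-pipe config itself isn't reproduced here; as a rough illustration of the stated hyperparameters (rank-64 QLoRA on a 4-bit base, 8192-token sequences), an equivalent setup in `peft`/`transformers` terms might look like the sketch below. The alpha, target modules, and NF4 quant type are assumptions, not values from the card.

```python
# Illustrative rank-64 QLoRA setup with peft/transformers.
# NOT the author's qlora-pipe config: alpha, target modules, and NF4 are assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # QLoRA: frozen 4-bit base weights
    bnb_4bit_quant_type="nf4",              # assumed; NF4 is the usual QLoRA choice
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct",  # the Instruct base being finetuned
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=64,                                    # rank 64, as stated above
    lora_alpha=64,                           # assumption; not given in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# Training would then proceed on 8192-token sequences from the book dataset.
```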

### Evaluation metrics

<img src="https://i.imgur.com/sCMjix4.png" width="800" />

## Why no 8B?

I tried multiple times to train this on Llama 3 8B Instruct, using a variety of hyperparameters, and it never worked well. The model took a huge hit to intelligence every time, to the point of being unusable. 70B fared much better. I don't know why; maybe 8B is simply too small for this type of technique and loses too much of the instruction-tuned smarts.