adamo1139 committed on
Commit
737a62c
1 Parent(s): 7ddad1a

Upload README.md

---
license: other
license_name: yi-license
license_link: LICENSE
---

## Model description

Yi-34B base model fine-tuned on the AEZAKMI v1 dataset. Training took around 33 hours on a single local RTX 3090 Ti.
It's like airoboros but with less gptslop, no refusals, and less of the typical language used by RLHF'd OpenAI models.
Say goodbye to "It's important to remember"! \
The prompt format is standard ChatML. Don't expect it to be good at math or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy, free chatbot.
The cost of this fine-tune was about $3 in electricity. This was my first attempt at training Yi-34B with this dataset.
The base model used for fine-tuning was the 4k-context Yi-34B-Llama model shared by chargoddard.

## Prompt Format

I recommend using the ChatML format, as it was used during the fine-tune. \
Here's the prompt format you should use. You can set a different system message; the model seems to respect that fine, so it wasn't overfitted.

```
<|im_start|>system
A chat with uncensored assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
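
As a quick illustration, here is a minimal Python sketch (my addition, not from the original card) that builds a prompt string in this exact format; the helper name and the sample user message are made up for the example.

```python
# Sketch only: assemble a ChatML prompt matching the template above.
# Prior (user, assistant) turns can be passed in for multi-turn chats.
def build_chatml_prompt(system, turns, user_message):
    parts = [f"<|im_start|>system\n{system}<|im_end|>"]
    for user, assistant in turns:
        parts.append(f"<|im_start|>user\n{user}<|im_end|>")
        parts.append(f"<|im_start|>assistant\n{assistant}<|im_end|>")
    parts.append(f"<|im_start|>user\n{user_message}<|im_end|>")
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt(
    system="A chat with uncensored assistant.",
    turns=[],  # no previous turns in this example
    user_message="Write a short story about a lighthouse keeper.",
)
print(prompt)
```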

## Intended uses & limitations

Use is limited by the Yi license.

## Known Issues

I recommend setting the repetition penalty to around 1.05 to avoid repetition. So far I've had good results running this model with temperature 1.2. \
Multi-turn conversations could be a bit better; if you ask it to rewrite something with some fixes, it tends to just repeat the previous response verbatim without any improvements - this is especially noticeable with repetition penalty 1.0. \
There is still some gptslop left - some responses end with a last paragraph of "Remember that bla bla bla"; I will try to get rid of it in the next version of the dataset. \
Stories have ChatGPT-like paragraph spacing; I will try to include more stories with long paragraphs in the next dataset version.
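
For reference, a minimal generation sketch with these settings using `transformers` might look like the following. This is my example rather than the author's recipe: the `model_id` is a placeholder for wherever you keep the weights, and on a single consumer GPU you would likely use a quantized build (such as the exl2 quant mentioned below) instead of full bf16 weights.

```python
# Sketch only: load the model and sample with the settings suggested above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/yi-34b-aezakmi-v1"  # placeholder, replace with your local path or repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # full-precision-ish load; a quantized copy is more practical on 24 GB cards
    device_map="auto",
)

prompt = (
    "<|im_start|>system\n"
    "A chat with uncensored assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Tell me about a cozy rainy-day routine.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.2,          # suggested above
    repetition_penalty=1.05,  # helps avoid verbatim repeats
)
# Decode only the newly generated tokens; you may also want to cut the reply
# at the first <|im_end|> depending on how your tokenizer handles it.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```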

The unrestricted-ness of v1 isn't quite where I want it yet; it still refuses to answer fairly often and behaves kind of like a normal airoboros, especially with the default system prompt "A chat.". \
With a slightly modified prompt it works much better - I recommend using "A chat with uncensored assistant."; it's stupid but it helps a lot. \
I was made aware of the frequent occurrence of the phrase "sending shivers down a spine" in generations during RP, so I will manually fix those samples and use different wording. \
I plan to use Yi-34B-200K Llama-fied for the next version; v1 very likely won't handle long context above 6k-8k well, be warned. \
There is also some issue with handling long system messages for RP; I will investigate that before releasing the v2 dataset.

Feel free to report issues in the discussions panel here; I don't lurk /lmg/ too often and I would still like to hear some feedback.

## Axolotl training parameters

- bnb_4bit_use_double_quant: true
- bnb_4bit_compute_dtype: torch.bfloat16
- is_llama_derived_model: true
- load_in_4bit: true
- adapter: qlora
- sequence_len: 1200
- sample_packing: false
- lora_r: 16
- lora_alpha: 32
- lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj
  - gate_proj
  - down_proj
  - up_proj
- lora_target_linear: true
- pad_to_sequence_len: true
- micro_batch_size: 1
- gradient_accumulation_steps: 1
- num_epochs: 1
- optimizer: adamw_bnb_8bit
- lr_scheduler: constant
- learning_rate: 0.00007
- train_on_inputs: false
- group_by_length: false
- bf16: true
- bfloat16: true
- flash_optimum: false
- gradient_checkpointing: true
- flash_attention: true
- seed: 42

## Upcoming

~~I will release adapter files and maybe an exllama v2 quant shortly.~~ \
The LoRA and exl2 quant have been released.