---
license: other
license_name: yi-license
license_link: LICENSE
datasets:
- adamo1139/AEZAKMI_v2
- adamo1139/rawrr_v1
---

## Model Description

EXPERIMENTAL MODEL, NOT FINAL. IT HAS SOME ISSUES, BUT IT'S REALLY COOL WHEN IT WORKS.

The biggest change from my previous AEZAKMI models is that this one is much less likely to refuse completing a request!

Yi-34B 200K base model fine-tuned on the RAWrr v1 dataset via DPO and then fine-tuned on the AEZAKMI v2 dataset via SFT. DPO training took around 6 hours, SFT took around 18 hours.
I used `unsloth` for both stages.
It's like airoboros but with less gptslop, no refusals, and less of the typical language used by RLHFed OpenAI models.
Say goodbye to "It's important to remember"! \
Prompt format is standard ChatML. Don't expect it to be good at math or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy free chatbot.
The base model used for fine-tuning was the 200K-context Yi-34B-Llama model shared by larryvrh.

Training was done with max_position_embeddings set to 4096, then reverted back to 200K after applying the LoRA.
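
As a rough illustration only (this card doesn't show the actual procedure, and the file path below is hypothetical), a revert like that can be done by editing `max_position_embeddings` in the merged model's `config.json`:

```python
# Hypothetical sketch: restore the 200K context window in config.json after
# merging the LoRA. The path is a placeholder; the key name follows standard
# Hugging Face model configs. The author's actual steps aren't shown here.
import json

config_path = "merged-model/config.json"  # hypothetical local path

with open(config_path) as f:
    config = json.load(f)

config["max_position_embeddings"] = 200000  # was 4096 during training

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```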

## Prompt Format

I recommend using the ChatML format, as this was used during the fine-tune. \
Here's the prompt format you should use. You can set a different system message; the model seems to respect that fine, so it wasn't overfitted.
Both the `A chat.` and `A chat with uncensored assistant.` system prompts work fine and are pretty refusal-free.

```
<|im_start|>system
A chat with uncensored assistant.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
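
If you want to try that format from Python, here's a minimal sketch (not from this card): the repo id is a placeholder you'd need to replace, and the sampling values follow the recommendations under Known Issues below.

```python
# Hedged usage sketch with transformers. Assumptions: the repo id is a
# placeholder (substitute the real one), and you have enough memory for a 34B model.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "adamo1139/placeholder-repo-id"  # hypothetical, replace with the actual repo
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build the ChatML prompt exactly as shown above.
prompt = (
    "<|im_start|>system\n"
    "A chat with uncensored assistant.<|im_end|>\n"
    "<|im_start|>user\n"
    "Write a short story about a lighthouse keeper.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=1.0,          # 1.0-1.2 recommended under Known Issues
    repetition_penalty=1.05,  # recommended under Known Issues to curb repetition
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```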

## Intended uses & limitations

It's a chat model, not a base completion-only one.
Use is limited by the Yi license. Since the no-robots dataset was used for making rawrr_v1, I guess you probably shouldn't use it for commercial activities.

## Known Issues

I recommend setting repetition penalty to around 1.05 to avoid repetition. So far I've had fairly good results running this model at temperature 1.0-1.2.

One big issue I noticed is that I think I set too small a learning rate for the SFT fine-tune. Sometimes completion mode shines through, and responses read more like completions than instruct responses.
Another small issue is that when you enter a prompt that would have triggered a refusal in a previous model, the response will be more free-form and will probably have a touch of completion in it.
So far, the strongest anti-refusal bias seems to be at 0 ctx, i.e. the first prompt, but it's also present, albeit a little weaker, further down. I plan to expand the rawrr dataset and include more samples without a system prompt, which should help here.

## Unsloth training parameters DPO Stage

- lora_r: 16
- lora_alpha: 32
- max_length: 500
- learning_rate: 0.00005
- lr_scheduler_type: "linear"
- target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
- gradient_accumulation_steps: 16
- per_device_batch_size: 1
- num_train_epochs: 1

Script used for DPO training can be found here:
https://huggingface.co/adamo1139/Yi-34B-200K-rawrr1-LORA-DPO-experimental-r3/blob/main/yi-34b-dpo-unsloth-1.py
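
For orientation only, here's a hedged sketch of how those DPO settings could map onto `unsloth` plus TRL's `DPOTrainer`. The linked script above is the authoritative version; the base-model repo id and the dataset's column layout are assumptions.

```python
# Hedged sketch, not the actual training script (that's linked above).
from unsloth import FastLanguageModel
from datasets import load_dataset
from transformers import TrainingArguments
from trl import DPOTrainer

# Base model: the card names larryvrh's 200K Yi-34B llamafied model;
# the exact repo id here is an assumption.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="larryvrh/Yi-34B-200K-Llamafied",
    max_seq_length=500,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,           # lora_r
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# DPOTrainer expects prompt/chosen/rejected columns; rawrr_v1's exact
# schema isn't shown in this card, so treat this as illustrative.
dataset = load_dataset("adamo1139/rawrr_v1", split="train")

trainer = DPOTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    max_length=500,
    args=TrainingArguments(
        learning_rate=0.00005,
        lr_scheduler_type="linear",
        gradient_accumulation_steps=16,
        per_device_train_batch_size=1,
        num_train_epochs=1,
        output_dir="dpo-outputs",
    ),
)
trainer.train()
```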

## Unsloth training parameters SFT Stage

- lora_r: 16
- lora_alpha: 32
- max_length: 2200
- learning_rate: 0.00006
- lr_scheduler_type: "cosine"
- lr_scheduler_kwargs: {"num_cycles": 0.3}
- target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
- gradient_accumulation_steps: 1
- per_device_batch_size: 1
- num_train_epochs: 1.4

Script used for SFT training can be found here:
https://huggingface.co/adamo1139/Yi-34B-200K-AEZAKMI-RAW-2301-LoRA/blob/main/yi-34b-aezakmi-sft-1-hf.py
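
The cosine scheduler with `num_cycles: 0.3` is worth a note: in transformers' cosine schedule, `num_cycles=0.5` decays the learning rate all the way to zero, so 0.3 stops partway and leaves the final LR well above zero. Here's a hedged sketch of just those scheduler settings (the linked script above is authoritative, and `lr_scheduler_kwargs` needs a reasonably recent transformers release):

```python
# Hedged sketch of the SFT scheduler settings listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    learning_rate=0.00006,
    lr_scheduler_type="cosine",
    # num_cycles=0.5 would decay the LR to zero by the end of training;
    # 0.3 ends the run with the LR still at roughly a third of its peak.
    lr_scheduler_kwargs={"num_cycles": 0.3},
    gradient_accumulation_steps=1,
    per_device_train_batch_size=1,
    num_train_epochs=1.4,
    output_dir="sft-outputs",
)
```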