---
license: other
datasets:
- adamo1139/rawrr_v2
- adamo1139/AEZAKMI_v3-6
- unalignment/toxic-dpo-v0.1
license_name: other
license_link: LICENSE
model-index:
- name: Yi-34B-200K-AEZAKMI-XLCTX-v3
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 64.85
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-XLCTX-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 84.76
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-XLCTX-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 74.48
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-XLCTX-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 37.14
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-XLCTX-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 81.06
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-XLCTX-v3
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 44.05
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=adamo1139/Yi-34B-200K-AEZAKMI-XLCTX-v3
      name: Open LLM Leaderboard
---
## NEWS

<b>This model has been renamed from adamo1139/Yi-34B-200K-AEZAKMI-XLCTX-v3 to adamo1139/Yi-34B-200K-AEZAKMI-RAW-TOXIC-XLCTX-2303 on 2024-03-30. \
I am not happy with how often this model starts enumerating lists, and I plan to improve the toxic DPO dataset to fix that. Because of this, I don't think it deserves to be called AEZAKMI v3; it will just be the next testing iteration of AEZAKMI RAW TOXIC. \
I think I will be uploading one EXL2 quant before moving on to a different training run.</b>


## Model description



Yi-34B 200K XLCTX base model fine-tuned on the RAWrr_v2 (DPO), AEZAKMI_v3-6 (SFT) and unalignment/toxic-dpo-0.1 (DPO) datasets. Training took around 20-30 hours total on a single RTX 3090 Ti; all fine-tuning was done locally.
It's like airoboros but with less gptslop, no refusals, and less of the typical language used by RLHF'd OpenAI models, with extra spiciness.
Say goodbye to "It's important to remember"! \
Prompt format is standard ChatML. Don't expect it to be good at math or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy, free chatbot.
The cost of this fine-tune is about $5-$10 in electricity.
The base model used for fine-tuning was the Yi-34B-200K model shared by 01.ai, the newer version that has improved long-context needle-in-a-haystack retrieval. They didn't give it a new name, and giving it numbers would mess up the AEZAKMI naming scheme by adding a second number, so I will be calling it XLCTX.


I had to lower max_position_embeddings in config.json and model_max_length for training to start; otherwise I was OOMing straight away.
For this attempt, both max_position_embeddings and model_max_length were set to 4096, which worked perfectly fine. I reverted them to 200000 before uploading.
I think it should keep the long-context capabilities of the base model.
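The edit described above can be sketched as a small script. This is purely illustrative: the directory path and initial value are placeholders, though max_position_embeddings is the standard field name in a Hugging Face config.json.

```python
import json
import tempfile
from pathlib import Path

# Build a toy config.json standing in for the real checkpoint's config
# (the actual file ships with the base model; the path here is made up).
workdir = Path(tempfile.mkdtemp())
config_path = workdir / "config.json"
config_path.write_text(json.dumps({"max_position_embeddings": 200000}))

# Shrink the advertised context window so the trainer allocates buffers
# for 4096 tokens instead of 200000, avoiding the immediate OOM.
config = json.loads(config_path.read_text())
config["max_position_embeddings"] = 4096
config_path.write_text(json.dumps(config, indent=2))

# After training, revert the value before uploading the merged model,
# so downstream users get the full 200K context window back.
config = json.loads(config_path.read_text())
config["max_position_embeddings"] = 200000
config_path.write_text(json.dumps(config, indent=2))
```

The same kind of temporary edit applies to model_max_length (in tokenizer_config.json for typical checkpoints).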

In my testing it seems less unhinged than adamo1139/Yi-34b-200K-AEZAKMI-RAW-TOXIC-2702 and maybe a touch less uncensored, but still very much uncensored, even with the default system prompt "A chat."
If you want to see the training scripts, let me know and I will upload them. LoRAs are uploaded here: [adamo1139/Yi-34B-200K-AEZAKMI-XLCTX-v3-LoRA](https://huggingface.co/adamo1139/Yi-34B-200K-AEZAKMI-XLCTX-v3-LoRA)

## Quants!

EXL2 quants coming soon; I think I will start by uploading a 4bpw quant in a few days.


## Prompt Format

I recommend using the ChatML format, as it was used during fine-tuning. \
Here's the prompt format you should use. You can set a different system message; the model was trained on the SystemChat dataset, so it should respect system prompts fine.

```
<|im_start|>system
A chat.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
```
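The template above can be filled in programmatically. The helper below is a minimal sketch (the function name and example messages are my own, not part of any tooling shipped with the model); it ends the prompt with the assistant header so generation continues from there.

```python
def build_chatml_prompt(user_message: str, system_message: str = "A chat.") -> str:
    """Assemble a ChatML prompt in the format this model was tuned on."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Example with the default system prompt used during training.
prompt = build_chatml_prompt("Tell me a story about a grumpy cat.")
print(prompt)
```

The resulting string can be passed directly to whatever backend you use for inference, as long as that backend does not apply its own chat template on top.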

## Intended uses & limitations

Use is limited by the Yi license. \
Some of the datasets used for training prohibit commercial use, so please use this model non-commercially only.

## Known Issues

This model loves making numbered lists, to the point of exhaustion.


It has more of an assistant feel than a human feel, at least with the system prompt "A chat." \
Long context hasn't been tested yet; it should work fine though - feel free to give me feedback about it.

## Credits

Thanks to the unsloth and Hugging Face teams for providing the software packages used during fine-tuning. \
Thanks to Jon Durbin, abacusai, huggingface, sandex, NobodyExistsOnTheInternet and Nous-Research for open-sourcing the datasets I included in the AEZAKMI dataset. \
AEZAKMI is basically a mix of open-source datasets I found on HF, so without them this would not be possible at all.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" alt="made with Unsloth" width="400" height="64"/>](https://github.com/unslothai/unsloth)


# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_adamo1139__Yi-34B-200K-AEZAKMI-XLCTX-v3)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |64.39|
|AI2 Reasoning Challenge (25-Shot)|64.85|
|HellaSwag (10-Shot)              |84.76|
|MMLU (5-Shot)                    |74.48|
|TruthfulQA (0-shot)              |37.14|
|Winogrande (5-shot)              |81.06|
|GSM8k (5-shot)                   |44.05|