adamo1139/Yi-34B-AEZAKMI-v1-exl2-4.65bpw

Model description

This repo contains the Yi-34B AEZAKMI v1 fine-tune quantized to EXL2 format at 4.65 bpw. Measurement.json is also included, so you can quantize this model to a different bpw, provided you have the fine-tuned fp16 model, either by downloading it or by merging the Yi-34B-Llama base with the LoRA provided in my other repo. I don't plan on uploading a GGUF version since I don't think there is demand for it. If there is, let me know and I will figure out a way to provide one.
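
If you are deciding which bpw to quantize to, a rough size estimate can help. The snippet below is only a back-of-the-envelope sketch: the parameter count of 34 billion is an assumption taken from the "34B" in the model name, and the helper `estimate_weight_size_gb` is a hypothetical name, not part of any quantization tool. It counts weights only and ignores the context cache and runtime overhead.

```python
def estimate_weight_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    total_bits = num_params * bits_per_weight
    return total_bits / 8 / 1e9


# Assumption: roughly 34 billion parameters, read off the model name.
ASSUMED_PARAMS = 34e9

# Compare a few candidate bpw settings, including the one used in this repo.
for bpw in (3.0, 4.0, 4.65, 6.0):
    size_gb = estimate_weight_size_gb(ASSUMED_PARAMS, bpw)
    print(f"{bpw:.2f} bpw -> roughly {size_gb:.1f} GB of weights")
```

Whatever is left of your VRAM after the weights has to hold the context cache, so treat the result as a lower bound.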

Yi-34B base model fine-tuned on the AEZAKMI v1 dataset. Training took around 33 hours on a single local RTX 3090 Ti. It's like airoboros but with less gptslop, no refusals and less of the typical language used by RLHFed OpenAI models. Say goodbye to "It's important to remember"!
Prompt format is standard ChatML. Don't expect it to be good at math or riddles, or to be crazy smart. My end goal with AEZAKMI is to create a cozy free chatbot. This fine-tune cost about $3 in electricity. This was my first attempt at training Yi-34B with this dataset. The base model used for fine-tuning was the 4k context Yi-34B-Llama model shared by chargoddard.

Prompt Format

I recommend using the ChatML format, as this was used during fine-tuning.
Here's the prompt format you should use. You can set a different system message; the model seems to respect that fine, so it wasn't overfitted.

<|im_start|>system
A chat.<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
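
If you build prompts in code instead of through a frontend preset, the template above can be assembled like this. It is a minimal sketch: `build_chatml_prompt` is a hypothetical helper name, and the trailing newline after the assistant tag is an assumption based on how ChatML is usually laid out.

```python
def build_chatml_prompt(user_message: str, system_message: str = "A chat.") -> str:
    """Fill the ChatML template from this card with a system and a user message."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        # Left open on purpose: the model writes the assistant turn from here.
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt("Tell me a short story about a lighthouse keeper.")
print(prompt)
```

Pass your own `system_message` to change the system prompt, and use `<|im_end|>` as the stop string if your backend does not pick it up on its own.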

Intended uses & limitations

Apache 2.0

Known Issues

I recommend setting repetition penalty to something around 1.05 to avoid repetition. So far I have had a good experience running this model with temperature 1.2. Multi-turn conversations could be a bit better: if you ask it to rewrite something with some fixes, it tends to just repeat the previous response verbatim without any improvements. This is especially noticeable with repetition penalty 1.0. There is still some gptslop left: some responses end with a paragraph like "Remember that bla bla bla". I will try to get rid of it in the next version of the dataset. Stories have ChatGPT-like paragraph spacing; I will try to add more stories with long paragraphs in the next dataset version.
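
To show why a penalty of 1.0 does nothing against verbatim repeats while 1.05 nudges the model away from them, here is a small sketch of the commonly used repetition penalty formulation (positive logits of already-seen tokens are divided by the penalty, negative ones are multiplied). This is an assumption about how your backend works; the exact implementation in your inference software may differ, and `apply_repetition_penalty` is a hypothetical name.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.05):
    """Return a copy of logits with already-seen tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        if adjusted[token_id] > 0:
            # Positive logit: shrink it towards zero.
            adjusted[token_id] /= penalty
        else:
            # Negative (or zero) logit: push it further down.
            adjusted[token_id] *= penalty
    return adjusted


# Toy vocabulary of four tokens; tokens 0 and 1 already appeared in the context.
logits = [2.1, -1.0, 0.5, 0.0]
print(apply_repetition_penalty(logits, seen_token_ids=[0, 1, 1], penalty=1.05))
print(apply_repetition_penalty(logits, seen_token_ids=[0, 1, 1], penalty=1.0))
```

With a penalty of 1.0 the logits come back unchanged, which matches the behavior described above where the previous response gets repeated word for word.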

Axolotl training parameters

bnb_4bit_use_double_quant: true
bnb_4bit_compute_dtype: torch.bfloat16
is_llama_derived_model: true
load_in_4bit: true
adapter: qlora
sequence_len: 1200
sample_packing: false
lora_r: 16
lora_alpha: 32
lora_target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
- gate_proj
- down_proj
- up_proj
lora_target_linear: true
pad_to_sequence_len: true
micro_batch_size: 1
gradient_accumulation_steps: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: constant
learning_rate: 0.00007
train_on_inputs: false
group_by_length: false
bf16: true
bfloat16: true
flash_optimum: false
gradient_checkpointing: true
flash_attention: true
seed: 42
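
A few numbers that follow from the parameters above, in case you want to compare against your own runs. This is a sketch under two assumptions: that the LoRA update is scaled by the conventional `lora_alpha / lora_r` factor, and that training ran on one GPU (as stated earlier in this card), so the effective batch size is just micro batch size times gradient accumulation steps.

```python
# Values copied from the training parameters listed above.
config = {
    "lora_r": 16,
    "lora_alpha": 32,
    "micro_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "sequence_len": 1200,
}

# Assumption: conventional LoRA scaling of alpha / r.
lora_scaling = config["lora_alpha"] / config["lora_r"]

# Assumption: a single GPU, so there is no multiplication by device count.
effective_batch_size = (
    config["micro_batch_size"] * config["gradient_accumulation_steps"]
)

# pad_to_sequence_len is true, so every sample is padded to sequence_len.
token_positions_per_step = effective_batch_size * config["sequence_len"]

print(f"LoRA scaling factor: {lora_scaling}")
print(f"Effective batch size: {effective_batch_size}")
print(f"Token positions per optimizer step: {token_positions_per_step}")
```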

Upcoming

~~I will release adapter files and maybe exllama v2 quant shortly.~~
LoRA and exl2 quant have been released.