airoboros-7b / README.md
jondurbin's picture
Update README.md
ea9bbc1
|
raw
history blame
2.01 kB
metadata
license: other

Overview

This is a fine-tuned 7b parameter LlaMa model, fine tuned on nearly 100k synthetic instructions generated airoboros

I used a jailbreak prompt to generate the synthetic instructions this time, which resulted in some questionable training data, such as synthesizing drugs, making homemade flamethrowers, etc. Mind you, this is all generated by ChatGPT, not me, so I won't speak for any outputs the model produces.

Training data

I'm still combing through the data a bit to make sure there's nothing blatantly illegal, but I'll publish it soon.

The jailbreak prompt I used is the default prompt in the python code when using the --uncensored flag: (https://github.com/jondurbin/airoboros/blob/main/airoboros/self_instruct.py#L39)

Fine-tuning method

I used the excellent FastChat module, running with:

torchrun --nproc_per_node=8 --master_port=20001 /workspace/FastChat/fastchat/train/train_mem.py \
  --model_name_or_path /workspace/llama-7b \
  --data_path /workspace/as_conversations.json \
  --bf16 True \
  --output_dir /workspace/airoboros-uncensored-7b \
  --num_train_epochs 3 \
  --per_device_train_batch_size 24 \
  --per_device_eval_batch_size 24 \
  --gradient_accumulation_steps 2 \
  --evaluation_strategy "steps" \
  --eval_steps 1000 \
  --save_strategy "steps" \
  --save_steps 1000 \
  --save_total_limit 10 \
  --learning_rate 2e-5 \
  --weight_decay 0. \
  --warmup_ratio 0.04 \
  --lr_scheduler_type "cosine" \
  --logging_steps 1 \
  --fsdp "full_shard auto_wrap" \
  --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
  --tf32 True \
  --model_max_length 2048 \
  --gradient_checkpointing True \
  --lazy_preprocess True

This ran on 8x nvidia 80gb a100's for about 17 hours.

License

The model is licensed under the LLaMA model, and the dataset is licensed under the terms of OpenAI because it uses ChatGPT. Everything else is free.