---
datasets:
- anon8231489123/ShareGPT_Vicuna_unfiltered
language:
- zh
- en
---
|
|
|
*TODO: Upload pending. Training is finished; still testing.
|
*Update: Having a bit of an issue with the tokenizer; still figuring things out.
|
|
|
|
|
This reproduces Vicuna, but based on Yi-6B. The training data I used was ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json.
|
|
|
Hyperparameters:
|
```
CUDA_VISIBLE_DEVICES=0,1,2,3,5 torchrun --nproc_per_node 5 ../supervised_finetuning.py \
    --model_type auto \
    --model_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
    --tokenizer_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
    --train_file_dir ../data/finetune/vicuna/ \
    --per_device_train_batch_size 2 \
    --do_train \
    --max_train_samples -1 \
    --num_train_epochs 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --bf16 \
    --use_peft False \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy epoch \
    --save_total_limit 5 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 8 \
    --output_dir ../outputs/20240106_yi6B_vicuna \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --torch_dtype bfloat16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True \
    --cache_dir ./cache \
    --model_max_length 4096 \
    --deepspeed ../deepspeed_zero_stage2_config_no16.json \
    --template_name yi
```
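The DeepSpeed config file referenced above (`deepspeed_zero_stage2_config_no16.json`) is not included here. Purely as an assumption, it is probably a standard ZeRO stage-2 setup with no fp16 section (precision is handled by the `--bf16` flag); a small helper like the following could generate such a file, but the values are guesses, not the config actually used for this run.

```
import json

# Hypothetical contents for a ZeRO stage-2 DeepSpeed config without an fp16
# section (bf16 is passed to the trainer instead). These values are
# assumptions, not the config actually used for this run.
zero2_config = {
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "reduce_scatter": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("deepspeed_zero_stage2_config_no16.json", "w") as f:
    json.dump(zero2_config, f, indent=2)
```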
|
|
|
Training ran on 5×A800 GPUs for 3 epochs:
|
```
***** train metrics *****
epoch                    = 3.0
train_loss               = 0.3785
train_runtime            = 1 day, 10:01:13.95
train_samples            = 93204
train_samples_per_second = 2.24
train_steps_per_second   = 0.224
```
|
|
|
From some preliminary results we can see that the conversations are natural and informative (unsurprisingly), and the unfiltering also seems to be working!
|
|
|
**Heads up:** some of the examples below are unsafe and inappropriate; this is entirely for the purpose of testing how unaligned SFT data affect an LLM's final output.
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6413d7be996b2e426f230fb7/pklSsljCRN34QuL2ZF2zU.png) |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6413d7be996b2e426f230fb7/22pTSVkBCVlQ5N8A8JBkF.png) |
|
|
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6413d7be996b2e426f230fb7/WfQYyyLxtXA2KlePmIPQJ.png) |
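To try the model yourself once the checkpoint is uploaded, a minimal sketch like the following should work. The model path is a placeholder, and the ChatML-style prompt is only an assumption based on the `yi` template; adjust both to match the final checkpoint.

```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path -- replace with the actual checkpoint / repo id once uploaded.
model_id = "../outputs/20240106_yi6B_vicuna"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Assumed ChatML-style formatting for the `yi` template; verify against the training template.
prompt = "<|im_start|>user\nWhat is the capital of France?<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```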
|
|