yi6B_Vicuna / README.md
lorinma's picture
Update README.md
54d0bf8
|
raw
history blame
2.55 kB
---
datasets:
- anon8231489123/ShareGPT_Vicuna_unfiltered
language:
- zh
- en
---
*TODO:Upload pending, training is finished. still testing.
*Update: Having a bit issue with the tokenizer, still figuring things out.
Reproduce Vicuna, but based on yi-6B. The training data I used was ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json.
Hyper parameters:
```
CUDA_VISIBLE_DEVICES=0,1,2,3,5 torchrun --nproc_per_node 5 ../supervised_finetuning.py \
--model_type auto \
--model_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
--tokenizer_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
--train_file_dir ../data/finetune/vicuna/ \
--per_device_train_batch_size 2\
--do_train \
--max_train_samples -1 \
--num_train_epochs 3 \
--learning_rate 2e-5 \
--weight_decay 0. \
--bf16 \
--use_peft False \
--logging_strategy steps \
--logging_steps 10 \
--save_strategy epoch \
--save_total_limit 5 \
--gradient_accumulation_steps 1 \
--preprocessing_num_workers 8 \
--output_dir ../outputs/20240106_yi6B_vicuna \
--overwrite_output_dir \
--ddp_timeout 30000 \
--logging_first_step True \
--torch_dtype bfloat16 \
--device_map auto \
--report_to tensorboard \
--ddp_find_unused_parameters False \
--gradient_checkpointing True \
--cache_dir ./cache \
--model_max_length 4096 \
--deepspeed ../deepspeed_zero_stage2_config_no16.json \
--template_name yi
```
The training used 5*A800 for 3 epochs
```
***** train metrics *****
epoch = 3.0
train_loss = 0.3785
train_runtime = 1 day, 10:01:13.95
train_samples = 93204
train_samples_per_second = 2.24
train_steps_per_second = 0.224
```
We can see from some preliminary results, the conversation is natural and informative (unsurprisingly).
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6413d7be996b2e426f230fb7/WfQYyyLxtXA2KlePmIPQJ.png)
Also we observe the unfiltering seems to be working! **Heads up** some examples are unsafe and inappropriate, this is entirely for research purposes, to test how alignment-filtered SFT data affect LLM's final output.
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6413d7be996b2e426f230fb7/pklSsljCRN34QuL2ZF2zU.png)
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6413d7be996b2e426f230fb7/22pTSVkBCVlQ5N8A8JBkF.png)