Bug: there is still an issue with this repo's tokenizer that we are figuring out. As a workaround, you can use the original Yi tokenizer configuration.
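A minimal sketch of that workaround, assuming the tokenizer files from the upstream `01-ai/Yi-6B` repo are compatible with these weights (this snippet is illustrative, not part of the original card):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Workaround sketch: load the tokenizer from the original Yi-6B repo
# instead of this repo's (currently problematic) tokenizer files.
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-6B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("lorinma/yi6B_Vicuna", torch_dtype="auto")
```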

This model reproduces Vicuna, but based on Yi-6B. The training data I used was ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json.
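For reference, the ShareGPT V3 file is a JSON list of multi-turn conversations; here is a quick sketch of inspecting it (the field names follow the common ShareGPT layout and are an assumption here):

```python
import json

# Assumed ShareGPT layout: a list of records, each with an "id" and a
# "conversations" list of {"from": "human"/"gpt", "value": "..."} turns.
with open("ShareGPT_V3_unfiltered_cleaned_split_no_imsorry.json") as f:
    data = json.load(f)

print(len(data))  # number of conversations
for turn in data[0]["conversations"]:
    print(turn["from"], ":", turn["value"][:80])
```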

The training framework I used is https://github.com/shibing624/MedicalGPT; the training script:

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,5 torchrun --nproc_per_node 5 ../supervised_finetuning.py \
    --model_type auto \
    --model_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
    --tokenizer_name_or_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B \
    --train_file_dir ../data/finetune/vicuna/ \
    --per_device_train_batch_size 2 \
    --do_train \
    --max_train_samples -1 \
    --num_train_epochs 3 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --bf16 \
    --use_peft False \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy epoch \
    --save_total_limit 5 \
    --gradient_accumulation_steps 1 \
    --preprocessing_num_workers 8 \
    --output_dir ../outputs/20240106_yi6B_vicuna \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --torch_dtype bfloat16 \
    --device_map auto \
    --report_to tensorboard \
    --ddp_find_unused_parameters False \
    --gradient_checkpointing True \
    --cache_dir ./cache \
    --model_max_length 4096 \
    --deepspeed ../deepspeed_zero_stage2_config_no16.json \
    --template_name yi
```

Training used 5× A800 GPUs for 3 epochs:

```
***** train metrics *****
  epoch                    =                3.0
  train_loss               =             0.3785
  train_runtime            = 1 day, 10:01:13.95
  train_samples            =              93204
  train_samples_per_second =               2.24
  train_steps_per_second   =              0.224
```

Post-training inference also uses the same repository:

```bash
# Gradio web demo
CUDA_VISIBLE_DEVICES=4 python gradio_demo.py --model_type auto \
    --base_model /data/mn/shibing624/MedicalGPT-1.6.3-231215/outputs/20240106_yi6B_vicuna \
    --tokenizer_path /data/mn/shibing624/MedicalGPT-1.6.3-231215/outputs/20240106_yi6B_vicuna \
    --template_name yi --gpus 4

# Interactive CLI inference (note: uses the original Yi-6B tokenizer path)
CUDA_VISIBLE_DEVICES=6 python inference.py --model_type auto \
    --base_model /data/mn/shibing624/MedicalGPT-1.6.3-231215/outputs/20240106_yi6B_vicuna \
    --template_name yi --gpus 6 --interactive \
    --tokenizer_path /data/llm/models/Pretrained/yi-6B/01ai/Yi-6B
```
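Alternatively, a minimal transformers sketch for loading the checkpoint directly (the plain prompt formatting here is an assumption; MedicalGPT's `yi` template may wrap turns differently):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Tokenizer from the original Yi-6B repo (see the bug note at the top).
tokenizer = AutoTokenizer.from_pretrained("01-ai/Yi-6B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "lorinma/yi6B_Vicuna", torch_dtype=torch.bfloat16, device_map="auto"
)

# Plain single-turn prompt; treat this formatting as an assumption.
prompt = "What are the main causes of seasonal allergies?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```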

From some preliminary results, we can see the conversation is natural and informative (unsurprisingly).

[screenshot: sample conversation]

We also observe that the unfiltering seems to be working. Heads up: some examples are unsafe and inappropriate. This is entirely for research purposes, to test how removing alignment filtering from SFT data affects an LLM's final output.

[screenshots: examples of unsafe/unfiltered responses]

Update: evaluated on the Open LLM Leaderboard:


Open LLM Leaderboard Evaluation Results

Detailed results can be found here

| Metric                            | Value |
|-----------------------------------|-------|
| Avg.                              | 51.02 |
| AI2 Reasoning Challenge (25-Shot) | 46.16 |
| HellaSwag (10-Shot)               | 69.30 |
| MMLU (5-Shot)                     | 58.43 |
| TruthfulQA (0-shot)               | 48.11 |
| Winogrande (5-shot)               | 65.67 |
| GSM8k (5-shot)                    | 18.42 |