tastelikefeet's picture
Update README.md
941d4ed verified
metadata
frameworks:
  - Pytorch
license: apache-2.0
tasks:
  - text-generation

Fine-tuning the qwen2-7b-instruct model using the msagent-pro dataset and the loss_scale technique with swift, the script is as follows:

NPROC_PER_NODE=8 \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
MASTER_PORT=29500 \
swift sft \
    --model_type qwen2-7b-instruct \
    --learning_rate 2e-6 \
    --sft_type full \
    --dataset msagent-pro \
    --gradient_checkpointing true \
    --gradient_accumulation_steps 8 \
    --deepspeed default-zero3 \
    --use_loss_scale true \
    --save_strategy epoch \
    --batch_size 1 \
    --num_train_epochs 1 \
    --max_length 4096 \
    --preprocess_num_proc 4 \
    --use_loss_scale true \
    --loss_scale_config_path agent-flan \
    --ddp_backend nccl \

Comparison with the Original Model on the ToolBench Evaluation Set

Model ToolBench (in-domain) ToolBench (out-of-domain)
Plan.EM Act.EM HalluRate (lower is better) Avg.F1 R-L Plan.EM Act.EM HalluRate (lower is better) Avg.F1
llama3-8b-instruct 74.11 54.74 4.16 46.53 8.51 73.17 57.67 3.84 48.58
llama3-8b-agent-instruct-v2 83.37 60.01 2.58 54.41 26.34 82.57 60.14 1.79 55.25

For detailed explanations of the evaluation metrics, please refer to document

deploy this model:

USE_HF=True swift deploy \
  --model_id_or_path modelscope/qwen2-7b-agent-instruct \
  --model_type qwen2-7b-instruct \
  --infer_backend vllm \
  --tools_prompt toolbench