Loss is 0 during training

#5
by deerluffy - opened

FastChat v0.2.5

Command used:

deepspeed fastchat/train/train.py \
    --num_train_epochs 3 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 2 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 False \
    --fp16 True \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True \
    --deepspeed /data/xxx/Llama-X/src/configs/deepspeed_config.json

Log:

bjrw-platform-kube-node-di-gpu055:44105:44977 [0] NCCL INFO Launch mode Parallel
{'loss': 0.0, 'learning_rate': 1.5037593984962406e-07, 'epoch': 0.0}
0%| | 1/4410 [00:30<37:29:28, 30.61s/it]WARNING: tokenization mismatch: 422 vs. 430. (ignored)
{'loss': 0.0, 'learning_rate': 3.007518796992481e-07, 'epoch': 0.0}
0%| | 2/4410 [00:56<33:55:30, 27.71s/it]WARNING: tokenization mismatch: 312 vs. 319. (ignored)
{'loss': 0.0, 'learning_rate': 4.511278195488722e-07, 'epoch': 0.0}
0%| | 3/4410 [01:21<32:39:38, 26.68s/it]WARNING: tokenization mismatch: 386 vs. 394. (ignored)
{'loss': 0.0, 'learning_rate': 6.015037593984962e-07, 'epoch': 0.0}
0%| | 4/4410 [01:46<31:54:57, 26.08s/it]WARNING: tokenization mismatch: 357 vs. 362. (ignored)
{'loss': 0.0, 'learning_rate': 7.518796992481203e-07, 'epoch': 0.0}
0%|▏ | 5/4410 [02:13<32:09:14, 26.28s/it]WARNING: tokenization mismatch: 218 vs. 220. (ignored)
{'loss': 0.0, 'learning_rate': 9.022556390977444e-07, 'epoch': 0.0}
0%|▏ | 6/4410 [02:39<32:00:17, 26.16s/it]WARNING: tokenization mismatch: 137 vs. 139. (ignored)
[...]
WARNING: tokenization mismatch: 279 vs. 286. (ignored)
{'loss': 0.0, 'learning_rate': 3.7593984962406014e-06, 'epoch': 0.02}
1%|▋ | 25/4410 [10:29<28:45:22, 23.61s/it]WARNING: tokenization mismatch: 282 vs. 285. (ignored)
{'loss': 0.0, 'learning_rate': 3.909774436090225e-06, 'epoch': 0.02}
1%|▋ | 26/4410 [10:52<28:28:15, 23.38s/it]WARNING: tokenization mismatch: 426 vs. 434. (ignored)
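
For context on why the loss is exactly 0: in FastChat's preprocess (fastchat/train/train.py), the label mask is built by re-tokenizing the conversation turn by turn and advancing a cursor; when the accumulated length disagrees with the length of the full-conversation tokenization, the whole target is masked out and the sample contributes nothing to the loss. A paraphrased Python sketch of that check (variable names assumed, not the verbatim FastChat source):

```python
import torch

IGNORE_TOKEN_ID = -100  # transformers' CrossEntropyLoss ignore_index

def mask_on_mismatch(target: torch.Tensor, cur_len: int, total_len: int) -> torch.Tensor:
    # Paraphrase of the guard in FastChat's preprocess: if the per-turn
    # length bookkeeping (cur_len) disagrees with the full-sequence
    # tokenization (total_len), every label is set to the ignore index,
    # so the sample's loss is exactly 0.0.
    if cur_len != total_len:
        target[:] = IGNORE_TOKEN_ID
        print(f"WARNING: tokenization mismatch: {cur_len} vs. {total_len}. (ignored)")
    return target

# The first warning in the log above corresponds to cur_len=422, total_len=430:
target = torch.zeros(430, dtype=torch.long)
mask_on_mismatch(target, cur_len=422, total_len=430)
```

Here the mismatch presumably comes from the Baichuan tokenizer splitting turn boundaries differently from the LLaMA tokenizer that FastChat's bookkeeping assumes, so the check fires on every sample and the reported loss stays at 0.0 from step 1.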

Is all the data being ignored because of a vocabulary problem?

This problem is caused by a tokenizer encoding error, which makes all of the data get skipped. I ran into it as well and rewrote the tokenize function to avoid the encoding error (see the tokenize function in https://huggingface.co/fireballoon/baichuan-vicuna-7b/blob/main/train_vicuna.py, which corresponds to the preprocess function in FastChat's train).
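
A minimal sketch of that turn-by-turn approach (an illustration of the idea, assuming a `turns` list of role/text pairs; this is not the exact tokenize function from train_vicuna.py):

```python
import torch

IGNORE_TOKEN_ID = -100  # masked positions are skipped by CrossEntropyLoss

def build_example(tokenizer, turns, max_len=2048):
    """turns: list of (role, text) pairs, role in {"user", "assistant"}.
    Special tokens (BOS/EOS, chat separators) are omitted for brevity."""
    input_ids, labels = [], []
    for role, text in turns:
        # Tokenize each turn on its own and concatenate the ids, so the
        # label mask is aligned with the inputs by construction -- there
        # is no post-hoc length check that can fail and silently mask
        # the whole sample.
        ids = tokenizer(text, add_special_tokens=False).input_ids
        input_ids += ids
        labels += ids if role == "assistant" else [IGNORE_TOKEN_ID] * len(ids)
    input_ids, labels = input_ids[:max_len], labels[:max_len]
    return dict(
        input_ids=torch.tensor(input_ids),
        labels=torch.tensor(labels),
        attention_mask=torch.ones(len(input_ids), dtype=torch.long),
    )
```

Because the labels are built from the same token ids that form the inputs, the mask can never drift out of alignment, at the cost of small tokenization differences at turn boundaries compared with encoding the conversation as one string.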

fireballoon changed discussion status to closed
