HF transformers fine-tuning code hangs with Llama 3?

#48
by teddyyyy123 - opened

We used our existing fine-tuning code, which worked with the Llama 1 and Llama 2 base models:

    from transformers import Trainer

    trainer = Trainer(
        model=model,                      # base model loaded earlier in the script
        tokenizer=tokenizer,
        args=training_args,
        **data_module,                    # dataset/collator kwargs built elsewhere in the script
        callbacks=[ManifoldTensorBoardLoggerCallback()],  # our custom TensorBoard logging callback
    )
    trainer.train()

But once the trainer starts fine-tuning from the Llama 3 8B base model, it makes essentially no progress: it prints the 0% progress bar once and never updates it, even after 5 hours. Previously, with Llama 2 7B, it ran through 40% of our examples within 25 minutes.
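
To narrow down where it stalls, a minimal debug run along these lines can show whether even the first optimizer step ever completes (this is a sketch against the snippet above, not part of our original fine-tuning code; the output path is a placeholder):

    from transformers import TrainingArguments

    # Reuses model, tokenizer, and data_module from the snippet above.
    debug_args = TrainingArguments(
        output_dir="llama3-debug-run",    # placeholder path
        max_steps=5,                      # stop after a handful of steps
        logging_steps=1,                  # log every step
        per_device_train_batch_size=1,    # smallest possible batch
        report_to="none",                 # skip external loggers
    )

    trainer = Trainer(
        model=model,
        tokenizer=tokenizer,
        args=debug_args,
        **data_module,
    )
    trainer.train()  # if step 1 never finishes, the hang is in data loading or the first forward pass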

Yes, I am also experiencing this issue.

I'm able to fine-tune Llama 3 using Accelerate and DeepSpeed ZeRO-2. However, the resulting model doesn't know how to stop generating properly: it keeps spewing garbage after answering my question until max_new_tokens is reached, just like Phi-2. The same training script works flawlessly with Phi-3 and Mistral-7B, though.
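
For what it's worth, that never-stopping behaviour matches the common Llama 3 end-of-turn issue: at inference time, generation has to treat <|eot_id|> as a stop token (and the fine-tuning labels have to actually contain it, or the model never learns to emit it). A minimal generation sketch under that assumption, with a placeholder checkpoint path:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # "path/to/finetuned-llama3" is a placeholder for the checkpoint saved by the training run.
    tokenizer = AutoTokenizer.from_pretrained("path/to/finetuned-llama3")
    model = AutoModelForCausalLM.from_pretrained(
        "path/to/finetuned-llama3", torch_dtype=torch.bfloat16, device_map="auto"
    )

    # Llama 3 Instruct ends each turn with <|eot_id|>; <|end_of_text|> is the plain EOS token.
    terminators = [
        tokenizer.eos_token_id,
        tokenizer.convert_tokens_to_ids("<|eot_id|>"),
    ]

    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": "What is the capital of France?"}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=256, eos_token_id=terminators)
    print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))

If the model still runs until max_new_tokens with both terminators set, the training data most likely never appends <|eot_id|> to the responses, so the model has no stop signal to learn.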
