Error running AutoTrain finetuning locally without a GPU

#6
by zonatariq39 - opened

Hi Abhishek,
Please help. Every time I run the command below, I get the error below:
autotrain llm --train --project-name llm-1 --model abhishek/llama-2-7b-hf-small-shards --data-path . --use-peft --quantization int4 --lr 2e-4 --train-batch-size 12 --epochs 3 --trainer sft

The error is:
File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\autotrain\trainers\clm_main_.py", line 526, in
training_config = json.load(open(_args.training_config))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'llm-1training_params.json'
Traceback (most recent call last):

Here is the full output, including warnings:

WARNING No GPU found. Forcing training on CPU. This will be super slow!
INFO ['accelerate', 'launch', '--cpu', '-m', 'autotrain.trainers.clm', '--training_config', 'llm-1\training_params.json']
The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 0
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\trl\trainer\ppo_config.py:141: UserWarning: The optimize_cuda_cache arguement will be deprecated soon, please use optimize_device_cache instead.
warnings.warn(
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\autotrain\trainers\clm\__main__.py", line 526, in <module>
    training_config = json.load(open(_args.training_config))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'llm-1training_params.json'
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\accelerate\commands\launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\accelerate\commands\launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\ProgramData\anaconda3\python.exe', '-m', 'autotrain.trainers.clm', '--training_config', 'llm-1training_params.json']' returned non-zero exit status 1.
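(Side note: the INFO line shows accelerate being launched with 'llm-1\training_params.json', but the trainer tried to open 'llm-1training_params.json'; the path separator was lost somewhere. A minimal sketch of how this kind of bug typically arises; hypothetical code, not AutoTrain's actual implementation:)

```python
import os

project_name = "llm-1"

# Buggy: plain string concatenation drops the separator and yields
# 'llm-1training_params.json', hence the FileNotFoundError above.
bad_path = project_name + "training_params.json"

# Safe: os.path.join inserts the platform separator and yields
# 'llm-1\\training_params.json' on Windows.
good_path = os.path.join(project_name, "training_params.json")
```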

It seems like you need to update your autotrain installation :) There was a bug, but it was fixed a few days ago.
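If you installed via pip, upgrading is typically (assuming the autotrain-advanced package from PyPI):

pip install -U autotrain-advanced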

Hi Abhishek, please help with this.
I am running the command below after removing quantization, as you mentioned before:

autotrain llm --train --project-name llm101 --model abhishek/llama-2-7b-hf-small-shards --data-path . --use-peft --quantization None --lr 2e-4 --train-batch-size 12 --epochs 3 --trainer sft

Below is the error:

warnings.warn(
C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\torch\utils\checkpoint.py:90: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
ERROR | 2024-02-17 15:26:35 | autotrain.trainers.common:wrapper:91 - train has failed due to an exception: Traceback (most recent call last):
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\autotrain\trainers\common.py", line 88, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\autotrain\trainers\clm\__main__.py", line 475, in train
    trainer.train()
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\trl\trainer\sft_trainer.py", line 331, in train
    output = super().train(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\transformers\trainer.py", line 1539, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\transformers\trainer.py", line 1869, in inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\transformers\trainer.py", line 2777, in training_step
    self.accelerator.backward(loss)
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\accelerate\accelerator.py", line 1966, in backward
    loss.backward(**kwargs)
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\torch\_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "C:\Users\zau3\AppData\Roaming\Python\Python311\site-packages\torch\autograd\__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
ERROR | 2024-02-17 15:26:35 | autotrain.trainers.common:wrapper:92 - element 0 of tensors does not require grad and does not have a grad_fn
0%|
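For anyone debugging the same thing: the UserWarning from torch\utils\checkpoint.py above is the clue. When gradient checkpointing runs while every upstream tensor is frozen, none of the checkpointed inputs have requires_grad=True, so the loss ends up with no grad_fn and backward() fails exactly like this. When wiring PEFT up manually with transformers, the usual workaround is to force the embedding outputs to require grad; a minimal sketch under that assumption (model name taken from this thread; this is not necessarily how AutoTrain configures it internally):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("abhishek/llama-2-7b-hf-small-shards")

# Gradient checkpointing saves memory by recomputing activations in backward.
model.gradient_checkpointing_enable()

# With a frozen base model, checkpointed segments see no tensor that
# requires grad; this hook makes the embedding outputs require grad so
# the adapter gradients can flow, avoiding the RuntimeError above.
model.enable_input_require_grads()

# Attach LoRA adapters; only these small matrices are trained.
lora_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
```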
