Problem with finetuning with Axolotl (qlora, lora, and FFT)

#1
by Undi95 - opened

Hello, I'm having a hard time trying to finetune this model with Axolotl.
I use 1x A100 and 125GB of RAM, and when Axolotl loads it (4-bit, 8-bit, or the full model) it goes past the 125GB of RAM without ever getting the model onto VRAM, so I get an Out Of Memory crash on every try.
Does the model have an issue? Or does my config have an issue? I don't really know.

I was excited because 9B with 200K context sounded great, but it's impossible to train anything on my side.
I tried the latest version of Axolotl, some older commits, etc... but no luck.

Maybe it's a problem on Axolotl's side too?
Thanks for any replies!

Edit:
6B works though (batch size of one for the test, QLoRA, loaded in 4-bit, 4096 ctx).

image.png
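
For reference, outside Axolotl the same kind of 4-bit QLoRA load looks roughly like the sketch below (transformers + bitsandbytes + peft). The LoRA rank, alpha, and target modules are placeholder guesses, not the exact settings used in the test.

```python
# Rough sketch of a 4-bit QLoRA-style setup for the 6B model that loaded fine.
# Hyperparameters and target modules below are illustrative assumptions only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "01-ai/Yi-6B-200K"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # confirms only the LoRA weights are trainable
```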

I had the same issue doing QLoRA at 4096 context. It crashed with a generic error. Weird, because I thought this model was Llama-compatible.

01-ai org

I don't have much experience with Axolotl.
If Yi-6B-200K works fine, Yi-9B-200K might simply need more memory. Also, 'max_position_embeddings' for Yi-9B-200K is set to 256K by default; maybe try adjusting it down to 200K and see if that solves the issue? https://huggingface.co/01-ai/Yi-9B-200K/blob/main/config.json#L13
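
As a minimal sketch of that suggestion when loading the model directly with transformers: the 204800 (200K) value and the loading flags below are example choices, not an official recipe.

```python
# Minimal sketch: lower max_position_embeddings before loading Yi-9B-200K.
# 204800 (200K) is an example value, not an officially recommended setting.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "01-ai/Yi-9B-200K"

config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 204800  # default in config.json is 256K

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(model.config.max_position_embeddings)
```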

01-ai org

It has been confirmed to be compatible with Llama. Could you provide a more comprehensive set of crash logs?

Hi, thank you for your response. Unfortunately the log wasn't too informative. I observed an extreme amount of RAM (not VRAM) use as the model was loaded, and then, after hanging for several minutes, it crashed with the following. Unfortunately it's not a helpful error.

torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: -9) local_rank: 3 (pid: 13727) of binary: /root/miniconda3/envs/py3.10/bin/python3
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in
sys.exit(main())
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/acceleratecli.py", line 47, in main
args.func(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1014, in launch_command
multi_gpu_launcher(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 672, in multi_gpu_launcher
distrib_run.run(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call
return launch_agent(self._config, self._entrypoint, list(args))
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

axolotl.cli.train FAILED

Failures:

Root Cause (first observed failure):
[0]:
time : 2024-03-16_19:34:31
host : ba783a85a837
rank : 3 (local_rank: 3)
exitcode : -9 (pid: 13727)
error_file: <N/A>
traceback : Signal 9 (SIGKILL) received by PID 13727

01-ai org

Unfortunately, that doesn't tell us much. Does the Yi-9B base model work when running QLoRA with a context length of 4096?
https://huggingface.co/01-ai/Yi-9B

I had similar issues when finetuning Yi-34B-200K on 24GB of VRAM. Set max_position_embeddings (I literally edit the config.json file) to something like 4096 or 32768 for the finetuning, then change it back to the higher value for inference. With 16384 the resulting LoRA was broken, so not every value will work; I think you need to pick something that 01.ai used internally for pre-training, since they probably scaled it up to bigger and bigger values during pre-training. Also, make sure flash_attention_2 is installed and activated. Most if not all of the long-context performance survives once you revert the value post-tuning.
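
Concretely, that config.json tweak is just something like the sketch below; the path and the 32768 value are examples, and the original number goes back in before inference.

```python
# Sketch of the config.json edit described above, applied to a local copy of the model.
# The path and the 32768 value are examples; revert to the original value after tuning.
import json
from pathlib import Path

config_path = Path("Yi-34B-200K/config.json")  # local snapshot of the model

config = json.loads(config_path.read_text())
original_value = config["max_position_embeddings"]  # keep this to restore later

config["max_position_embeddings"] = 32768  # lowered only for the finetuning run
config_path.write_text(json.dumps(config, indent=2))

print(f"max_position_embeddings: {original_value} -> {config['max_position_embeddings']}")
```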
