Fine-tuning example
I really enjoyed the paper. Thanks for publishing these weights. Are there any code examples of fine-tuning phi-1?
Cheers
Hello @gardner! I hope everything is going well with you.
You can use the following snippet to fine-tune the model:
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments, DataCollatorForLanguageModeling

# Load the model and tokenizer (trust_remote_code is required for phi-1's custom modeling code)
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1", trust_remote_code=True, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

# Tokenize a small slice of WikiText-2, padding/truncating every example to the model's max length
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.map(lambda x: tokenizer(x["text"], padding="max_length", truncation=True), batched=True)

# The collator builds the labels from input_ids for causal language modeling (mlm=False)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

training_args = TrainingArguments("tmp", max_steps=1, per_device_train_batch_size=1)
trainer = Trainer(model, args=training_args, train_dataset=dataset, data_collator=data_collator)
trainer.train()
But please be aware that this is only an example: the model does not yet support attention_mask / padding, so you will need to create a contiguous dataset that provides full-length sequences, i.e., pack the tokenized text into blocks that fill the whole context window instead of padding.
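For reference, here is a minimal sketch of one common way to build such a contiguous dataset: tokenize without padding, concatenate everything, and split it into fixed-length blocks. It would replace the padded map call above; the block size of 2048 is an assumption based on phi-1's context length.

# Tokenize without padding or truncation (same tokenizer/dataset as above)
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.map(lambda x: tokenizer(x["text"]), batched=True, remove_columns=["text"])

block_size = 2048  # assumed context length for phi-1

def group_texts(examples):
    # Concatenate all sequences in the batch, then cut them into full-length blocks,
    # dropping the remainder that does not fill a complete block
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = (len(concatenated["input_ids"]) // block_size) * block_size
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }

dataset = dataset.map(group_texts, batched=True)

After this step every example already spans the full context window, so no padding is needed, and DataCollatorForLanguageModeling(mlm=False) still takes care of the labels.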
Regards,
Gustavo.
Hi @gugarosa, thanks for the example training code! Just wanted to clarify: what do you mean by "create a contiguous dataset that provides sequences with full length"? When I tried to fine-tune phi-1 with similar training code, I got an error like this:
Traceback (most recent call last):
File "/home/t-xinyiwang/reasoning-tuning/train.py", line 427, in <module>
train()
File "/home/t-xinyiwang/reasoning-tuning/train.py", line 413, in train
trainer.train()
File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in training_step
loss = self.compute_loss(model, inputs)
File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 2704, in compute_loss
outputs = model(**inputs)
File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1148, in forward
self._sync_buffers()
File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1748, in _sync_buffers
self._sync_module_buffers(authoritative_rank)
File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1752, in _sync_module_buffers
self._default_broadcast_coalesced(
File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1775, in _default_broadcast_coalesced
self._distributed_broadcast_coalesced(
File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1689, in _distributed_broadcast_coalesced
dist._broadcast_coalesced(
RuntimeError: Tensors must be CUDA and dense
Thank you
@xinyi-wang The error message you provided is quite detailed, and it gives us some insight into what might be going wrong. The last error in the traceback, RuntimeError: Tensors must be CUDA and dense, suggests that the model or the data tensors are either not on the CUDA device (i.e., the GPU) or are not in the expected (dense) format.
Here are some steps to troubleshoot and potentially resolve the issue:
1. Ensure the model is on the CUDA device: before starting training, move the model to the GPU:
model = model.cuda()
2. Ensure the data tensors are on the CUDA device: before passing data tensors to the model, move them to the GPU as well:
input_tensor = input_tensor.cuda()
3. Check for sparse tensors: the error says the tensors should be dense. If you're using sparse tensors for any reason, you'll need to convert them to dense format before passing them to the model. If you aren't explicitly using sparse tensors, this might not be the issue.
4. Check the distributed setup: since the traceback includes references to distributed training, ensure that you've properly initialized the distributed environment. If you're using torch.nn.parallel.DistributedDataParallel, make sure you've set up the environment correctly with torch.distributed.init_process_group (see the sketch after this list).
5. Check GPU memory: ensure that your GPU has enough memory to hold the model and the data. If the GPU memory is full, it might not allow new tensors to be allocated on it.
6. Update libraries: sometimes errors are due to compatibility issues or bugs in libraries. Ensure that you're using compatible versions of PyTorch and Transformers. If possible, try updating both libraries to the latest versions.
7. Inspect the training code: review the training loop, data loading, and model creation to ensure no part of the code accidentally moves tensors to the CPU or changes their format.
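For item 4, a minimal sketch of what a manual single-process DistributedDataParallel setup could look like; the backend, address, port, and device index below are assumptions, a launcher such as torchrun normally sets them for you, and none of this is needed for plain single-GPU training.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# These environment variables are normally provided by the launcher (e.g. torchrun);
# the values here are placeholders for a single-process run.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

# Initialize the process group before wrapping the model in DDP
dist.init_process_group(backend="nccl", rank=0, world_size=1)
torch.cuda.set_device(0)

model = model.cuda()                 # move the model to the GPU first
model = DDP(model, device_ids=[0])   # then wrap it for distributed training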
Can you share the training code?
Thank you @gardner for the detailed reply! We have identified the issue: it was caused by distributed training. Since we are only using one GPU, we simply removed torchrun and everything works fine.