Fine tuning example

#1
by gardner - opened

I really enjoyed the paper. Thanks for publishing these weights. Are there any code examples of fine-tuning phi-1?

Cheers

Microsoft org

Hello @gardner! I hope everything is going well with you.

You can use the following snippet to fine-tune the model:

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments, DataCollatorForLanguageModeling

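# Load phi-1 and its tokenizer; trust_remote_code lets Transformers run the custom modeling code shipped in the repo.
# The tokenizer has no pad token, so the EOS token is reused for padding below.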
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1", trust_remote_code=True, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1", trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

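# Tiny wikitext slice, padded/truncated to the tokenizer's max length, purely for demonstration.
# With mlm=False the collator builds causal-LM labels from input_ids (pad positions are ignored in the loss).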
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.map(lambda x: tokenizer(x["text"], return_tensors="pt", padding="max_length", truncation=True), batched=True)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

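# Minimal Trainer configuration: a single optimization step with batch size 1, just to show the wiring.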
training_args = TrainingArguments("tmp", max_steps=1, per_device_train_batch_size=1)
trainer = Trainer(model, args=training_args, train_dataset=dataset, data_collator=data_collator)

trainer.train()

But please be aware that this is only an example: the model does not yet support attention_mask / padding, so for real fine-tuning you should build a contiguous (packed) dataset whose examples are full-length sequences.
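For illustration, here is a rough packing sketch that reuses the tokenizer and dataset loading from the snippet above (the block size of 2048 is an assumption based on phi-1's context window): tokenize without padding, concatenate all token ids into one stream, and slice the stream into fixed-length blocks.

block_size = 2048  # assumed phi-1 context length

def pack(examples):
    # Tokenize without padding and concatenate all token ids into one long stream.
    ids = tokenizer(examples["text"])["input_ids"]
    stream = [tok for seq in ids for tok in seq]
    # Keep only whole blocks, so every example is exactly block_size tokens long.
    total = (len(stream) // block_size) * block_size
    return {"input_ids": [stream[i : i + block_size] for i in range(0, total, block_size)]}

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")
dataset = dataset.map(pack, batched=True, remove_columns=dataset.column_names)

Because every packed example already has the full length, the collator does not need to pad anything.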

Regards,
Gustavo.

gugarosa changed discussion status to closed

Hi @gugarosa, thanks for the example training code! Just to clarify: what do you mean by "create a contiguous dataset that provides sequences with full length"? When I tried to fine-tune phi-1 with similar training code, I got an error like this:

Traceback (most recent call last):
  File "/home/t-xinyiwang/reasoning-tuning/train.py", line 427, in <module>
    train()
  File "/home/t-xinyiwang/reasoning-tuning/train.py", line 413, in train
    trainer.train()
  File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
    return inner_training_loop(
  File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/transformers/trainer.py", line 2704, in compute_loss
    outputs = model(**inputs)
  File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1148, in forward
    self._sync_buffers()
  File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1748, in _sync_buffers
    self._sync_module_buffers(authoritative_rank)
  File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1752, in _sync_module_buffers
    self._default_broadcast_coalesced(
  File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1775, in _default_broadcast_coalesced
    self._distributed_broadcast_coalesced(
  File "/home/t-xinyiwang/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1689, in _distributed_broadcast_coalesced
    dist._broadcast_coalesced(
RuntimeError: Tensors must be CUDA and dense

Thank you

@xinyi-wang The error message you provided is quite detailed, and it gives us some insight into what might be going wrong.

The final error in the traceback, RuntimeError: Tensors must be CUDA and dense, is raised while DistributedDataParallel broadcasts the model's buffers across processes, which suggests that the model (or some of its tensors) is either not on the CUDA device (i.e., the GPU) or not in the expected dense format.

Here are some steps to troubleshoot and potentially resolve the issue:

  1. Ensure the model is on the CUDA device: Before starting training, ensure that you've moved the model to the CUDA device using:

    model = model.cuda()
    
  2. Ensure the data tensors are on the CUDA device: Before passing data tensors to the model, ensure they are on the CUDA device:

    input_tensor = input_tensor.cuda()
    
  3. Check for Sparse Tensors: The error says the tensors must be dense. If you're using sparse tensors anywhere, convert them to dense (for example with .to_dense()) before they reach the model; a small check is shown in the sketch after this list. If you aren't explicitly using sparse tensors, this is probably not the issue.

  4. Check the Distributed Setup: The traceback goes through torch.nn.parallel.DistributedDataParallel, so if you launch with torchrun or wrap the model in DistributedDataParallel yourself, make sure the distributed environment is initialized with torch.distributed.init_process_group and the model is on the right GPU before it is wrapped (see the sketch after this list).

  5. Check GPU Memory: Ensure that your GPU has enough memory to hold the model and the data. If the GPU memory is full, it might not allow new tensors to be allocated on it.

  6. Update Libraries: Sometimes, errors can be due to compatibility issues or bugs in libraries. Ensure that you're using compatible versions of PyTorch and Transformers. If possible, try updating both libraries to the latest versions.

  7. Inspect the Training Code: Review the training loop, data loading, and model creation to ensure there's no part of the code accidentally converting tensors to CPU or changing their format.
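To make points 3 and 4 concrete, here is a rough sketch (assuming model is the phi-1 model from the snippet above, a CUDA device is available, and torchrun is used only when you actually want multiple processes):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# 3) DDP broadcasts parameters and buffers, so they must all be CUDA and dense;
#    a sparse tensor can be densified with .to_dense().
for name, buf in model.named_buffers():
    assert not buf.is_sparse, f"buffer {name} is sparse; convert it with .to_dense()"

# 4) torchrun exports RANK / WORLD_SIZE / LOCAL_RANK, and init_process_group reads
#    them from the environment. Only go distributed when there is more than one process.
if int(os.environ.get("WORLD_SIZE", "1")) > 1:
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)              # move the model before wrapping it
    model = DDP(model, device_ids=[local_rank])
else:
    model = model.cuda()                        # single GPU: no DDP needed

Note that the Trainer normally handles the DDP wrapping itself when the script is launched with torchrun; the sketch is mainly relevant for custom training loops.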

Can you share the training code?

Thank you @gardner for the detailed reply! We identified the issue: it was caused by the distributed training setup. Since we are only using one GPU, we removed torchrun from the launch command and everything works fine now.
