Update config.json

#5
by ybelkada - opened
No description provided.

The fix in transformers for the loss computation is going to be something like:

    if isinstance(gate_logits, tuple):
        # gate_logits holds one tensor per layer, possibly on different devices;
        # move them to a common device and concatenate along the layer dimension
        gate_logits = torch.cat([gate.cpu() for gate in gate_logits], dim=0)

We overlooked the devices when computing the loss.
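
For reference, here is a minimal sketch of what a Switch-style load-balancing loss looks like once the per-layer router logits are gathered onto one device; the function name, signature, and normalization are illustrative, not the exact transformers implementation:

    import torch
    import torch.nn.functional as F

    def load_balancing_loss(gate_logits, num_experts, top_k=2):
        # gate_logits: tuple with one (batch * seq_len, num_experts) tensor per layer,
        # which can live on different GPUs when the model is sharded across devices
        if isinstance(gate_logits, tuple):
            device = gate_logits[0].device
            gate_logits = torch.cat([g.to(device) for g in gate_logits], dim=0)

        routing_weights = F.softmax(gate_logits, dim=-1)
        _, selected_experts = torch.topk(routing_weights, top_k, dim=-1)
        expert_mask = F.one_hot(selected_experts, num_experts).float()

        # fraction of routing assignments per expert vs. mean router probability per expert
        tokens_per_expert = expert_mask.mean(dim=(0, 1))
        router_prob_per_expert = routing_weights.mean(dim=0)
        return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)

Moving every layer's logits to a single device (gate_logits[0].device here, or CPU as in the snippet above) is what avoids the "tensors on different devices" error reported below.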

pstock changed pull request status to merged

@ArthurZ is this fixed in transformers? I am trying to fine-tune with axolotl, but I get either

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:6 and cuda:0!

or when I change the config.json part like this:

  "output_router_logits": false,

I get:

RuntimeError: !grad_accumulator_.expired() INTERNAL ASSERT FAILED at "../torch/csrc/autograd/saved_variable.cpp":226, please report a bug to PyTorch. No grad accumulator for a saved leaf

Any hints?

No accelerate, just trying to run the training.
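
For reference, the same flag can also be toggled in code rather than by editing config.json by hand; this is a minimal sketch assuming a Mixtral-style checkpoint and the standard AutoConfig API (the model id below is just a placeholder), not a claim that it avoids the assertion error above:

    from transformers import AutoConfig, AutoModelForCausalLM

    model_id = "mistralai/Mixtral-8x7B-v0.1"  # placeholder; any MoE checkpoint with a router

    # equivalent to setting "output_router_logits": false in config.json
    config = AutoConfig.from_pretrained(model_id)
    config.output_router_logits = False

    model = AutoModelForCausalLM.from_pretrained(model_id, config=config)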
