Fine-tuning with transformers?

#17
by toranb - opened

I've had luck fine-tuning Zephyr and other Mistral fine-tunes with transformers 4.35.2, but Starling throws an error related to a vocab mismatch (likely because I'm loading it with the Mistral architecture):

shape '[-1, 32000]' is invalid for input of size 19457216

This originates from transformers/models/mistral/modeling_mistral.py, line 1032, for those interested:
shift_logits.view(-1, self.config.vocab_size)
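
One data point, in case it helps with diagnosis: 19457216 = 608 × 32002, so the logits appear to already carry the 32002-token OpenChat vocabulary while my config still has Mistral's 32000. A quick sanity check, sketched against the upstream repo (my actual run uses a locally converted checkpoint, so paths differ):

```python
from transformers import AutoConfig, AutoTokenizer

# Placeholder id; in my case this would be the converted checkpoint that fails.
model_id = "berkeley-nest/Starling-LM-7B-alpha"

tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

# The upstream repo should report 32002 for both. If the config reports 32000
# while the tokenizer reports 32002, the shift_logits.view(-1, vocab_size)
# reshape in the loss will fail exactly like the error above.
print(len(tokenizer))
print(config.vocab_size)
```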

Does anyone know of a workaround until we have first-class support in transformers?

For what it's worth, I've successfully done a full fine-tune on Starling with Transformers 4.35.2 (via Axolotl). Are you perhaps adding tokens and changing the vocab size? I trained with the OpenChat prompt format and stuck with the default EOS, BOS tokens, etc., so no added tokens were necessary. I think the openchat.json file may also be relevant?
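
For reference, the prompt format I mean is the one from the Starling/OpenChat model cards; a minimal sketch (the example text is made up, the role names and the <|end_of_turn|> token are theirs):

```python
# Single-turn OpenChat 3.5 / Starling prompt, built as a plain string.
# <|end_of_turn|> already exists in the tokenizer, so nothing new is added to the vocab.
prompt = (
    "GPT4 Correct User: Summarize the plot of Hamlet in one sentence.<|end_of_turn|>"
    "GPT4 Correct Assistant:"
)
```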

When do you use the openchat.json file? I didn't even pull that down before converting the weights and fine-tuning, so I'm curious to learn more.

https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha/blob/main/openchat.json

I may be wrong about that. It was one of a couple of files added after the fact that seemed to fix early training issues, but it looks more related to compatibility with the OpenChat API. I'm working with Transformers indirectly, via Axolotl, so it's difficult to tease out why it works in my setup but not in yours. The OpenChat 3.5 format used by Starling adds a couple of tokens to the vocabulary, and I suspect those are the source of your issues. Hopefully the devs can help.
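
If the added tokens really are the problem, one generic thing to try (a sketch only; I haven't needed it myself because Axolotl handled the tokenizer for me) is to let the model follow the tokenizer before training, since resize_token_embeddings also updates config.vocab_size, the value used in that failing .view() call:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder id; point this at whatever checkpoint/tokenizer you are fine-tuning.
model_id = "berkeley-nest/Starling-LM-7B-alpha"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Align the embedding matrix and config.vocab_size with the tokenizer length
# so the loss reshape sees matching dimensions during training.
if model.config.vocab_size != len(tokenizer):
    model.resize_token_embeddings(len(tokenizer))
```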
