How do I fine-tune this model?

#12
by ethanjyx

Hey there, I am interested in fine-tuning this bert-base-uncased model. How can I do it?

I found this tutorial, https://huggingface.co/docs/transformers/training, but it focuses on fine-tuning a prediction head rather than the backbone weights.

I would like to

  1. fine-tune the backbone weights by training on a large corpus of text from my domain, and
  2. train a prediction head on a more limited labeled dataset from my domain.

Is that possible?

Hey Ethan! Would love to chat on this if you have a few minutes to spare. Please let me know :)

@nikharanirghin, what do you want to chat about?

Feedback on fine-tuning BERT!

Can you help me too? I want to fine-tune this model as well.

You can simply train the complete model with a very low learning rate to fine-tune the entire model.
When you load the pretrained model and call model.train(), all layers are, by default, enabled for backpropagation.
For example:

def single_training_epoch(model, optimizer, train_dataloader):
    model.train()
    # Loop over the training set
    for input_ids, attention_masks, labels in train_dataloader:
        # Clear the gradients
        optimizer.zero_grad()
        # Forward pass
        outputs = model(input_ids, attention_mask=attention_masks, labels=labels)
        # When labels are passed, the model computes and returns the loss
        loss = outputs.loss
        # Backward pass
        loss.backward()
        optimizer.step()
    return model, optimizer
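
For context, a quick usage sketch of that loop (the toy batch and learning rate here are illustrative assumptions; the point is that the dataloader must yield (input_ids, attention_mask, labels) tuples, as the loop expects):

import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy batch just to show the expected (input_ids, attention_mask, labels) format
enc = tokenizer(["a good example", "a bad example"], padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])
train_dataloader = DataLoader(
    TensorDataset(enc["input_ids"], enc["attention_mask"], labels), batch_size=2
)

# Very low learning rate, per the advice above
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model, optimizer = single_training_epoch(model, optimizer, train_dataloader)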

You can also manually control which parameters update during backprop (requires_grad = True means the parameter trains; False means it stays frozen) using a loop:

# Unfreeze every parameter in the BERT backbone
for param in model.bert.parameters():
    param.requires_grad = True
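
Conversely, to freeze the backbone and train only the head, a minimal sketch (assuming a BertForSequenceClassification-style model, where the head lives at model.classifier):

import torch

# Freeze the BERT backbone so its weights stay fixed during backprop
for param in model.bert.parameters():
    param.requires_grad = False

# Keep the classification head trainable
for param in model.classifier.parameters():
    param.requires_grad = True

# Give the optimizer only the parameters that still require gradients
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)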

My recommendation from there is to save the model out and then train a new model that takes the output from BERT as its input.
This enables a few things:

  • WAY faster training/retraining of your 'prediction head' model, since you can run the data through the previous model a single time and then train your smaller model on the cached outputs.
  • Easier to retrain and experiment with different architectures for your 'prediction head' without even interacting with the BERT model.
  • Able to add additional values into your model (such as ints and floats present in other data fields, or features you've created yourself).

The one drawback is slightly slower inference time... but this can be mitigated by creating a proper pipeline (or, as a more advanced method, by loading both models with their weights and merging them together).
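
A minimal sketch of that setup, assuming the [CLS] embedding from AutoModel is used as the cached feature (the texts and the head architecture are placeholders):

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
bert.eval()

texts = ["some domain text", "another domain text"]  # placeholder corpus

# Run the data through BERT a single time and cache the [CLS] embeddings
with torch.no_grad():
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    features = bert(**enc).last_hidden_state[:, 0]  # shape: (batch, 768)

# Train a small, separate prediction head on the cached features;
# extra ints/floats could be concatenated onto `features` here
head = torch.nn.Sequential(
    torch.nn.Linear(768, 128),
    torch.nn.ReLU(),
    torch.nn.Linear(128, 2),  # 2 = number of classes (placeholder)
)
logits = head(features)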

May I know what the shape of the model is?
When I train it, I get this error:
Target size (torch.Size([8, 6])) must be the same as input size (torch.Size([8, 2]))

I want to adjust the input shape to match the expected shape.
