Finetuning issues
Hi there!
Do you have any tips or a complete fine-tuning guide for your model? I have tried to fine-tune it with transformers.Trainer, but I get an error when it saves a checkpoint, and I couldn't find a solution anywhere.
I believe the Trainer doesn't (yet) support some of the tensors, such as bbox_embed.0.layers.0.weight.
Thank you in advance! I appreciate your work and your sharing it with the Hugging Face community!
Hey, thanks for trying this. There are a couple of things:
- I did not train the model with the HuggingFace Trainer; I used https://github.com/fundamentalvision/Deformable-DETR, whose checkpoint format differs from HuggingFace's, and then ran a script to convert the weights.
- Based on the parameter name, I think it relates to the model's bbox refinement; double check whether you enabled that in your fine-tuning config (see the sketch after this list).
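For example (a rough sketch, not from my training setup; the checkpoint path is a placeholder), you can inspect the two config flags that control this:

from transformers import DeformableDetrConfig

config = DeformableDetrConfig.from_pretrained("path/to/your/hf/checkpoint")  # placeholder path
# iterative box refinement (and the two-stage variant) adds the extra bbox_embed.* heads
print(config.with_box_refine, config.two_stage)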
@bohou Thank you for your answer! I already have a Deformable-DETR model (the one accessible via the link you provided) trained on my custom dataset, and it works pretty well. However, that implementation is limited to GPU-only use.
You mentioned that you used a script to convert the weights. That sounds like a solution to me, since transformers supports both devices. I would really appreciate it if you could share the script you used with the Hugging Face community!
Thank you!
Sure, pasted below.
import torch

# Convert an original Deformable-DETR (fundamentalvision) checkpoint into the key layout
# expected by transformers' DeformableDetrForObjectDetection.
checkpoint = torch.load("your pth location", map_location='cpu')
nd = {}
for k, v in checkpoint['model'].items():
    if k == "transformer.level_embed":
        nd["model.level_embed"] = v
    # backbone (ResNet body)
    elif k.startswith('backbone.0.body.'):
        nk = "model.backbone.conv_encoder.model" + k.removeprefix("backbone.0.body")
        nd[nk] = v
    elif k.startswith('input_proj.'):
        nd["model." + k] = v
    elif k.startswith('transformer.enc_') or k.startswith('transformer.pos_'):
        nk = k.removeprefix('transformer.')
        nd["model." + nk] = v
    # encoder layers: rename norms and linears to the HF names
    elif k.startswith('transformer.encoder'):
        nk = k.removeprefix('transformer.')
        if "norm1" in k:
            nk = nk.replace("norm1", "self_attn_layer_norm")
        elif "norm2" in k:
            nk = nk.replace("norm2", "final_layer_norm")
        elif "linear1" in k:
            nk = nk.replace("linear1", "fc1")
        elif "linear2" in k:
            nk = nk.replace("linear2", "fc2")
        nd["model." + nk] = v
    # decoder layers: split the fused in_proj into q/k/v projections and rename the rest
    elif k.startswith("transformer.decoder"):
        nk = k.removeprefix('transformer.')
        nk = "model." + nk
        if "in_proj_weight" in k:
            (q, k, v) = v.chunk(3)
            nk = nk.removesuffix("in_proj_weight")
            nd[nk + "q_proj.weight"] = q
            nd[nk + "k_proj.weight"] = k
            nd[nk + "v_proj.weight"] = v
        elif "in_proj_bias" in k:
            (q, k, v) = v.chunk(3)
            nk = nk.removesuffix("in_proj_bias")
            nd[nk + "q_proj.bias"] = q
            nd[nk + "k_proj.bias"] = k
            nd[nk + "v_proj.bias"] = v
        elif "out_proj.weight" in k or "out_proj.bias" in k:
            nd[nk] = v
        elif "bbox_embed" in k or "class_embed" in k:
            # box-refinement heads are expected both under the decoder and at the top level
            nd[nk] = v
            nd[nk.removeprefix("model.decoder.")] = v
        else:
            if "norm1" in k:
                nk = nk.replace("norm1", "self_attn_layer_norm")
            elif "norm2" in k:
                nk = nk.replace("norm2", "encoder_attn_layer_norm")
            elif "norm3" in k:
                nk = nk.replace("norm3", "final_layer_norm")
            elif "linear1" in k:
                nk = nk.replace("linear1", "fc1")
            elif "linear2" in k:
                nk = nk.replace("linear2", "fc2")
            elif "cross_attn" in k:
                nk = nk.replace("cross_attn", "encoder_attn")
            nd[nk] = v

from transformers import DeformableDetrConfig, DeformableDetrForObjectDetection

# the config must match the variant you trained (e.g. with_box_refine / two_stage),
# otherwise load_state_dict will complain about missing or unexpected keys
config = DeformableDetrConfig.from_pretrained("general deformable detr config directory")
model = DeformableDetrForObjectDetection(config)
model.load_state_dict(nd)
model.save_pretrained("the location to save huggingface safetensor weights")
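To sanity-check the converted weights on CPU, something like the following should work (a sketch, not part of the conversion script; the test image path is a placeholder, and reusing the stock SenseTime/deformable-detr image-processor config is an assumption):

import torch
from PIL import Image
from transformers import AutoImageProcessor, DeformableDetrForObjectDetection

# load the converted checkpoint; everything below runs on CPU by default
model = DeformableDetrForObjectDetection.from_pretrained("the location to save huggingface safetensor weights")
model.eval()
processor = AutoImageProcessor.from_pretrained("SenseTime/deformable-detr")  # assumed preprocessing config

image = Image.open("test.jpg")  # placeholder image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
results = processor.post_process_object_detection(
    outputs, threshold=0.5, target_sizes=torch.tensor([image.size[::-1]])
)[0]
print(results["scores"], results["labels"], results["boxes"])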
@bohou Perfect, thank you!