
Finetuning issues

#1
by vladislabv - opened

Hi there!

Do you have some tips or a finished finetuning guide for your model? I have tried to fine-tune it with transformers.Trainer, but I get an error when a checkpoint is saved and couldn't find a solution to it anywhere.

I believe the Trainer doesn't (yet) support some of the model's tensors, such as bbox_embed.0.layers.0.weight.
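
For reference, a minimal sketch of the Trainer arguments in question (the output directory is a placeholder, and disabling safetensors serialization is only a guess at a workaround for the checkpoint-save error, not a confirmed fix):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="deformable-detr-finetune",   # placeholder
    remove_unused_columns=False,             # detection collators return inputs the model expects as-is
    save_safetensors=False,                  # assumption: may sidestep shared-tensor errors on checkpoint save
)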

Thank you in advance, appreciate your work and sharing it with the huggingface community!

Aryn Inc. org

Hey, thanks for trying this. There are a couple of things:

  1. I did not train the model with the Hugging Face Trainer but with https://github.com/fundamentalvision/Deformable-DETR, which has some discrepancies with the Hugging Face implementation. I ran a script to convert the weights.
  2. Based on the parameter name, I think it relates to the model's bounding-box refinement; double-check whether you configured that during your fine-tuning (see the sketch below).
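
A minimal sketch of where that flag lives in the Hugging Face config (the repo id/path below is a placeholder, and whether this setting is the culprit is only a guess):

from transformers import DeformableDetrConfig

# Placeholder: point this at the model's Hub repo id or local config directory.
config = DeformableDetrConfig.from_pretrained("path/to/this/model")
print(config.with_box_refine, config.two_stage)  # iterative box refinement / two-stage flags

# When building a config from scratch for fine-tuning, the flag can be set explicitly.
config = DeformableDetrConfig(with_box_refine=True)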

@bohou Thank you for your answer! I already have a Deformable-DETR model (the one accessible via the link you provided) trained on my custom dataset, and it works pretty well. However, the model itself is limited to GPU-only use.

You mentioned you used a script to convert the weights. That sounds like a solution to me, since transformers supports both devices. I would really appreciate it if you could share the script you used with the huggingface community!
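
For context, a rough sketch of the device flexibility I mean, assuming converted weights saved at a placeholder path:

import torch
from transformers import DeformableDetrForObjectDetection

# Placeholder path: wherever the converted Hugging Face weights end up.
model = DeformableDetrForObjectDetection.from_pretrained("path/to/converted/weights")

# The transformers implementation runs on CPU as well as GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)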

Thank you!

Aryn Inc. org

Sure, pasted below.

import torch

# Load the original Deformable-DETR checkpoint (fundamentalvision repo) on CPU
# and remap its parameter names to the Hugging Face DeformableDetr layout.
checkpoint = torch.load("your pth location", map_location='cpu')
nd = {}  # state dict rewritten with Hugging Face parameter names
for k, v in checkpoint['model'].items():
    if k == "transformer.level_embed":
        nd["model.level_embed"] = v
    # Backbone weights move under model.backbone.conv_encoder.model
    elif k.startswith('backbone.0.body.'):
        nk = "model.backbone.conv_encoder.model" + k.removeprefix("backbone.0.body")
        nd[nk] = v
    elif k.startswith('input_proj.'):
        nd["model." + k] = v
    elif k.startswith('transformer.enc_') or k.startswith('transformer.pos_'):
        nk = k.removeprefix('transformer.')
        nd["model." + nk] = v
    # Encoder layers: layer norms and FFN linears use different names in HF
    elif k.startswith('transformer.encoder'):
        nk = k.removeprefix('transformer.')
        if "norm1" in k:
            nk = nk.replace("norm1", "self_attn_layer_norm")
        elif "norm2" in k:
            nk = nk.replace("norm2", "final_layer_norm")
        elif "linear1" in k:
            nk = nk.replace("linear1", "fc1")
        elif "linear2" in k:
            nk = nk.replace("linear2", "fc2")
        nd["model." + nk] = v
    elif k.startswith("transformer.decoder"):
        nk = k.removeprefix('transformer.')
        nk = "model." + nk
        if "in_proj_weight" in k:
            (q, k, v) = v.chunk(3)
            nk = nk.removesuffix("in_proj_weight")
            nd[nk+"q_proj.weight"] = q
            nd[nk+"k_proj.weight"] = k
            nd[nk+"v_proj.weight"] = v
        elif "in_proj_bias" in k:
            (q, k, v) = v.chunk(3)
            nk = nk.removesuffix("in_proj_bias")
            nd[nk + "q_proj.bias"] = q
            nd[nk + "k_proj.bias"] = k
            nd[nk + "v_proj.bias"] = v
        elif "out_proj.weight" in k or "out_proj.bias" in k:
            nd[nk] = v
        elif "bbox_embed" in k or "class_embed" in k:
            nd[nk] = v
            nd[nk.removeprefix("model.decoder.")] = v
        else:
            if "norm1" in k:
                nk = nk.replace("norm1", "self_attn_layer_norm")
            elif "norm2" in k:
                nk = nk.replace("norm2", "encoder_attn_layer_norm")
            elif "norm3" in k:
                nk = nk.replace("norm3", "final_layer_norm")
            elif "linear1" in k:
                nk = nk.replace("linear1", "fc1")
            elif "linear2" in k:
                nk = nk.replace("linear2", "fc2")
            elif "cross_attn" in k:
                nk = nk.replace("cross_attn", "encoder_attn")
            nd[nk] = v

from transformers import DeformableDetrConfig, DeformableDetrForObjectDetection

# Build a Hugging Face model from a matching config, load the remapped weights,
# and save them in the Hugging Face (safetensors) format.
config = DeformableDetrConfig.from_pretrained("general deformable detr config directory")
model = DeformableDetrForObjectDetection(config)
model.load_state_dict(nd)
model.save_pretrained("the location to save huggingface safetensor weights")
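
For anyone else doing this, a quick sanity check of the converted model might look like the sketch below (the paths reuse the placeholders above, the test image is hypothetical, and it assumes an image processor config is available in the same directory):

from PIL import Image
from transformers import AutoImageProcessor, DeformableDetrForObjectDetection

# Reload the converted weights and an image processor (placeholder paths).
model = DeformableDetrForObjectDetection.from_pretrained("the location to save huggingface safetensor weights")
processor = AutoImageProcessor.from_pretrained("general deformable detr config directory")

image = Image.open("test_image.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
outputs = model(**inputs)

# Keep detections above a confidence threshold, mapped back to image coordinates.
results = processor.post_process_object_detection(
    outputs, threshold=0.5, target_sizes=[image.size[::-1]]
)[0]
print(results["scores"], results["labels"], results["boxes"])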

@bohou Perfect, thank you!
