Finetuning issues
Hi there!
Do you have any tips or a complete fine-tuning guide for your model? I have tried to fine-tune it with transformers.Trainer, but I get an error when it saves a checkpoint, and I couldn't find a solution anywhere.
I believe the Trainer doesn't (yet) support some of the tensors, such as bbox_embed.0.layers.0.weight.
Thank you in advance! I appreciate your work and your sharing it with the Hugging Face community!
Hey, thanks for trying this. There are a couple of things:
- I did not train the model with the HuggingFace Trainer; I used https://github.com/fundamentalvision/Deformable-DETR, whose checkpoint format differs from HuggingFace's, and then ran a script to convert the weights.
- Based on the parameter name, I think it relates to the model's bbox refinement; double check whether you enabled that in your fine-tuning config (see the sketch after this list).
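For example (a rough sketch, not from my training setup; the checkpoint path is a placeholder), you can inspect the two config flags that control this:

from transformers import DeformableDetrConfig

config = DeformableDetrConfig.from_pretrained("path/to/your/hf/checkpoint")  # placeholder path
# iterative box refinement (and the two-stage variant) adds the extra bbox_embed.* heads
print(config.with_box_refine, config.two_stage)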
@bohou Thank you for your answer! I already have a Deformable-DETR model (the one accessible via the link you provided) trained on my custom dataset, and it works pretty well. However, that implementation is limited to GPU-only use.
You mentioned that you used a script to convert the weights. That sounds like a solution to me, since transformers supports both devices. I would really appreciate it if you could share the script you used with the Hugging Face community!
Thank you!
Sure, pasted below.
import torch

# Convert an original Deformable-DETR (fundamentalvision) checkpoint into the key layout
# expected by transformers' DeformableDetrForObjectDetection.
checkpoint = torch.load("your pth location", map_location='cpu')
nd = {}
for k, v in checkpoint['model'].items():
    if k == "transformer.level_embed":
        nd["model.level_embed"] = v
    # backbone (ResNet body)
    elif k.startswith('backbone.0.body.'):
        nk = "model.backbone.conv_encoder.model" + k.removeprefix("backbone.0.body")
        nd[nk] = v
    elif k.startswith('input_proj.'):
        nd["model." + k] = v
    elif k.startswith('transformer.enc_') or k.startswith('transformer.pos_'):
        nk = k.removeprefix('transformer.')
        nd["model." + nk] = v
    # encoder layers: rename norms and linears to the HF names
    elif k.startswith('transformer.encoder'):
        nk = k.removeprefix('transformer.')
        if "norm1" in k:
            nk = nk.replace("norm1", "self_attn_layer_norm")
        elif "norm2" in k:
            nk = nk.replace("norm2", "final_layer_norm")
        elif "linear1" in k:
            nk = nk.replace("linear1", "fc1")
        elif "linear2" in k:
            nk = nk.replace("linear2", "fc2")
        nd["model." + nk] = v
    # decoder layers: split the fused in_proj into q/k/v projections and rename the rest
    elif k.startswith("transformer.decoder"):
        nk = k.removeprefix('transformer.')
        nk = "model." + nk
        if "in_proj_weight" in k:
            (q, k, v) = v.chunk(3)
            nk = nk.removesuffix("in_proj_weight")
            nd[nk + "q_proj.weight"] = q
            nd[nk + "k_proj.weight"] = k
            nd[nk + "v_proj.weight"] = v
        elif "in_proj_bias" in k:
            (q, k, v) = v.chunk(3)
            nk = nk.removesuffix("in_proj_bias")
            nd[nk + "q_proj.bias"] = q
            nd[nk + "k_proj.bias"] = k
            nd[nk + "v_proj.bias"] = v
        elif "out_proj.weight" in k or "out_proj.bias" in k:
            nd[nk] = v
        elif "bbox_embed" in k or "class_embed" in k:
            # box-refinement heads are expected both under the decoder and at the top level
            nd[nk] = v
            nd[nk.removeprefix("model.decoder.")] = v
        else:
            if "norm1" in k:
                nk = nk.replace("norm1", "self_attn_layer_norm")
            elif "norm2" in k:
                nk = nk.replace("norm2", "encoder_attn_layer_norm")
            elif "norm3" in k:
                nk = nk.replace("norm3", "final_layer_norm")
            elif "linear1" in k:
                nk = nk.replace("linear1", "fc1")
            elif "linear2" in k:
                nk = nk.replace("linear2", "fc2")
            elif "cross_attn" in k:
                nk = nk.replace("cross_attn", "encoder_attn")
            nd[nk] = v

from transformers import DeformableDetrConfig, DeformableDetrForObjectDetection

# the config must match the variant you trained (e.g. with_box_refine / two_stage),
# otherwise load_state_dict will complain about missing or unexpected keys
config = DeformableDetrConfig.from_pretrained("general deformable detr config directory")
model = DeformableDetrForObjectDetection(config)
model.load_state_dict(nd)
model.save_pretrained("the location to save huggingface safetensor weights")
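To sanity-check the converted weights on CPU, something like the following should work (a sketch, not part of the conversion script; the test image path is a placeholder, and reusing the stock SenseTime/deformable-detr image-processor config is an assumption):

import torch
from PIL import Image
from transformers import AutoImageProcessor, DeformableDetrForObjectDetection

# load the converted checkpoint; everything below runs on CPU by default
model = DeformableDetrForObjectDetection.from_pretrained("the location to save huggingface safetensor weights")
model.eval()
processor = AutoImageProcessor.from_pretrained("SenseTime/deformable-detr")  # assumed preprocessing config

image = Image.open("test.jpg")  # placeholder image
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
results = processor.post_process_object_detection(
    outputs, threshold=0.5, target_sizes=torch.tensor([image.size[::-1]])
)[0]
print(results["scores"], results["labels"], results["boxes"])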
@bohou Perfect, thank you!