lm_head.weight missing from checkpoint and inference returns garbage on the bundled example
I'm trying to use TRASER on traffic-CCTV footage and have run into an issue I can't get past on my own, so I'd really appreciate any guidance. If I'm missing a required step, please point me to it; I may very well be holding it wrong.
I'm running inference.py as documented on your bundled example:
python inference.py \
--model_path . \
--video_path example/2401075277.mp4 \
--mask_path example/2401075277_rle.json \
--out_dir ./output
The decoded output is 31 tokens and looks like this:
{"cri child",away playingful's " " " "{"cri", "{"{"{"]} "attributes playingfulfulroom",]},{"]}
A few observations from my debugging (not sure whether any of this is expected behaviour):
- In model.safetensors.index.json I see 888 tensors in total, including model.embed_tokens.weight and every perceiver_resampler.* / second_perceiver_resampler.* tensor, but I can't find lm_head.weight listed anywhere (in either shard).
- In config.json, the outer Qwen2_5_VLConfig doesn't set tie_word_embeddings. The inner text_config has it as true, but my understanding is that from_pretrained reads the outer one (which would default to false here).
- After loading, model.lm_head.weight.abs().sum() comes out as 0.0 while model.model.language_model.embed_tokens.weight.abs().sum() is around 5.45M, which makes me think the LM head was never loaded from the checkpoint (it is all zeros, not even a random init).
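The first two observations can be checked without loading any weights. The sketch below assumes the standard sharded-safetensors index layout (a top-level weight_map from tensor name to shard file); missing_tensors, tie_settings and inspect_checkpoint are hypothetical helper names, not part of TRASER or transformers.

```python
import json
from pathlib import Path


def missing_tensors(index: dict, names: list) -> list:
    """Return the names that have no entry in the index's weight_map."""
    weight_map = index.get("weight_map", {})
    return [name for name in names if name not in weight_map]


def tie_settings(config: dict) -> tuple:
    """Return (outer, text_config) tie_word_embeddings values; None means "not set"."""
    outer = config.get("tie_word_embeddings")
    inner = config.get("text_config", {}).get("tie_word_embeddings")
    return outer, inner


def inspect_checkpoint(model_dir: str) -> dict:
    """Hypothetical helper: summarize lm_head presence and tie settings of a local checkpoint."""
    root = Path(model_dir)
    index = json.loads((root / "model.safetensors.index.json").read_text())
    config = json.loads((root / "config.json").read_text())
    outer, inner = tie_settings(config)
    return {
        "num_tensors": len(index.get("weight_map", {})),
        "missing": missing_tensors(index, ["lm_head.weight", "model.embed_tokens.weight"]),
        "outer_tie": outer,
        "text_config_tie": inner,
    }
```

Calling print(inspect_checkpoint(".")) from the same directory passed as --model_path would print the tensor count, any missing names, and both tie_word_embeddings values (None where the key is absent).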
I tried two workarounds in case I was misreading the situation:
- Adding "tie_word_embeddings": true to the outer config and re-loading.
- Manually doing model.lm_head.weight = model.model.language_model.embed_tokens.weight after from_pretrained.
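The first workaround only touches config.json, so it can be scripted. This is a hedged sketch rather than an official fix: with_outer_tie and patch_config_file are hypothetical names, and the .bak copy is an extra precaution so the edit is easy to revert. The second workaround needs torch and transformers, so it appears only as a comment.

```python
import json
import shutil
from pathlib import Path


def with_outer_tie(config: dict, value: bool = True) -> dict:
    """Return a copy of the config with tie_word_embeddings set at the top level."""
    patched = dict(config)  # shallow copy; the nested text_config is left untouched
    patched["tie_word_embeddings"] = value
    return patched


def patch_config_file(config_path: str, value: bool = True) -> None:
    """Workaround 1: rewrite config.json in place, keeping a .bak copy of the original."""
    path = Path(config_path)
    shutil.copyfile(path, path.with_name(path.name + ".bak"))
    config = json.loads(path.read_text())
    path.write_text(json.dumps(with_outer_tie(config, value), indent=2))


# Workaround 2 (requires torch/transformers, shown for reference only), run after from_pretrained:
#   model.lm_head.weight = model.model.language_model.embed_tokens.weight
# This makes both attribute names refer to the same Parameter object.
```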
With both applied, the output does change. The perceivers seem to surface real visual concepts (tokens like "motorcycle", "scooter", "rider", "cycl" appear in the prefix), but the rest of the output collapses into long runs of the digit 0 (~96% of a 4,200-character output on a different clip).
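To make the share of zeros comparable across clips, a small helper can report both the fraction of "0" characters and the longest unbroken run. zero_stats is a hypothetical name; it only measures the decoded string and says nothing about why the collapse happens.

```python
def zero_stats(text: str) -> tuple:
    """Return (share of characters that are '0', longest run of consecutive '0's)."""
    if not text:
        return 0.0, 0
    share = text.count("0") / len(text)
    longest = current = 0
    for ch in text:
        # Extend the current run on '0', reset it on anything else.
        current = current + 1 if ch == "0" else 0
        longest = max(longest, current)
    return share, longest
```

For scale, a share of 0.96 on a 4,200-character output corresponds to about 4,032 zero characters.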
My read is that the visual side is working but the LM head isn't actually meant to be tied to the input embeddings. That would be expected if the head was trained separately and just wasn't included in the upload, but I'm honestly not sure.
I'd be very grateful for any pointers.
Thanks again for the work.
Hey, thanks for your interest! Could you please share which version of transformers you're using? We are using 4.54.0, and I suspect module names vary across versions, so some weights may fail to load. As for the tie_word_embeddings setting, I think the default value is True in transformers.
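The version asked for here can be read without importing transformers. This sketch uses only the standard library; installed_version and parse_release are hypothetical helpers, and parse_release is a deliberately crude stand-in for packaging.version.Version that keeps only the leading numeric release segments. The version may matter because the checkpoint index lists model.embed_tokens.weight while the loaded model exposes model.model.language_model.embed_tokens, which looks like the kind of cross-version naming difference mentioned in the reply.

```python
from importlib.metadata import PackageNotFoundError, version

REFERENCE_VERSION = "4.54.0"  # version the maintainers say they use


def installed_version(package: str = "transformers"):
    """Return the installed version string, or None if the package isn't installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def parse_release(ver: str) -> tuple:
    """Crudely map '4.54.0', '4.54.0.dev0' or '4.54.0rc1' to (4, 54, 0) for comparison."""
    parts = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric segment such as 'dev0'
        parts.append(int(digits))
    return tuple(parts)


current = installed_version()
if current is None:
    print("transformers is not installed")
else:
    same = parse_release(current) == parse_release(REFERENCE_VERSION)
    print(f"transformers {current} (matches {REFERENCE_VERSION}: {same})")
```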