TransReID (ViT-Base): Fine-Tuned for Person Re-Identification

This is our fine-tuned TransReID model for pedestrian re-identification, trained on the Market-1501 dataset for 120 epochs. It can serve as the appearance encoder in our RetailHeat multi-object tracking pipeline.

Model Details

| Property | Value |
|---|---|
| Architecture | TransReID (ViT-Base, patch16-224) |
| Backbone type | vit_base_patch16_224_TransReID |
| Stride size | [12, 12] |
| JPM | Enabled |
| SIE camera embedding | Enabled |
| Embedding dim | 3840-D (L2-normalized) |
| Input size | 256 × 128 (H × W) |
| Training dataset | Market-1501 |
| Training setup | Fine-tuned for 120 epochs |
| Task | Person re-identification |

Performance

Evaluation on Market-1501 from the final 120-epoch checkpoint:

| Metric | Value |
|---|---|
| Rank-1 Accuracy | 95.1% |
| Rank-5 Accuracy | 98.2% |
| Rank-10 Accuracy | 99.0% |
| mAP | 88.6% |

Intermediate validation results during training:

| Epoch | Rank-1 | mAP |
|---|---|---|
| 20 | 93.2% | 84.5% |
| 40 | 94.4% | 86.6% |
| 60 | 94.8% | 87.6% |
| 80 | 95.0% | 88.0% |
| 100 | 95.0% | 88.5% |
| 120 | 95.1% | 88.6% |
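For readers unfamiliar with the metrics above, the following is a minimal sketch of how Rank-k accuracy and mAP can be computed from a query-by-gallery distance matrix. It is a simplified illustration, not the evaluation code used for the numbers in this card: it assumes no same-camera filtering and no junk images, both of which the standard Market-1501 protocol applies. The function name `rank_k_and_map` is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def rank_k_and_map(
    dist: Sequence[Sequence[float]],
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
    ks: Tuple[int, ...] = (1, 5, 10),
) -> Tuple[Dict[int, float], float]:
    """Compute Rank-k accuracy and mAP from a query-by-gallery distance matrix.

    Simplified on purpose: no same-camera filtering and no junk images.
    """
    hits = {k: 0 for k in ks}
    average_precisions: List[float] = []

    for q, row in enumerate(dist):
        # Sort gallery indices from the closest to the farthest embedding.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # a query without any true match is skipped

        # Rank-k: is there a correct match within the top k results?
        for k in ks:
            if any(matches[:k]):
                hits[k] += 1

        # Average precision: mean of the precision at each correct match.
        found = 0
        precisions = []
        for position, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / position)
        average_precisions.append(sum(precisions) / len(precisions))

    valid_queries = len(average_precisions)
    if valid_queries == 0:
        raise ValueError("no query has a matching gallery identity")

    cmc = {k: hits[k] / valid_queries for k in ks}
    mean_ap = sum(average_precisions) / valid_queries
    return cmc, mean_ap
```

Rank-1 only rewards the single best match, while mAP also reflects how highly the remaining images of the same identity are ranked, which is why mAP is the lower of the two numbers in the tables above.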

Architecture

TransReID is a transformer-based person re-identification model built on a ViT-Base backbone. It adapts the backbone for re-ID with camera-aware side information embeddings (SIE) and the jigsaw patch module (JPM), which strengthens joint local-global feature learning. In this setup, the model uses overlapping 16 × 16 patches with a stride of [12, 12], is trained with camera labels for 6 cameras, and produces a 3840-dimensional embedding that is L2-normalized at inference time.
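The arithmetic behind these numbers can be sketched in a few lines. The patch count follows the usual sliding-window formula for a 16-pixel patch moved with a 12-pixel stride over a 256 × 128 input. The embedding size is explained here under the assumption that the inference feature concatenates one global branch with four JPM local branches, each as wide as a ViT-Base token (768); the helper name `num_patches` is hypothetical.

```python
from typing import Tuple


def num_patches(height: int, width: int, patch: int = 16, stride: int = 12) -> Tuple[int, int]:
    """Patch tokens along (height, width) for a sliding-window patch embedding."""
    rows = (height - patch) // stride + 1
    cols = (width - patch) // stride + 1
    return rows, cols


# Overlapping patches as configured in this model card (stride 12).
rows, cols = num_patches(256, 128, stride=12)
# Non-overlapping patches of a plain ViT (stride equal to the patch size).
base_rows, base_cols = num_patches(256, 128, stride=16)

HIDDEN_DIM = 768   # ViT-Base token width
NUM_BRANCHES = 5   # assumption: 1 global + 4 JPM local branches are concatenated
embedding_dim = HIDDEN_DIM * NUM_BRANCHES

print(rows, cols, rows * cols)                      # 21 10 210
print(base_rows, base_cols, base_rows * base_cols)  # 16 8 128
print(embedding_dim)                                # 3840
```

The smaller stride yields 210 tokens instead of 128, which is one reason this configuration costs more compute than a plain ViT-Base at the same resolution.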

Compared with lightweight CNN-based encoders, TransReID provides stronger identity discrimination and can improve appearance matching quality in crowded scenes, at the cost of higher compute.

Training

We fine-tuned this model on the Market-1501 person re-identification benchmark.

  • Dataset: Market-1501
  • Training split: 12,936 images of 751 identities
  • Query/Gallery: 3,368 query images and 19,732 gallery images, covering 750 identities
  • Input resolution: 256 × 128
  • Optimizer: SGD
  • Learning rate: 0.004 in the 2-GPU training run used in the notebook
  • Batch size: 32 total for training, 128 for evaluation
  • Epochs: 120
  • Sampler: softmax_triplet
  • Loss setup: triplet metric loss
  • Augmentations: random horizontal flip, random erasing, padding
  • Pretraining: ImageNet ViT weights (jx_vit_base_p16_224-80ecf9dd.pth)
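Triplet-based training needs several images of the same identity in every batch, otherwise no positive pairs can be formed. The sketch below illustrates the general idea of an identity-balanced (P × K) batch sampler. It is not the sampler implementation used for this model: the function name `pk_batches` is hypothetical, and K = 4 images per identity is an assumption, since this card only states the total batch size of 32.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def pk_batches(
    labels: Sequence[int],
    batch_size: int = 32,
    num_instances: int = 4,  # assumption: K is not stated in this card
    seed: int = 0,
) -> List[List[int]]:
    """Group sample indices into batches of P identities x K images each."""
    ids_per_batch = batch_size // num_instances
    rng = random.Random(seed)

    # Collect the sample indices that belong to each identity.
    by_identity: Dict[int, List[int]] = defaultdict(list)
    for index, pid in enumerate(labels):
        by_identity[pid].append(index)

    # Draw K indices per identity; sample with replacement when an
    # identity has fewer than K images.
    chunks = []
    for indices in by_identity.values():
        if len(indices) >= num_instances:
            picked = rng.sample(indices, num_instances)
        else:
            picked = [rng.choice(indices) for _ in range(num_instances)]
        chunks.append(picked)

    # Shuffle identities, then pack P of them into each batch.
    # Leftover identities that do not fill a batch are dropped.
    rng.shuffle(chunks)
    batches = []
    for start in range(0, len(chunks) - ids_per_batch + 1, ids_per_batch):
        group = chunks[start:start + ids_per_batch]
        batches.append([index for chunk in group for index in chunk])
    return batches
```

With these assumed values each batch holds 8 identities with 4 images each, so every anchor has 3 candidate positives and 28 candidate negatives.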

Usage

Download the weights

pip install huggingface_hub
huggingface-cli download MYerassyl/retail-heat-transreid TransReID.pth --local-dir weights/

Load in Python

import sys
from pathlib import Path
import cv2
import numpy as np
import torch

TRANSREID_REPO = Path("TransReID")
sys.path.insert(0, str(TRANSREID_REPO))

from config import cfg
from model import make_model

# Resolve the config relative to the cloned TransReID repository
CONFIG_PATH = str(TRANSREID_REPO / "configs/Market/vit_transreid_stride.yml")
WEIGHT_PATH = "weights/TransReID.pth"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

cfg = cfg.clone()
cfg.defrost()
cfg.merge_from_file(CONFIG_PATH)
cfg.MODEL.PRETRAIN_CHOICE = "self"
cfg.TEST.WEIGHT = WEIGHT_PATH
cfg.TEST.NECK_FEAT = "before"
cfg.TEST.FEAT_NORM = "yes"
cfg.freeze()

model = make_model(cfg, num_class=751, camera_num=6, view_num=1)
model.load_param(WEIGHT_PATH)
model.to(DEVICE).eval()

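# Example crop -> embedding
img = cv2.imread("person_crop.jpg")  # OpenCV loads images as BGR
if img is None:
    raise FileNotFoundError("person_crop.jpg could not be read")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (128, 256))  # cv2.resize takes (width, height)
x = torch.from_numpy(img).permute(2, 0, 1).float() / 255.0  # HWC uint8 -> CHW float in [0, 1]
x = (x - 0.5) / 0.5  # normalize to [-1, 1] (mean 0.5, std 0.5)
x = x.unsqueeze(0).to(DEVICE)  # add the batch dimension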

# SIE expects one camera index per image; 0 is used here as a placeholder camera ID
cam = torch.zeros(1, dtype=torch.long, device=DEVICE)
view = torch.zeros(1, dtype=torch.long, device=DEVICE)

with torch.no_grad():
    feat = model(x, cam_label=cam, view_label=view)
    feat = torch.nn.functional.normalize(feat, dim=1)

embedding = feat[0].cpu().numpy()
print(embedding.shape)  # (3840,)
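Once embeddings are extracted, re-identification reduces to a nearest-neighbor search. The following is a minimal, dependency-free sketch of ranking gallery embeddings by cosine similarity to a query embedding. The function names and the toy 3-D vectors are hypothetical; real embeddings from this model would be 3840-D, and a production pipeline would normally do this with vectorized NumPy or PyTorch operations.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def rank_gallery(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return gallery entries sorted by cosine similarity to the query, best first."""
    q = l2_normalize(query)
    scores = []
    for name, emb in gallery.items():
        g = l2_normalize(emb)
        # For unit vectors, the dot product equals the cosine similarity.
        scores.append((name, sum(a * b for a, b in zip(q, g))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy example with 3-D vectors standing in for real embeddings.
ranking = rank_gallery(
    [1.0, 0.0, 0.0],
    {"a": [2.0, 0.0, 0.0], "b": [0.0, 3.0, 0.0], "c": [1.0, 1.0, 0.0]},
)
print([name for name, _ in ranking])  # ['a', 'c', 'b']
```

Because the embeddings are already L2-normalized at inference time, the extra normalization here is a safeguard rather than a requirement.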

Use with the RetailHeat pipeline

This model can be used as the appearance encoder in the BoT-SORT / RetailHeat tracking pipeline.

git clone https://github.com/MYerassyl/retail-heat.git
cd retail-heat
mkdir -p weights
huggingface-cli download MYerassyl/retail-heat-transreid TransReID.pth --local-dir weights/

Then load TransReID.pth inside the TransReID-based encoder wrapper, using the same loading steps as in the "Load in Python" section.
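Inside a tracker, per-frame embeddings are usually not compared directly. A common pattern in appearance-based trackers is to keep an exponential moving average of each track's embedding and to compare new detections against that smoothed feature. The sketch below illustrates this pattern only; it is not code from the RetailHeat repository, the class name `TrackAppearance` is hypothetical, and the smoothing weight of 0.9 is an illustrative assumption.

```python
import math
from typing import List, Optional, Sequence


class TrackAppearance:
    """Hypothetical helper: keeps a smoothed, unit-length embedding for one track."""

    def __init__(self, alpha: float = 0.9) -> None:
        self.alpha = alpha  # weight given to the track history (illustrative value)
        self.smooth: Optional[List[float]] = None

    @staticmethod
    def _normalize(vec: Sequence[float]) -> List[float]:
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm > 0 else list(vec)

    def update(self, embedding: Sequence[float]) -> List[float]:
        """Blend a new detection embedding into the track's running feature."""
        feat = self._normalize(embedding)
        if self.smooth is None:
            self.smooth = feat
        else:
            mixed = [
                self.alpha * s + (1.0 - self.alpha) * f
                for s, f in zip(self.smooth, feat)
            ]
            self.smooth = self._normalize(mixed)
        return self.smooth

    def cosine_distance(self, embedding: Sequence[float]) -> float:
        """Appearance cost between this track and a candidate detection."""
        if self.smooth is None:
            raise ValueError("update() must be called before cosine_distance()")
        feat = self._normalize(embedding)
        return 1.0 - sum(s * f for s, f in zip(self.smooth, feat))
```

Smoothing makes the appearance cost less sensitive to a single occluded or blurred crop, which matters in crowded retail scenes where detections frequently overlap.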

Notes

  • The exported checkpoint used in the notebook is TransReID.pth.
  • The evaluation notebook also used this model inside a BoT-SORT tracker with YOLO detections.
  • Since this is a transformer-based re-ID encoder, inference is heavier than OSNet but typically yields stronger appearance features.

Citation

If you use this model, please cite our RetailHeat project:

@software{retail_heat,
  author = {Yerassyl},
  title = {RetailHeat: Multi-Object Tracking and Heatmap Generation for Retail Analytics},
  url = {https://github.com/MYerassyl/retail-heat}
}

License

This model is released under the MIT License.
