TransReID (ViT-Base) - Fine-Tuned for Person Re-Identification
This is our fine-tuned TransReID model for pedestrian re-identification, trained on the Market-1501 dataset for 120 epochs. It can serve as the appearance encoder in our RetailHeat multi-object tracking pipeline.
Model Details
| Property | Value |
|---|---|
| Architecture | TransReID (ViT-Base, patch16-224) |
| Backbone type | vit_base_patch16_224_TransReID |
| Stride size | [12, 12] |
| JPM | Enabled |
| SIE camera embedding | Enabled |
| Embedding dim | 3840-D (L2-normalized) |
| Input size | 256 × 128 (H × W) |
| Training dataset | Market-1501 |
| Training setup | Fine-tuned for 120 epochs |
| Task | Person re-identification |
Performance
Evaluation on Market-1501 from the final 120-epoch checkpoint:
| Metric | Value |
|---|---|
| Rank-1 Accuracy | 95.1% |
| Rank-5 Accuracy | 98.2% |
| Rank-10 Accuracy | 99.0% |
| mAP | 88.6% |
Intermediate validation results during training:
| Epoch | Rank-1 | mAP |
|---|---|---|
| 20 | 93.2% | 84.5% |
| 40 | 94.4% | 86.6% |
| 60 | 94.8% | 87.6% |
| 80 | 95.0% | 88.0% |
| 100 | 95.0% | 88.5% |
| 120 | 95.1% | 88.6% |
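For readers less familiar with the metrics in the tables above, the following is a minimal sketch of how Rank-k accuracy and mAP can be computed from a query-to-gallery distance matrix. It is a simplification: the standard Market-1501 protocol also discards gallery images taken by the same camera as the query and ignores junk images, which this sketch omits. The function name and the toy data are illustrative, not taken from the evaluation notebook.

```python
def rank_k_and_map(dist, query_ids, gallery_ids, ks=(1, 5, 10)):
    """Compute Rank-k accuracy and mAP.

    dist[i][j] is the distance between query i and gallery item j
    (smaller means more similar). Same-camera filtering is NOT applied.
    """
    hits = {k: 0 for k in ks}
    ap_sum = 0.0
    valid = 0
    for i, qid in enumerate(query_ids):
        # Sort gallery indices from most to least similar.
        order = sorted(range(len(gallery_ids)), key=lambda j: dist[i][j])
        matches = [gallery_ids[j] == qid for j in order]
        if not any(matches):
            continue  # a query with no true match cannot be scored
        valid += 1
        first_hit = matches.index(True)
        for k in ks:
            if first_hit < k:
                hits[k] += 1
        # Average precision: mean of precision values at each true match.
        found = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)
        ap_sum += sum(precisions) / len(precisions)
    cmc = {k: hits[k] / valid for k in ks}
    return cmc, ap_sum / valid


# Toy example: 2 queries, 3 gallery images.
dist = [[0.1, 0.5, 0.9],
        [0.8, 0.3, 0.2]]
cmc, mean_ap = rank_k_and_map(dist, query_ids=[1, 2], gallery_ids=[1, 2, 1])
print(cmc[1], cmc[5])      # 0.5 1.0
print(round(mean_ap, 4))   # 0.6667
```

Rank-1 only asks whether the top-ranked gallery image is correct, while mAP rewards retrieving all images of the identity early, which is why the two numbers can diverge.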
Architecture
TransReID is a transformer-based person re-identification model built on a ViT-Base backbone. It adapts the backbone for re-ID with camera-aware side information embeddings (SIE) and the Jigsaw Patch Module (JPM), which adds local branches alongside the global feature for stronger local-global feature learning. In this setup, the model uses overlapping patches with a stride of [12, 12], camera-aware training for 6 cameras, and produces a 3840-dimensional embedding that is L2-normalized at inference time.
Compared with lightweight CNN-based encoders, TransReID provides stronger identity discrimination and can improve appearance matching quality in crowded scenes, at the cost of higher compute.
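The numbers in this section can be checked with a little arithmetic. The sketch below assumes the usual sliding-window formula for overlapping patches and assumes that the 3840-D output is the concatenation of one global feature and four JPM local features, each 768-D (the ViT-Base token width). Both assumptions match our understanding of the TransReID design, but the helper name is hypothetical.

```python
def overlapping_patch_grid(height, width, patch=16, stride=12):
    """Patch rows and columns produced by a sliding-window patch embedding."""
    rows = (height - patch) // stride + 1
    cols = (width - patch) // stride + 1
    return rows, cols


# Stride [12, 12] on a 256 x 128 crop (this model).
rows, cols = overlapping_patch_grid(256, 128, patch=16, stride=12)
# Non-overlapping patches (stride 16), for comparison.
base_rows, base_cols = overlapping_patch_grid(256, 128, patch=16, stride=16)

TOKEN_DIM = 768       # ViT-Base hidden size
NUM_BRANCHES = 1 + 4  # assumption: 1 global + 4 JPM local branches
embedding_dim = TOKEN_DIM * NUM_BRANCHES

print(rows, cols, rows * cols)                       # 21 10 210
print(base_rows, base_cols, base_rows * base_cols)   # 16 8 128
print(embedding_dim)                                 # 3840
```

The smaller stride yields 210 tokens instead of 128, which is one source of the extra compute mentioned above.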
Training
We fine-tuned this model on the Market-1501 person re-identification benchmark.
- Dataset: Market-1501
- Training split: 12,936 images of 751 identities
- Query / Gallery: 3,368 query images and 19,732 gallery images of 750 identities (the gallery count includes distractor images)
- Input resolution: 256 × 128
- Optimizer: SGD
- Learning rate: 0.004 in the 2-GPU training run used in the notebook
- Batch size: 32 total for training, 128 for evaluation
- Epochs: 120
- Sampler: softmax_triplet
- Loss setup: triplet metric loss
- Augmentations: random horizontal flip, random erasing, padding
- Pretraining: ImageNet ViT weights (jx_vit_base_p16_224-80ecf9dd.pth)
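Triplet-based training needs several images of the same identity in every batch, so a softmax_triplet sampler is typically identity-balanced: each batch holds P identities with K images each. The sketch below illustrates that idea for the batch size of 32 listed above. K = 4 is an assumption (it is not stated in this card), and the function is an illustration rather than the sampler used in the training notebook.

```python
import random
from collections import defaultdict


def pk_batches(labels, batch_size=32, num_instances=4, seed=0):
    """Yield index batches with batch_size // num_instances identities each.

    labels[i] is the person ID of training image i.
    num_instances=4 is an assumed value, not taken from the model card.
    """
    rng = random.Random(seed)
    by_id = defaultdict(list)
    for idx, pid in enumerate(labels):
        by_id[pid].append(idx)

    ids_per_batch = batch_size // num_instances
    pids = list(by_id)
    rng.shuffle(pids)

    # Walk through the shuffled identities in groups of ids_per_batch.
    for start in range(0, len(pids) - ids_per_batch + 1, ids_per_batch):
        batch = []
        for pid in pids[start:start + ids_per_batch]:
            pool = by_id[pid]
            if len(pool) >= num_instances:
                batch.extend(rng.sample(pool, num_instances))
            else:
                # Too few images: sample with replacement.
                batch.extend(rng.choices(pool, k=num_instances))
        yield batch


# Toy dataset: 16 identities with 5 images each.
labels = [i // 5 for i in range(80)]
batches = list(pk_batches(labels))
print(len(batches), len(batches[0]))  # 2 32
```

With this layout every anchor in a batch has at least K - 1 positives and many negatives to form triplets from.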
Usage
Download the weights
pip install huggingface_hub
huggingface-cli download MYerassyl/retail-heat-transreid TransReID.pth --local-dir weights/
Load in Python
import sys
from pathlib import Path
import cv2
import numpy as np
import torch
TRANSREID_REPO = Path("TransReID")
sys.path.insert(0, str(TRANSREID_REPO))
from config import cfg
from model import make_model
CONFIG_PATH = "configs/Market/vit_transreid_stride.yml"
WEIGHT_PATH = "weights/TransReID.pth"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
cfg = cfg.clone()
cfg.defrost()
cfg.merge_from_file(CONFIG_PATH)
cfg.MODEL.PRETRAIN_CHOICE = "self"
cfg.TEST.WEIGHT = WEIGHT_PATH
cfg.TEST.NECK_FEAT = "before"
cfg.TEST.FEAT_NORM = "yes"
cfg.freeze()
model = make_model(cfg, num_class=751, camera_num=6, view_num=1)
model.load_param(WEIGHT_PATH)
model.to(DEVICE).eval()
# Example crop -> embedding
img = cv2.imread("person_crop.jpg")
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (128, 256))
x = torch.from_numpy(img).permute(2, 0, 1).float() / 255.0
x = (x - 0.5) / 0.5
x = x.unsqueeze(0).to(DEVICE)
cam = torch.zeros(1, dtype=torch.long, device=DEVICE)
view = torch.zeros(1, dtype=torch.long, device=DEVICE)
with torch.no_grad():
    feat = model(x, cam_label=cam, view_label=view)
    feat = torch.nn.functional.normalize(feat, dim=1)
embedding = feat[0].cpu().numpy()
print(embedding.shape) # (3840,)
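Once embeddings are extracted, re-identification reduces to comparing vectors. Because the embeddings are L2-normalized, cosine similarity is simply a dot product. The sketch below ranks a small gallery against a query using plain Python lists; the helper names and the 3-D toy vectors are illustrative stand-ins for real 3840-D embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity; normalizes defensively in case inputs are raw."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def rank_gallery(query, gallery):
    """Return (name, similarity) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in gallery.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 3-D vectors stand in for real 3840-D embeddings.
query = [1.0, 0.0, 0.0]
gallery = {"person_a": [0.9, 0.1, 0.0], "person_b": [0.0, 1.0, 0.0]}
ranking = rank_gallery(query, gallery)
print(ranking[0][0])  # person_a
```

For large galleries, the same computation is better done as a single matrix product over stacked embeddings.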
Use with the RetailHeat pipeline
This model can be used as the appearance encoder in the BoT-SORT / RetailHeat tracking pipeline.
git clone https://github.com/MYerassyl/retail-heat.git
cd retail-heat
mkdir -p weights
huggingface-cli download MYerassyl/retail-heat-transreid TransReID.pth --local-dir weights/
Then load TransReID.pth inside your TransReID-based encoder wrapper, using the same configuration and loading steps shown under "Load in Python".
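Inside a tracker, an appearance encoder is usually paired with a per-track feature that is smoothed over time, so that one blurry or occluded crop does not overwrite the track's identity. The sketch below shows one common approach, an exponential moving average followed by re-normalization. It is an illustration under assumed values (alpha = 0.9, a hypothetical TrackAppearance class), not the RetailHeat or BoT-SORT implementation.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


class TrackAppearance:
    """Keeps an EMA-smoothed appearance feature for one track (hypothetical helper)."""

    def __init__(self, feat, alpha=0.9):
        self.alpha = alpha  # assumed smoothing factor; higher = slower updates
        self.smooth_feat = _normalize(feat)

    def update(self, feat):
        """Blend a new detection embedding into the smoothed feature."""
        feat = _normalize(feat)
        mixed = [self.alpha * s + (1.0 - self.alpha) * f
                 for s, f in zip(self.smooth_feat, feat)]
        self.smooth_feat = _normalize(mixed)

    def cosine_distance(self, feat):
        """Appearance cost in [0, 2]; lower means more likely the same person."""
        feat = _normalize(feat)
        return 1.0 - sum(s * f for s, f in zip(self.smooth_feat, feat))


# Toy 2-D example: the track starts at [1, 0] and sees one outlier [0, 1].
track = TrackAppearance([1.0, 0.0])
track.update([0.0, 1.0])
near = track.cosine_distance([1.0, 0.0])
far = track.cosine_distance([0.0, 1.0])
print(near < far)  # True
```

The resulting distances can be placed in a track-by-detection cost matrix and combined with motion or IoU cues before assignment.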
Notes
- The exported checkpoint used in the notebook is TransReID.pth.
- The evaluation notebook also used this model inside a BoT-SORT tracker with YOLO detections.
- Since this is a transformer-based re-ID encoder, inference is heavier than OSNet but typically yields stronger appearance features.
Citation
If you use this model, please cite our RetailHeat project:
@software{retail_heat,
author = {Yerassyl},
title = {RetailHeat: Multi-Object Tracking and Heatmap Generation for Retail Analytics},
url = {https://github.com/MYerassyl/retail-heat}
}
License
This model is released under the MIT License.