vagheshpatel committed
Commit 69e7b1f · verified · 1 Parent(s): a611ddc

Sync motion-tracking from metro-analytics-catalog
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+expected_output_dlstreamer.gif filter=lfs diff=lfs merge=lfs -text
+expected_output_openvino.gif filter=lfs diff=lfs merge=lfs -text
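The new `.gitattributes` lines route the two expected-output GIFs through Git LFS alongside the pre-existing globs. As a quick sanity check, Python's `fnmatch` roughly approximates how these simple patterns match filenames (an approximation only -- git's gitattributes matcher has its own path-matching edge cases):

```python
from fnmatch import fnmatch

# LFS-routed patterns: pre-existing globs plus the two GIFs added here.
PATTERNS = [
    "*.zip",
    "*.zst",
    "*tfevents*",
    "expected_output_dlstreamer.gif",
    "expected_output_openvino.gif",
]

def lfs_tracked(filename: str) -> bool:
    """Return True if any of the .gitattributes patterns matches the filename."""
    return any(fnmatch(filename, pattern) for pattern in PATTERNS)

print(lfs_tracked("expected_output_openvino.gif"))  # True
print(lfs_tracked("README.md"))                     # False
```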
README.md CHANGED
@@ -1,7 +1,5 @@
 # Motion Tracking
 
-> **Validated with:** OpenVINO 2026.1.0, NNCF 3.0.0, DLStreamer 2026.0, Ultralytics 8.4.46, Python 3.11+
-
 | Property | Value |
 |---|---|
 | **Category** | Object Detection + Multi-Object Tracking |
@@ -19,7 +17,6 @@
 Motion Tracking is a Metro Analytics use case that detects objects and assigns persistent track IDs across frames, enabling trajectory analysis and temporal event detection.
 It is built on [YOLO26](https://docs.ultralytics.com/models/yolo26/), a state-of-the-art real-time object detector quantized to INT8, paired with a multi-object tracker:
 
-- **OpenVINO pipeline:** YOLO26 INT8 detection + Ultralytics built-in [BoT-SORT](https://github.com/NirAharon/BoT-SORT) or [ByteTrack](https://github.com/FoundationVision/ByteTrack) tracker via `model.track()`.
 - **DLStreamer pipeline:** YOLO26 FP16 detection via `gvadetect` + `gvatrack` element with `tracking-type=short-term-imageless`.
 
 Each detected object receives a unique `track_id` that persists across frames as long as the object remains visible.
@@ -42,7 +39,7 @@ The default tracker is BoT-SORT; ByteTrack is available as an alternative with l
 ## Prerequisites
 
 - Python 3.11+
-- `ffmpeg` (`sudo apt install ffmpeg`) used by both samples to encode output video
+- `ffmpeg` (`sudo apt install ffmpeg`) -- used by both samples to encode output video
 - [Install OpenVINO](https://docs.openvino.ai/2026/get-started/install-openvino.html) (latest version)
 - [Install Intel DLStreamer](https://docs.openedgeplatform.intel.com/2026.0/edge-ai-libraries/dlstreamer/get_started/install/install_guide_ubuntu.html) (latest version)
@@ -86,7 +83,7 @@ The second argument selects the precision (`FP32`, `FP16`, `INT8`); the default
 The script performs the following steps:
 
 1. Installs dependencies (`openvino`, `ultralytics`; adds `nncf` for INT8).
-2. Downloads the sample test video (`test_video.mp4`).
+2. Downloads the sample test video (`test_video.mp4`) and a sample test image (`test.jpg`).
 3. Downloads the PyTorch weights and exports to OpenVINO IR.
 4. *(INT8 only)* Quantizes the model using NNCF post-training quantization.
@@ -107,128 +104,13 @@ Output files:
 > For production accuracy, replace it with a representative set of frames from
 > the target deployment site.
 
-### OpenVINO Sample
-
-The sample below uses the Ultralytics `model.track()` API with the PyTorch
-weights to detect and track objects in a video, assigning persistent track IDs
-via the built-in BoT-SORT tracker.
-Each annotated frame -- with bounding boxes, track IDs, and per-track
-trajectory polylines -- is written to `output.mp4`.
-
-> **Important:** The `model.track()` API requires PyTorch weights (`.pt`).
-> Using the OpenVINO model directory with `model.track()` produces zero
-> detections in Ultralytics 8.4.x due to an incompatibility in the tracker
-> integration. Use `model.predict()` for single-frame inference with the
-> OpenVINO backend, or use the DLStreamer sample below for OpenVINO-accelerated
-> tracking.
->
-> The INT8 model (`yolo26n_tracking_int8.xml`) can be used directly with the
-> OpenVINO Python API but not with the Ultralytics `YOLO()` wrapper.
-
-```python
-import subprocess
-from collections import defaultdict
-
-import cv2
-import numpy as np
-from ultralytics import YOLO
-
-# Use PyTorch weights for tracking -- model.track() requires the .pt backend.
-# The OpenVINO model directory works with model.predict() but not model.track().
-model = YOLO("yolo26n.pt", task="detect")
-
-video_path = "test_video.mp4"
-cap = cv2.VideoCapture(video_path)
-fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
-width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
-height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
-
-# Pipe frames to ffmpeg for H.264 output (universally playable).
-proc = subprocess.Popen(
-    ["ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "bgr24",
-     "-s", f"{width}x{height}", "-r", str(fps),
-     "-i", "pipe:0", "-c:v", "libx264", "-pix_fmt", "yuv420p",
-     "-movflags", "+faststart", "output.mp4"],
-    stdin=subprocess.PIPE, stderr=subprocess.DEVNULL,
-)
-
-# Distinct colors for trajectory lines (one per track ID).
-COLORS = [
-    (255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0),
-    (255, 0, 255), (0, 255, 255), (128, 0, 255), (255, 128, 0),
-]
-track_history: dict[int, list[tuple[float, float]]] = defaultdict(list)
-
-while cap.isOpened():
-    success, frame = cap.read()
-    if not success:
-        break
-
-    # Run YOLO26 tracking with BoT-SORT (default).
-    # Use tracker="bytetrack.yaml" for ByteTrack alternative.
-    results = model.track(frame, persist=True, conf=0.4, tracker="botsort.yaml")
-    result = results[0]
-
-    annotated = result.plot()
-
-    if result.boxes and result.boxes.is_track:
-        boxes = result.boxes.xywh.cpu()
-        track_ids = result.boxes.id.int().cpu().tolist()
-        classes = result.boxes.cls.int().cpu().tolist()
-
-        for box, track_id in zip(boxes, track_ids):
-            x, y, _w, _h = box
-            track = track_history[track_id]
-            track.append((float(x), float(y)))
-            if len(track) > 30:
-                track.pop(0)
-
-            color = COLORS[track_id % len(COLORS)]
-            points = np.array(track, dtype=np.int32).reshape((-1, 1, 2))
-            cv2.polylines(annotated, [points], False, color, 2)
-
-        for tid, cls_id in zip(track_ids, classes):
-            cx, cy = track_history[tid][-1]
-            print(f" Track {tid}: class={cls_id} center=({cx:.0f},{cy:.0f})", flush=True)
-
-    proc.stdin.write(annotated.tobytes())
-
-cap.release()
-proc.stdin.close()
-proc.wait()
-print("Wrote output.mp4", flush=True)
-```
-
-**Device targets:**
-
-- Default runs on CPU via OpenVINO.
-- For GPU: set `device="gpu:0"` in the `model.track()` call.
-- For NPU: set `device="npu:0"` (validate availability with `benchmark_app -d NPU`).
-
-### Try It on a Sample Video
-
-The `export_and_quantize.sh` script downloads `test_video.mp4` automatically.
-Run the OpenVINO sample above.
-The script processes each frame, prints per-track positions to the console,
-and writes the annotated video to `output.mp4`.
-
-Expected console output (representative):
-
-```text
- Track 1: class=0 center=(320,240)
- Track 2: class=0 center=(450,300)
-```
-
-`output.mp4` shows bounding boxes with track IDs and colored trajectory
-polylines for each tracked object.
-
 ### DLStreamer Sample
 
 The pipeline below runs the YOLO26 FP16 detector via `gvadetect` on
 `test_video.mp4`, attaches persistent track IDs with `gvatrack`
 (`short-term-imageless` tracker), and overlays bounding boxes with
 `gvawatermark`. Frames are pulled from an `appsink`, per-track trajectory
-polylines are drawn with OpenCV, and the result is muxed to `output.mp4`
+polylines are drawn with OpenCV, and the result is muxed to `output_dlstreamer.mp4`
 (H.264 via ffmpeg).
 
 > **Notes on running this sample:**
@@ -259,15 +141,14 @@ from gstgva import VideoFrame
 
 Gst.init(None)
 
-# For GPU: change device=CPU to device=GPU, add vapostproc !
-# video/x-raw(memory:VASurface) after decodebin3, and set
-# pre-process-backend=vaapi-surface-sharing on gvadetect.
-# For NPU: change device=CPU to device=NPU (batch-size=1, nireq=4 recommended).
+# For CPU: change device=GPU to device=CPU.
+# For NPU: change device=GPU to device=NPU (batch-size=1, nireq=4 recommended).
 pipeline_str = (
-    "filesrc location=test_video.mp4 ! decodebin3 ! videoconvert ! "
-    "video/x-raw,format=BGR ! "
+    "filesrc location=test_video.mp4 ! decodebin3 ! "
+    "videoconvert ! "
     "gvadetect model=yolo26n_openvino_model/yolo26n.xml "
-    "device=CPU threshold=0.4 ! queue ! "
+    "device=GPU "
+    "threshold=0.4 ! queue ! "
     "gvatrack tracking-type=short-term-imageless ! queue ! "
    "gvawatermark ! appsink name=sink emit-signals=false sync=false"
 )
@@ -304,7 +185,7 @@ while True:
     ["ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "bgr24",
      "-s", f"{width}x{height}", "-r", str(fps),
      "-i", "pipe:0", "-c:v", "libx264", "-pix_fmt", "yuv420p",
-     "-movflags", "+faststart", "output.mp4"],
+     "-movflags", "+faststart", "output_dlstreamer.mp4"],
    stdin=subprocess.PIPE, stderr=subprocess.DEVNULL,
 )
 
@@ -344,14 +225,18 @@ pipeline.set_state(Gst.State.NULL)
 if proc:
     proc.stdin.close()
     proc.wait()
-    print("Wrote output.mp4", flush=True)
+    print("Wrote output_dlstreamer.mp4", flush=True)
 ```
 
+#### Expected Output
+
+![DLStreamer expected output](expected_output_dlstreamer.gif)
+
 **Device targets:**
 
-- `device=CPU` -- default in the sample code.
-- `device=GPU` -- add `vapostproc ! video/x-raw(memory:VASurface)` after `decodebin3` and set `pre-process-backend=vaapi-surface-sharing` on `gvadetect`.
-- `device=NPU` -- use `batch-size=1` and `nireq=4` for best NPU utilization.
+- `device=GPU` -- default in the sample code.
+- `device=CPU` -- change `device=GPU` to `device=CPU`.
+- `device=NPU` -- change `device=GPU` to `device=NPU`; use `batch-size=1` and `nireq=4` for best NPU utilization.
 
 ---
 
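The device-target edits in the README diff boil down to swapping one `device=` property inside the `gvadetect` stage of a single pipeline string. A minimal sketch of that assembly -- `build_pipeline` is an illustrative helper, not part of the repo:

```python
def build_pipeline(device: str = "GPU", threshold: float = 0.4) -> str:
    """Assemble the gst-launch-style string from the DLStreamer sample.

    `device` maps straight onto gvadetect's device property (CPU, GPU, NPU).
    """
    if device not in ("CPU", "GPU", "NPU"):
        raise ValueError(f"unsupported device: {device}")
    return (
        "filesrc location=test_video.mp4 ! decodebin3 ! "
        "videoconvert ! "
        "gvadetect model=yolo26n_openvino_model/yolo26n.xml "
        f"device={device} "
        f"threshold={threshold} ! queue ! "
        "gvatrack tracking-type=short-term-imageless ! queue ! "
        "gvawatermark ! appsink name=sink emit-signals=false sync=false"
    )

print("device=NPU" in build_pipeline("NPU"))  # True
```

For NPU the README additionally recommends `batch-size=1` and `nireq=4` on `gvadetect`; those properties are not wired into this sketch.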
expected_output_dlstreamer.gif ADDED

Git LFS Details

  • SHA256: 987a4b6152414e29729778e4a71290c567d518f19c556badce1d2574ebbedf8c
  • Pointer size: 132 Bytes
  • Size of remote file: 6.77 MB
expected_output_openvino.gif ADDED

Git LFS Details

  • SHA256: 001ae7d6abfa928172ecdd570c9d1757a62e93ffdc1dfbbd539ac313091cbeec
  • Pointer size: 132 Bytes
  • Size of remote file: 3.42 MB
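The trajectory polylines visible in these expected-output GIFs come from a bounded per-track history of box centers, as in the sample code in the README diff above. The buffer logic in isolation -- `MAX_POINTS` and `record` are illustrative names for the 30-point cap the samples use:

```python
from collections import defaultdict

MAX_POINTS = 30  # matches the 30-point cap in the sample code

track_history: dict[int, list[tuple[float, float]]] = defaultdict(list)

def record(track_id: int, cx: float, cy: float) -> list[tuple[float, float]]:
    """Append a box center to the track's history, keeping only the
    newest MAX_POINTS points so trajectory polylines stay short."""
    track = track_history[track_id]
    track.append((cx, cy))
    if len(track) > MAX_POINTS:
        track.pop(0)
    return track

# Feed 40 centers; the history retains only the most recent 30.
for i in range(40):
    record(1, float(i), float(i))
print(len(track_history[1]))  # 30
```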
export_and_quantize.sh CHANGED
@@ -47,6 +47,14 @@ else
   echo "Already present: test_video.mp4"
 fi
 
+echo "--- Downloading sample test image ---"
+if [[ ! -f test.jpg ]]; then
+  wget -q -O test.jpg https://ultralytics.com/images/bus.jpg
+  echo "Downloaded: test.jpg"
+else
+  echo "Already present: test.jpg"
+fi
+
 if [[ "${PRECISION}" == "FP32" ]]; then
   HALF_FLAG="False"
   EXPORT_LABEL="FP32"
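The added shell block guards the download so reruns are cheap. The same idempotent pattern in Python, should a caller prefer it -- `ensure_file` is a hypothetical helper, not part of the script:

```python
import os
import urllib.request

def ensure_file(path: str, url: str) -> bool:
    """Download url to path only if path does not exist yet.

    Returns True when a download happened, False when the file was
    already present (mirrors the wget guard in export_and_quantize.sh).
    """
    if os.path.exists(path):
        print(f"Already present: {path}")
        return False
    urllib.request.urlretrieve(url, path)
    print(f"Downloaded: {path}")
    return True
```

Calling it twice with the same path performs the network fetch at most once, matching the script's "Already present" branch.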