lzbinden committed on
Commit 32be28b · verified · 1 Parent(s): 658521f

Update README.md for checkpoint 12k-v2

Files changed (1)
  1. README.md +16 -15
README.md CHANGED
@@ -33,13 +33,16 @@ library_name: nv-medtech
33
  # Cosmos-H-Surgical-Simulator
34
 
35
  ## Description
36
- Cosmos-H-Surgical-Simulator is a surgical world foundation model fine-tuned on the Open-H embodiment dataset including clinical surgical procedures for the evaluation of physically grounded surgical robotics policies in simulation and synthetic data generation.
37
- This model assists in evaluating surgical robotics policies in simulation, primarily for CMR Surgical Versius clinical procedures (cholecystectomy, prostatectomy, inguinal hernia, and hysterectomy), as well as dVRK, MITIC, and other surgical platforms across tasks such as suturing, tissue manipulation, and peg transfer, before transitioning to a physical system.
38
 
39
- The released model is based on the public NVIDIA Cosmos-predict2.5 world foundation model for physical AI.
40
 
41
  This model is for commercial/non-commercial use.
42
 
 
 
 
 
43
  ## License/Terms of Use
44
  Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
45
 
@@ -146,26 +149,24 @@ Dataset: Open-H-Embodiment community generated dataset.
146
  Evaluated on 4 CMR Versius clinical surgery procedures (prostatectomy, inguinal hernia, hysterectomy, cholecystectomy) at 360p resolution, 2 episodes per procedure, 2 seeds each, 72-frame autoregressive generation (6 chunks × 12 frames).
147
 
148
  ### Aggregate Metrics
149
- | Checkpoint | FDS (L1) ↓ | GATC ↑ | TCD (px) ↓ |
150
- |:---------------|:---:|:---:|:---:|
151
- | 4k | 0.2286 | 0.3517 | 84.42 |
152
- | 8k | 0.2298 | 0.3414 | 98.21 |
153
- | 12k | 0.2253 | 0.3814 | 138.54 |
154
- | **16k** | **0.2227** | **0.4167** | **83.68** |
155
- | 20k | **0.2219** | 0.4058 | 124.96 |
156
 
157
  **Metrics:**
158
  - **FDS (L1)**: Frame Decay Score - mean L1 distance between generated and ground-truth frames normalized to [-1, 1], averaged across all generated frames (lower is better)
159
  - **GATC**: Ground-truth Anchored Tool Consistency - median zero-mean normalized cross-correlation (ZNCC) of grayscale pixels within SAM3-segmented tool regions between generated and ground-truth frames, weighted by a gradient-based tool presence penalty (higher is better)
160
  - **TCD**: Tool Centroid Distance - median per-frame average Euclidean distance (in pixels) between Hungarian-matched tool instance centroids in generated vs ground-truth frames, with a half-diagonal penalty for unmatched tools (lower is better)
161
 
162
- ### Per-Procedure Metrics (16k checkpoint)
163
  | Procedure | FDS (L1) ↓ | GATC ↑ | TCD (px) ↓ |
164
  |:---|:---:|:---:|:---:|
165
- | Prostatectomy | 0.229 | 0.429 | 130.7 |
166
- | Inguinal Hernia | 0.259 | 0.261 | 211.3 |
167
- | Hysterectomy | 0.173 | 0.593 | 153.0 |
168
- | Cholecystectomy | 0.248 | 0.188 | 60.8 |
169
 
170
  ## Inference
171
  **Acceleration Engine:** [PyTorch](https://pytorch.org/), [Transformer Engine](https://github.com/NVIDIA/TransformerEngine)
 
33
  # Cosmos-H-Surgical-Simulator
34
 
35
  ## Description
36
+ Cosmos-H-Surgical-Simulator is a kinematic action-conditioned surgical world foundation model, built on the public NVIDIA [Cosmos-Predict2.5-2B](https://huggingface.co/nvidia/Cosmos-Predict2.5-2B) for physical AI and fine-tuned on the Open-H multi-embodiment surgical benchmark. Unlike the text-conditioned base model, it is driven directly by robot kinematics: given a surgical context frame and a sequence of 44-dimensional action vectors encoding end-effector poses and gripper commands (unified across 9 embodiments), it generates future video of the resulting surgical scene.
 
37
 
38
+ The model is intended for evaluating surgical robotics policies in simulation and for synthetic data generation prior to deployment on a physical system. It covers CMR Surgical Versius clinical procedures (cholecystectomy, prostatectomy, inguinal hernia, hysterectomy) as well as dVRK, MITIC, and other surgical platforms across tasks such as suturing, tissue manipulation, and peg transfer.
39
 
40
  This model is for commercial/non-commercial use.
41
 
42
+ ## Updates
43
+
44
+ - **April 2026**: Released an updated checkpoint after fixing an action-embedder MLP initialization bug. Aggregate quality improves on all three metrics: FDS (L1) 0.223 → **0.184** (−17%), GATC 0.417 → **0.472** (+13%), TCD 83.68 → **67.03** (−20%).
45
+
46
  ## License/Terms of Use
47
  Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).
48
 
 
149
  Evaluated on 4 CMR Versius clinical surgery procedures (prostatectomy, inguinal hernia, hysterectomy, cholecystectomy) at 360p resolution, 2 episodes per procedure, 2 seeds each, 72-frame autoregressive generation (6 chunks × 12 frames).
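
The 72-frame autoregressive protocol (6 chunks of 12 frames) can be pictured with the minimal sketch below. It is illustrative only: `rollout` and `predict_chunk` are hypothetical names, not part of the released code, and conditioning each chunk on the last generated frame of the previous chunk is an assumption about how the chunks are chained.

```python
from typing import Any, Callable, List, Sequence


def rollout(
    context: Any,
    actions: Sequence[Any],
    predict_chunk: Callable[[Any, Sequence[Any]], List[Any]],
    chunk_len: int = 12,
) -> List[Any]:
    """Generate len(actions) frames chunk by chunk (hypothetical helper).

    predict_chunk(context, chunk_actions) stands in for the model call and
    must return one frame per action in the chunk.
    """
    if len(actions) % chunk_len != 0:
        raise ValueError("number of actions must be a multiple of chunk_len")
    video: List[Any] = []
    for start in range(0, len(actions), chunk_len):
        chunk = predict_chunk(context, actions[start:start + chunk_len])
        video.extend(chunk)
        # Assumption: the last generated frame becomes the next context frame.
        context = chunk[-1]
    return video


def dummy_predict_chunk(context: int, chunk_actions: Sequence[Any]) -> List[int]:
    """Toy stand-in for the model: frames are just running integer counters."""
    return [context + i + 1 for i in range(len(chunk_actions))]


# 72 placeholder actions -> 6 chunks of 12 frames each.
frames = rollout(0, [None] * 72, dummy_predict_chunk)
```

With 4 procedures, 2 episodes per procedure, and 2 seeds each, the evaluation amounts to 16 such rollouts.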
150
 
151
  ### Aggregate Metrics
152
+ | Checkpoint | FDS (L1) ↓ | GATC ↑ | TCD (px) ↓ |
153
+ |:---|:---:|:---:|:---:|
154
+ | Previous (16k, pre-fix) | 0.223 | 0.417 | 83.68 |
155
+ | **Current (12k-v2, post-fix)** | **0.184** | **0.472** | **67.03** |
156
+ | Relative change | **−17%** | **+13%** | **−20%** |
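
The relative-change row can be reproduced from the two checkpoint rows with the short sketch below. The function name `relative_change` is chosen here for illustration; the input values are the ones reported in the table.

```python
def relative_change(previous: float, current: float) -> float:
    """Signed percentage change from the previous to the current value."""
    return (current - previous) / previous * 100.0


# (previous 16k pre-fix, current 12k-v2 post-fix), as reported in the table.
metrics = {
    "FDS (L1)": (0.223, 0.184),
    "GATC": (0.417, 0.472),
    "TCD (px)": (83.68, 67.03),
}

for name, (prev, cur) in metrics.items():
    print(f"{name}: {relative_change(prev, cur):+.0f}%")
# FDS (L1): -17%
# GATC: +13%
# TCD (px): -20%
```

FDS and TCD are lower-is-better, so their negative changes are improvements; GATC is higher-is-better.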
 
 
157
 
158
  **Metrics:**
159
  - **FDS (L1)**: Frame Decay Score - mean L1 distance between generated and ground-truth frames normalized to [-1, 1], averaged across all generated frames (lower is better)
160
  - **GATC**: Ground-truth Anchored Tool Consistency - median zero-mean normalized cross-correlation (ZNCC) of grayscale pixels within SAM3-segmented tool regions between generated and ground-truth frames, weighted by a gradient-based tool presence penalty (higher is better)
161
  - **TCD**: Tool Centroid Distance - median per-frame average Euclidean distance (in pixels) between Hungarian-matched tool instance centroids in generated vs ground-truth frames, with a half-diagonal penalty for unmatched tools (lower is better)
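
As a rough illustration of the FDS and TCD definitions above, the following sketch computes both in plain Python. It is not the evaluation code used for the reported numbers. The function names, the flat 8-bit pixel lists, the normalization of the per-frame TCD by the larger tool count, and the zero score for frames with no tools are all assumptions. A brute-force optimal assignment stands in for the Hungarian algorithm (it finds the same minimum-cost matching for small tool counts). GATC is omitted because it depends on SAM3 segmentation masks.

```python
import itertools
import math
import statistics


def frame_decay_score(generated, ground_truth):
    """Mean L1 distance after mapping 8-bit pixel values to [-1, 1].

    Each video is a list of frames; each frame is a flat list of 0..255 values.
    """
    per_frame = []
    for gen, gt in zip(generated, ground_truth):
        diffs = [abs((a / 127.5 - 1.0) - (b / 127.5 - 1.0)) for a, b in zip(gen, gt)]
        per_frame.append(sum(diffs) / len(diffs))
    return sum(per_frame) / len(per_frame)


def frame_tcd(gen, gt, width, height):
    """Per-frame average centroid distance in pixels (illustrative sketch)."""
    if not gen and not gt:
        return 0.0  # assumption: no tools on either side counts as a perfect match
    penalty = 0.5 * math.hypot(width, height)  # half of the frame diagonal
    small, large = (gen, gt) if len(gen) <= len(gt) else (gt, gen)
    # Brute-force minimum-cost one-to-one matching of centroids.
    best = min(
        sum(math.dist(s, l) for s, l in zip(small, perm))
        for perm in itertools.permutations(large, len(small))
    )
    unmatched = len(large) - len(small)
    # Assumption: average over the larger tool count, penalizing unmatched tools.
    return (best + penalty * unmatched) / len(large)


def tool_centroid_distance(frames, width, height):
    """Median over frames of the per-frame distance.

    frames is a list of (generated_centroids, ground_truth_centroids) pairs.
    """
    return statistics.median(frame_tcd(g, t, width, height) for g, t in frames)
```

For example, `frame_tcd([(0, 0)], [(3, 4)], 640, 360)` matches the single pair of centroids and returns their Euclidean distance of 5.0 pixels.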
162
 
163
+ ### Per-Procedure Metrics (current checkpoint)
164
  | Procedure | FDS (L1) ↓ | GATC ↑ | TCD (px) ↓ |
165
  |:---|:---:|:---:|:---:|
166
+ | Prostatectomy | 0.220 | 0.451 | 122.0 |
167
+ | Inguinal Hernia | 0.199 | 0.429 | 143.2 |
168
+ | Hysterectomy | 0.121 | 0.737 | 12.7 |
169
+ | Cholecystectomy | 0.198 | 0.344 | 28.8 |
170
 
171
  ## Inference
172
  **Acceleration Engine:** [PyTorch](https://pytorch.org/), [Transformer Engine](https://github.com/NVIDIA/TransformerEngine)