Original vs. Improved Version: A Comparison Example

Example Video

Path: PennPAL/failure/2023-10-01/Sun_Oct__1_16:56:53_2023/recordings/MP4/14436910.mp4

Original Version Output

Prompt

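A fixed, generic task description with no task-specific information

Processing

All frames are sent in a single request (no continuity analysis)
No reasoning about the relationship between preceding and following frames

Example Output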

{
  "video_id": "14436910",
  "t": 0,
  "frame_index": 0,
  "stage": "reach",
  "reward": 0.5,
  "delta": 0,
  "success_prob": 0.6,
  "failure": 0,
  "explanation": "Robot moving toward object"
}

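Problems

❌ The task itself is unknown (is it grasping, or opening a door?)
❌ Motion smoothness cannot be judged (each frame is analyzed in isolation)
❌ No fine-grained reward components (reachout, grasp, smooth, etc.)
❌ No stage index (only a string)

Improved Version Output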

Prompt

**TASK DESCRIPTION**: Open or close hinged object (ex: hinged door, microwave, oven, book, dryer, toilet, box)

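(GPT now knows explicitly that the task is to open or close a hinged object)

Processing

Sliding window: 5 consecutive frames are analyzed per call, and adjacent windows overlap (for example frames [0-4], then [3-7])
Context passing: the result of the previous window is passed to the next window
Dynamic relationships: the model can judge whether the motion is smooth and whether the object is stable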
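
The sliding-window and context-passing steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the stride of 3 is inferred from the frame ranges shown in this document ([0-4], then [3-7]), and `annotate_window` is a hypothetical callable standing in for the model request.

```python
from typing import Callable, Iterator, List, Optional


def sliding_windows(num_frames: int, size: int = 5, stride: int = 3) -> Iterator[List[int]]:
    """Yield overlapping lists of frame indices.

    size=5 comes from the text; stride=3 is an assumption inferred
    from the example windows [0-4] and [3-7].
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    start = 0
    while start < num_frames:
        yield list(range(start, min(start + size, num_frames)))
        if start + size >= num_frames:
            break  # the last frame has been covered
        start += stride


def annotate_video(
    num_frames: int,
    annotate_window: Callable[[List[int], Optional[dict], int], dict],
) -> List[dict]:
    """Run a (hypothetical) annotator over each window, threading context forward."""
    context: Optional[dict] = None
    results = []
    for window_idx, frames in enumerate(sliding_windows(num_frames)):
        result = annotate_window(frames, context, window_idx)
        results.append(result)
        context = result  # the next window sees this window's result
    return results
```

For a 14-frame clip, `sliding_windows(14)` yields the four windows 0-4, 3-7, 6-10 and 9-13.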

Example Output

Window 1: frames [0, 1, 2, 3, 4]

{
  "video_id": "14436910",
  "task": "Open or close hinged object (ex: hinged door, microwave, oven, book, dryer, toilet, box)",
  "t": 0,
  "frame_index": 0,
  "window_idx": 0,

  "stage": 0,
  "stage_name": "reach",

  "reachout": 0.2,    // just starting to reach
  "grasp": 0.0,       // not yet grasping
  "collision": 0.0,   // no collision
  "fall": 0.0,        // object is stable
  "smooth": 0.85,     // motion is fairly smooth

  "reward": 0.4,
  "delta": 0.0,
  "success_prob": 0.5,
  "failure": 0,

  "explanation": "Robot arm is smoothly reaching toward the hinged object (appears to be a microwave door). Movement is stable and controlled. Stage: initial reach."
}

{
  "t": 4,
  "frame_index": 4,
  "window_idx": 0,

  "stage": 1,
  "stage_name": "grasp",

  "reachout": 0.9,    // close to the target
  "grasp": 0.3,       // contact is beginning
  "collision": 0.0,
  "fall": 0.0,
  "smooth": 0.7,      // slight deceleration (normal when preparing to grasp)

  "reward": 0.65,
  "delta": 0.05,
  "success_prob": 0.6,
  "failure": 0,

  "explanation": "Robot has reached the object and is beginning to make contact with the handle. Slight deceleration is observed, which is normal for grasping preparation. No signs of instability."
}

Window 2: frames [3, 4, 5, 6, 7] + context

Context: "Previous stage: grasp, Success prob: 0.60, Explanation: Robot has reached the object..."

{
  "t": 7,
  "frame_index": 7,
  "window_idx": 1,

  "stage": 2,
  "stage_name": "lift",

  "reachout": 1.0,    // approach complete
  "grasp": 0.7,       // grasp is fairly secure
  "collision": 0.0,
  "fall": 0.0,
  "smooth": 0.6,      // ⚠️ motion is somewhat jerky

  "reward": 0.75,
  "delta": 0.1,
  "success_prob": 0.65,
  "failure": 0,

  "explanation": "Object is being lifted. Grasp appears secure but some jerkiness detected in the motion (smooth score decreased). This could indicate the object is heavier than expected or the gripper adjustment is ongoing."
}
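
The context string handed to window 2 could be assembled from the last result of window 1 roughly as shown below. This is a sketch under assumptions: `build_context` and its `max_chars` truncation limit are hypothetical, chosen only so the output has the same shape as the context line shown above.

```python
def build_context(prev: dict, max_chars: int = 80) -> str:
    """Summarize the previous window's final annotation as a one-line context string.

    max_chars is an assumed limit; the source only shows that the
    explanation is shortened with a trailing "...".
    """
    explanation = prev["explanation"]
    if len(explanation) > max_chars:
        explanation = explanation[:max_chars].rstrip() + "..."
    return (
        f"Previous stage: {prev['stage_name']}, "
        f"Success prob: {prev['success_prob']:.2f}, "
        f"Explanation: {explanation}"
    )


prev = {
    "stage_name": "grasp",
    "success_prob": 0.6,
    "explanation": "Robot has reached the object and is beginning to make contact with the handle.",
}
print(build_context(prev, max_chars=28))
```

Formatting `success_prob` with two decimals is what turns 0.6 into the "0.60" seen in the context line.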

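Summary of Improvements

✅ Task awareness: GPT knows this is an open/close hinged object task and can evaluate it more accurately
✅ Continuity analysis: consecutive frames reveal that "motion is somewhat jerky"
✅ Fine-grained rewards:
- reachout: 0.2 → 0.9 → 1.0 (clear progress)
- grasp: 0.0 → 0.3 → 0.7 (the grasping process)
- smooth: 0.85 → 0.7 → 0.6 (jitter is captured)
✅ Stage indices: 0, 1, 2 (aligned with simulation)
✅ Context passing: window 2 knows the result of window 1
✅ Detailed explanations: the output states why smooth decreased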
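
One way to keep the `stage` index and `stage_name` string consistent is a small enum. The sketch below lists only the four stages that appear in this document (reach, grasp, lift, move); the simulation may define more, so treat the `Stage` class and `stage_fields` helper as hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stage indices as they appear in this document's examples (assumed complete only for illustration)."""
    REACH = 0
    GRASP = 1
    LIFT = 2
    MOVE = 3


def stage_fields(name: str) -> dict:
    """Return the paired 'stage' / 'stage_name' fields for a stage name."""
    stage = Stage[name.upper()]  # raises KeyError for unknown names
    return {"stage": int(stage), "stage_name": stage.name.lower()}


print(stage_fields("grasp"))  # {'stage': 1, 'stage_name': 'grasp'}
```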

Alignment with Simulation Data

Simulation data format (for reference)

{
  "stages": [0, 0, 0, 1, 1, 2, 2, 3, ...],
  "rewards": [
    {
      "reachout": 0.2,
      "grasp": 0.0,
      "collision": 0.0,
      "fall": 0.0,
      "smooth": 0.85
    },
    ...
  ]
}

Real-robot annotation output (improved)

{
  "stage": 0,
  "stage_name": "reach",
  "reachout": 0.2,
  "grasp": 0.0,
  "collision": 0.0,
  "fall": 0.0,
  "smooth": 0.85,
  "reward": 0.4,
  ...
}

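✅ Fully aligned: the stage indices and reward-component fields match the simulation format, so the annotations can be used directly to train a reward model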
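
Because the per-frame records use the same field names, collecting them into the simulation layout is a small reshaping step. The following is a sketch, not the project's converter: `to_sim_format` is a hypothetical helper, and it assumes every record carries all five reward components.

```python
from typing import Dict, List

# Reward components shared by the simulation format and the improved annotations.
COMPONENTS = ("reachout", "grasp", "collision", "fall", "smooth")


def to_sim_format(annotations: List[dict]) -> Dict[str, list]:
    """Collect per-frame annotation records into {"stages": [...], "rewards": [...]}.

    Records are ordered by frame_index; a missing component raises KeyError
    rather than being silently filled in.
    """
    ordered = sorted(annotations, key=lambda a: a["frame_index"])
    return {
        "stages": [a["stage"] for a in ordered],
        "rewards": [{k: a[k] for k in COMPONENTS} for a in ordered],
    }


records = [
    {"frame_index": 4, "stage": 1, "reachout": 0.9, "grasp": 0.3,
     "collision": 0.0, "fall": 0.0, "smooth": 0.7, "reward": 0.65},
    {"frame_index": 0, "stage": 0, "reachout": 0.2, "grasp": 0.0,
     "collision": 0.0, "fall": 0.0, "smooth": 0.85, "reward": 0.4},
]
print(to_sim_format(records)["stages"])  # [0, 1]
```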

Practical Use Cases

Scenario 1: Diagnosing the cause of a failure

Original version:

{"stage": "move", "reward": 0.3, "failure": 1, "explanation": "Failed"}

❌ The cause of the failure is unknown

Improved version:

{
  "stage": 3,
  "stage_name": "move",
  "reachout": 1.0,
  "grasp": 0.2,        // ⚠️ unstable grasp
  "fall": 1.0,         // ⚠️ object dropped
  "smooth": 0.1,       // ⚠️ jerky motion
  "reward": 0.15,
  "failure": 1,
  "explanation": "Object slipped from gripper during transport. The grasp was insufficient (grasp=0.2) and the sudden jerky motion (smooth=0.1) caused the object to fall."
}

✅ The cause is clear: insufficient grasp + jerky motion → object dropped
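
The same reading can be automated with simple thresholds on the reward components. The sketch below is illustrative only: the `diagnose` helper and the cutoffs `low=0.3` and `high=0.7` are assumptions, not values from the source, and a low grasp score is only flagged from the lift stage onward because it is expected during reach.

```python
from typing import List


def diagnose(record: dict, low: float = 0.3, high: float = 0.7) -> List[str]:
    """List likely failure causes from the fine-grained components (assumed thresholds)."""
    reasons = []
    # grasp is naturally low while reaching, so only check it at stage >= 2 (lift, move).
    if record.get("stage", 0) >= 2 and record.get("grasp", 1.0) <= low:
        reasons.append("weak grasp")
    if record.get("smooth", 1.0) <= low:
        reasons.append("jerky motion")
    if record.get("collision", 0.0) >= high:
        reasons.append("collision")
    if record.get("fall", 0.0) >= high:
        reasons.append("object fell")
    return reasons


failed = {"stage": 3, "stage_name": "move", "reachout": 1.0,
          "grasp": 0.2, "fall": 1.0, "smooth": 0.1, "failure": 1}
print(diagnose(failed))  # ['weak grasp', 'jerky motion', 'object fell']
```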

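Scenario 2: Evaluating motion quality

The sliding window can capture:

Frames 0-4:   smooth=0.9  ✓ smooth
Frames 3-7:   smooth=0.85 ✓ smooth
Frames 6-10:  smooth=0.3  ⚠️ sudden jitter!
Frames 9-13:  smooth=0.2  ⚠️ jitter continues

This pinpoints when the problem begins: here, in the window covering frames 6-10.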
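
Locating the onset programmatically amounts to scanning the per-window smooth scores for the first one below a cutoff. This is a minimal sketch: `first_jerky_window` is a hypothetical helper and the 0.5 threshold is an assumption.

```python
from typing import List, Optional, Tuple


def first_jerky_window(
    windows: List[Tuple[int, int, float]], threshold: float = 0.5
) -> Optional[Tuple[int, int]]:
    """Return (start_frame, end_frame) of the first window whose smooth score
    is below the (assumed) threshold, or None if motion stays smooth."""
    for start, end, smooth in windows:
        if smooth < threshold:
            return (start, end)
    return None


# (start_frame, end_frame, smooth) taken from the listing above.
scores = [(0, 4, 0.9), (3, 7, 0.85), (6, 10, 0.3), (9, 13, 0.2)]
print(first_jerky_window(scores))  # (6, 10)
```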

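Performance Comparison

Metric	Original	Improved
Task accuracy	Medium	✓ High (task context provided)
Motion analysis	Limited	✓ Detailed (consecutive frames)
Failure diagnosis	Coarse	✓ Precise (fine-grained metrics)
Aligned with simulation	No	✓ Yes
Interpretability	Medium	✓ High (detailed explanations)
API cost	Low	Medium (+30-50%)

Conclusion: the improved version costs somewhat more (API cost up 30-50%), but annotation quality and usability improve substantially.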