Upload folder using huggingface_hub

Files changed (5)

README.md +7 -35
config.yaml +3 -5
convert_weights.py +80 -0
image_pusht_diffusion_policy_cnn.yaml +185 -0
model.safetensors +1 -1

README.md CHANGED

@@ -1,39 +1,11 @@
-# Model Card for Diffusion Policy / PushT
-Diffusion Policy (as per [Diffusion Policy: Visuomotor Policy
-Learning via Action Diffusion](https://arxiv.org/abs/2303.04137)) trained for the `PushT` environment from [gym-pusht](https://github.com/huggingface/gym-pusht).
-![demo](demo.gif)
-## How to Get Started with the Model
-See the [LeRobot library](https://github.com/huggingface/lerobot) (particularly the [evaluation script](https://github.com/huggingface/lerobot/blob/main/lerobot/scripts/eval.py)) for instructions on how to load and evaluate this model.
-## Training Details
-TODO commit hash.
-Trained with [LeRobot@d747195](https://github.com/huggingface/lerobot/tree/d747195c5733c4f68d4bfbe62632d6fc1b605712).
-The model was trained using [LeRobot's training script](https://github.com/huggingface/lerobot/blob/d747195c5733c4f68d4bfbe62632d6fc1b605712/lerobot/scripts/train.py) and with the [pusht](https://huggingface.co/datasets/lerobot/pusht/tree/v1.3) dataset.
-Here are the [loss](./train_loss.csv), [evaluation score](./eval_avg_max_reward.csv), [evaluation success rate](./eval_pc_success.csv) (with 50 rollouts) during training.
-![](training_curves.png)
-This took about 7 hours to train on an Nvida RTX 3090.
-## Evaluation
-The model was evaluated on the `PushT` environment from [gym-pusht](https://github.com/huggingface/gym-pusht) and compared to a similar model trained with the original [Diffusion Policy code](https://github.com/real-stanford/diffusion_policy). There are two evaluation metrics on a per-episode basis:
-- Maximum overlap with target (seen as `eval/avg_max_reward` in the charts above). This ranges in [0, 1].
-- Success: whether or not the maximum overlap is at least 95%.
-Here are the metrics for 500 episodes worth of evaluation. For the succes rate we add an extra row with confidence bounds. This assumes a uniform prior over success probability and computes the beta posterior, then calculates the mean and lower/upper confidence bounds (with a 68.2% confidence interval centered on the mean).
-<blank>|Ours|Theirs
--|-|-
-Average max. overlap ratio | 0.959 | 0.957
-Success rate for 500 episodes (%) | 63.8 | 64.2
-Beta distribution lower/mean/upper (%) | 61.6 / 63.7 / 65.9 | 62.0 / 64.1 / 66.3

+This branch contains the model weights obtained from training on the original Diffusion Policy repository.
+This is the command that was used for training:
+```bash
+python train.py --config-dir=. --config-name=image_pusht_diffusion_policy_cnn.yaml training.seed=42 logging.name=benchmark
+```
+The configuration file `image_pusht_diffusion_policy_cnn.yaml` is included in this branch.
+The weights were converted with [`convert_weights.py`](convert_weights.py).
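
The model card text removed above reports success-rate confidence bounds from a Beta posterior under a uniform prior. A minimal sketch of that calculation follows. It assumes the bounds can be approximated as the posterior mean plus or minus one posterior standard deviation; the card does not show the exact method. The count of 319 successes is inferred from 63.8% of 500 episodes, and the function name is hypothetical.

```python
import math


def beta_posterior_summary(successes, trials):
    """Return (lower, mean, upper) for a success probability.

    Assumption: a uniform Beta(1, 1) prior, so the posterior is
    Beta(successes + 1, failures + 1). The bounds are mean +/- one
    posterior standard deviation, which only approximates a 68.2%
    interval centered on the mean.
    """
    a = successes + 1
    b = trials - successes + 1
    mean = a / (a + b)
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean - std, mean, mean + std


# 63.8% of 500 episodes corresponds to 319 successes (inferred).
lower, mean, upper = beta_posterior_summary(successes=319, trials=500)
print(f"{100 * lower:.1f} / {100 * mean:.1f} / {100 * upper:.1f}")
# prints "61.6 / 63.7 / 65.9"
```

Under these assumptions the result agrees with the "Ours" row of the removed table.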

config.yaml CHANGED

@@ -7,8 +7,8 @@ training:
   online_steps_between_rollouts: 1
   online_sampling_ratio: 0.5
   online_env_seed: ???
-  eval_freq: 10000
-  save_freq: 20000
+  eval_freq: 5000
+  save_freq: 5000
   log_freq: 250
   save_model: true
   batch_size: 64
@@ -45,15 +45,13 @@ training:
     - 1.2
     - 1.3
     - 1.4
-  n_end_keyframes_dropped: ${policy.horizon} - ${policy.n_action_steps} - ${policy.n_obs_steps}
-    + 1
 eval:
   n_episodes: 50
   batch_size: 50
   use_async_envs: false
 wandb:
   enable: true
-  disable_artifact: true
+  disable_artifact: false
   project: lerobot
   notes: ''
 fps: 10

convert_weights.py ADDED

@@ -0,0 +1,80 @@
+from itertools import product
+from pathlib import Path
+import torch
+from omegaconf import OmegaConf
+from lerobot.common.datasets.factory import make_dataset
+from lerobot.common.policies.factory import make_policy
+from lerobot.common.utils.utils import init_hydra_config
+PATH_TO_ORIGINAL_WEIGHTS = "/tmp/dp.pt"
+PATH_TO_CONFIG = "/home/alexander/Projects/lerobot/lerobot/configs/default.yaml"
+PATH_TO_SAVE_NEW_WEIGHTS = "/tmp/dp"
+cfg = init_hydra_config(PATH_TO_CONFIG)
+policy = make_policy(cfg, dataset_stats=make_dataset(cfg).stats)
+state_dict = torch.load(PATH_TO_ORIGINAL_WEIGHTS)
+# Remove keys based on what they start with.
+start_removals = ["normalizer.", "obs_encoder.obs_nets.rgb.backbone.nets.0.nets.0"]
+for to_remove in start_removals:
+    for k in list(state_dict.keys()):
+        if k.startswith(to_remove):
+            del state_dict[k]
+# Replace keys based on what they start with.
+start_replacements = [
+    ("obs_encoder.obs_nets.image.backbone.nets", "rgb_encoder.backbone"),
+    ("obs_encoder.obs_nets.image.pool", "rgb_encoder.pool"),
+    ("obs_encoder.obs_nets.image.nets.3", "rgb_encoder.out"),
+    *[(f"model.up_modules.{i}.2.conv.", f"model.up_modules.{i}.2.") for i in range(2)],
+    *[(f"model.down_modules.{i}.2.conv.", f"model.down_modules.{i}.2.") for i in range(2)],
+    *[
+        (f"model.mid_modules.{i}.blocks.{k}.", f"model.mid_modules.{i}.conv{k + 1}.")
+        for i, k in product(range(3), range(2))
+    ],
+    *[
+        (f"model.down_modules.{i}.{j}.blocks.{k}.", f"model.down_modules.{i}.{j}.conv{k + 1}.")
+        for i, j, k in product(range(3), range(2), range(2))
+    ],
+    *[
+        (f"model.up_modules.{i}.{j}.blocks.{k}.", f"model.up_modules.{i}.{j}.conv{k + 1}.")
+        for i, j, k in product(range(3), range(2), range(2))
+    ],
+    ("model.", "unet.")
+]
+for to_replace, replace_with in start_replacements:
+    for k in list(state_dict.keys()):
+        if k.startswith(to_replace):
+            k_ = replace_with + k.removeprefix(to_replace)
+            state_dict[k_] = state_dict[k]
+            del state_dict[k]
+missing_keys, unexpected_keys = policy.diffusion.load_state_dict(state_dict, strict=False)
+unexpected_keys = set(unexpected_keys)
+allowed_unexpected_keys = eval(
+    "{'obs_encoder.obs_nets.image.nets.0.nets.7.1.bn2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.5.0.downsample.0.weight', 'obs_encoder.obs_nets.image.nets.0.nets.5.1.bn2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.4.0.conv1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.6.1.bn1.bias', 'obs_encoder.obs_nets.image.nets.0.nets.5.0.bn1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.0.weight', 'obs_encoder.obs_nets.image.nets.0.nets.5.1.conv1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.1.bias', 'obs_encoder.obs_nets.image.nets.0.nets.7.1.bn1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.4.0.conv2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.4.1.bn2.bias', 'obs_encoder.obs_nets.image.nets.0.nets.5.0.conv2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.6.1.bn1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.7.0.bn2.bias', 'obs_encoder.obs_nets.image.nets.0.nets.6.1.conv1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.6.0.bn2.bias', 'obs_encoder.obs_nets.image.nets.0.nets.4.1.conv1.weight', 'obs_encoder.obs_nets.image.nets.1.nets.weight', 'obs_encoder.obs_nets.image.nets.0.nets.5.1.bn1.weight', 'obs_encoder.obs_nets.image.nets.1.pos_x', 'obs_encoder.obs_nets.image.nets.0.nets.6.1.bn2.bias', 'obs_encoder.obs_nets.image.nets.1.nets.bias', 'obs_encoder.obs_nets.image.nets.0.nets.6.1.bn2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.4.1.conv2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.4.1.bn1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.5.0.bn2.bias', 'obs_encoder.obs_nets.image.nets.0.nets.4.0.bn1.weight', '_dummy_variable', 'mask_generator._dummy_variable', 'obs_encoder.obs_nets.image.nets.0.nets.7.0.bn2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.5.1.bn2.bias', 'obs_encoder.obs_nets.image.nets.0.nets.7.0.bn1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.6.0.bn1.bias', 'obs_encoder.obs_nets.image.nets.0.nets.7.0.downsample.1.bias', 'obs_encoder.obs_nets.image.nets.1.temperature', 
'obs_encoder.obs_nets.image.nets.0.nets.4.1.bn1.bias', 'obs_encoder.obs_nets.image.nets.0.nets.5.1.conv2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.7.1.conv1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.5.0.conv1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.6.1.conv2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.4.0.bn2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.7.1.bn2.bias', 'obs_encoder.obs_nets.image.nets.0.nets.5.0.bn2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.6.0.bn2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.5.0.downsample.1.bias', 'obs_encoder.obs_nets.image.nets.1.pos_y', 'obs_encoder.obs_nets.image.nets.0.nets.6.0.conv2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.6.0.downsample.1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.7.0.bn1.bias', 'obs_encoder.obs_nets.image.nets.0.nets.5.1.bn1.bias', 'obs_encoder.obs_nets.image.nets.0.nets.6.0.conv1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.6.0.downsample.1.bias', 'obs_encoder.obs_nets.image.nets.0.nets.6.0.bn1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.7.0.conv2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.7.0.downsample.0.weight', 'obs_encoder.obs_nets.image.nets.0.nets.5.0.downsample.1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.6.0.downsample.0.weight', 'obs_encoder.obs_nets.image.nets.0.nets.7.1.conv2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.7.1.bn1.bias', 'obs_encoder.obs_nets.image.nets.0.nets.7.0.downsample.1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.5.0.bn1.bias', 'obs_encoder.obs_nets.image.nets.0.nets.1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.4.0.bn1.bias', 'obs_encoder.obs_nets.image.nets.0.nets.7.0.conv1.weight', 'obs_encoder.obs_nets.image.nets.0.nets.4.1.bn2.weight', 'obs_encoder.obs_nets.image.nets.0.nets.4.0.bn2.bias'}"
+)
+if len(missing_keys) != 0:
+    print("MISSING KEYS")
+    print(missing_keys)
+if unexpected_keys != allowed_unexpected_keys:
+    print("UNEXPECTED KEYS")
+    print(unexpected_keys)
+if len(missing_keys) != 0 or unexpected_keys != allowed_unexpected_keys:
+    print("Failed due to mismatch in state dicts.")
+    exit()
+torch.save(policy.state_dict(), "/tmp/policy.pt")
+policy.save_pretrained(PATH_TO_SAVE_NEW_WEIGHTS)
+OmegaConf.save(cfg, Path(PATH_TO_SAVE_NEW_WEIGHTS) / "config.yaml")
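
The key renaming in `convert_weights.py` depends on rule order: the specific prefixes are rewritten first, and the generic `model.` to `unet.` rule runs last so that it also catches keys already rewritten by earlier rules. A minimal sketch of that pattern on a toy dictionary follows. The helper name `rename_prefixes` and the key `model.final_conv.bias` are hypothetical, and plain integers stand in for tensors.

```python
def rename_prefixes(state_dict, replacements):
    """Apply (old_prefix, new_prefix) rules in order and return a new dict."""
    renamed = dict(state_dict)
    for old_prefix, new_prefix in replacements:
        # Snapshot the keys because the dict is mutated inside the loop.
        for key in list(renamed.keys()):
            if key.startswith(old_prefix):
                renamed[new_prefix + key[len(old_prefix):]] = renamed.pop(key)
    return renamed


# Toy stand-in for a checkpoint; values would normally be tensors.
toy_state_dict = {
    "model.mid_modules.0.blocks.0.weight": 1,
    "model.final_conv.bias": 2,  # hypothetical key, for illustration only
    "obs_encoder.obs_nets.image.pool.weight": 3,
}
rules = [
    ("model.mid_modules.0.blocks.0.", "model.mid_modules.0.conv1."),
    ("obs_encoder.obs_nets.image.pool", "rgb_encoder.pool"),
    ("model.", "unet."),  # generic rule last, after the specific ones
]
converted = rename_prefixes(toy_state_dict, rules)
print(sorted(converted))
```

If the generic rule ran first, the `model.mid_modules...` rule would no longer match anything, which is why the ordering matters.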

image_pusht_diffusion_policy_cnn.yaml ADDED

@@ -0,0 +1,185 @@
+_target_: diffusion_policy.workspace.train_diffusion_unet_hybrid_workspace.TrainDiffusionUnetHybridWorkspace
+checkpoint:
+  save_last_ckpt: true
+  save_last_snapshot: false
+  topk:
+    format_str: epoch={epoch:04d}-test_mean_score={test_mean_score:.3f}.ckpt
+    k: 5
+    mode: max
+    monitor_key: test_mean_score
+dataloader:
+  batch_size: 64
+  num_workers: 8
+  persistent_workers: false
+  pin_memory: true
+  shuffle: true
+dataset_obs_steps: 2
+ema:
+  _target_: diffusion_policy.model.diffusion.ema_model.EMAModel
+  inv_gamma: 1.0
+  max_value: 0.9999
+  min_value: 0.0
+  power: 0.75
+  update_after_step: 0
+exp_name: default
+horizon: 16
+keypoint_visible_rate: 1.0
+logging:
+  group: null
+  id: null
+  mode: online
+  name: 2023.01.16-20.20.06_train_diffusion_unet_hybrid_pusht_image
+  project: diffusion_policy_debug
+  resume: true
+  tags:
+  - train_diffusion_unet_hybrid
+  - pusht_image
+  - default
+multi_run:
+  run_dir: data/outputs/2023.01.16/20.20.06_train_diffusion_unet_hybrid_pusht_image
+  wandb_name_base: 2023.01.16-20.20.06_train_diffusion_unet_hybrid_pusht_image
+n_action_steps: 8
+n_latency_steps: 0
+n_obs_steps: 2
+name: train_diffusion_unet_hybrid
+obs_as_global_cond: true
+optimizer:
+  _target_: torch.optim.AdamW
+  betas:
+  - 0.95
+  - 0.999
+  eps: 1.0e-08
+  lr: 0.0001
+  weight_decay: 1.0e-06
+past_action_visible: false
+policy:
+  _target_: diffusion_policy.policy.diffusion_unet_hybrid_image_policy.DiffusionUnetHybridImagePolicy
+  cond_predict_scale: true
+  crop_shape:
+  - 84
+  - 84
+  diffusion_step_embed_dim: 128
+  down_dims:
+  # - 256
+  # - 512
+  # - 1024
+  - 512
+  - 1024
+  - 2048
+  eval_fixed_crop: true
+  horizon: 16
+  kernel_size: 5
+  n_action_steps: 8
+  n_groups: 8
+  n_obs_steps: 2
+  noise_scheduler:
+    _target_: diffusers.schedulers.scheduling_ddpm.DDPMScheduler
+    beta_end: 0.02
+    beta_schedule: squaredcos_cap_v2
+    beta_start: 0.0001
+    clip_sample: true
+    num_train_timesteps: 100
+    prediction_type: epsilon
+    variance_type: fixed_small
+  num_inference_steps: 100
+  obs_as_global_cond: true
+  obs_encoder_group_norm: true
+  shape_meta:
+    action:
+      shape:
+      - 2
+    obs:
+      agent_pos:
+        shape:
+        - 2
+        type: low_dim
+      image:
+        shape:
+        - 3
+        - 96
+        - 96
+        type: rgb
+shape_meta:
+  action:
+    shape:
+    - 2
+  obs:
+    agent_pos:
+      shape:
+      - 2
+      type: low_dim
+    image:
+      shape:
+      - 3
+      - 96
+      - 96
+      type: rgb
+task:
+  dataset:
+    _target_: diffusion_policy.dataset.pusht_image_dataset.PushTImageDataset
+    horizon: 16
+    max_train_episodes: null
+    pad_after: 7
+    pad_before: 1
+    seed: 42
+    val_ratio: 0
+    zarr_path: data/pusht/pusht_cchi_v7_replay.zarr
+  env_runner:
+    _target_: diffusion_policy.env_runner.pusht_image_runner.PushTImageRunner
+    fps: 10
+    legacy_test: true
+    max_steps: 300
+    n_action_steps: 8
+    n_envs: null
+    n_obs_steps: 2
+    n_test: 50
+    n_test_vis: 4
+    n_train: 6
+    n_train_vis: 2
+    past_action: false
+    test_start_seed: 100000
+    train_start_seed: 0
+  image_shape:
+  - 3
+  - 96
+  - 96
+  name: pusht_image
+  shape_meta:
+    action:
+      shape:
+      - 2
+    obs:
+      agent_pos:
+        shape:
+        - 2
+        type: low_dim
+      image:
+        shape:
+        - 3
+        - 96
+        - 96
+        type: rgb
+task_name: pusht_image
+training:
+  checkpoint_every: 50
+  debug: false
+  device: cuda:0
+  gradient_accumulate_every: 1
+  lr_scheduler: cosine
+  lr_warmup_steps: 500
+  max_train_steps: null
+  max_val_steps: null
+  num_epochs: 500
+  resume: true
+  rollout_every: 50
+  sample_every: 5
+  seed: 42
+  tqdm_interval_sec: 1.0
+  use_ema: true
+  val_every: 50000000
+val_dataloader:
+  batch_size: 64
+  num_workers: 8
+  persistent_workers: false
+  pin_memory: true
+  shuffle: false

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:877969d58d12af315d8c672a2328b3984071901b6f71bdf592b6f131056b520f
+oid sha256:9150bda22091932686db52309233586c2695be418bda16fa7202e497f56bfab8
 size 1050862612
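
The `model.safetensors` entry is a Git LFS pointer: only the `oid` changes, while the size stays the same. A minimal sketch of checking downloaded bytes against such a pointer follows. The helper names are hypothetical and the payload is a tiny stand-in; a real file of about 1 GB would be hashed in chunks rather than read into memory at once.

```python
import hashlib


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into (hash algorithm, hex digest, size in bytes)."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return algorithm, digest, int(fields["size"])


def matches_pointer(data, pointer_text):
    """Return True if the bytes have the size and SHA-256 digest named by the pointer."""
    algorithm, digest, size = parse_lfs_pointer(pointer_text)
    if algorithm != "sha256" or len(data) != size:
        return False
    return hashlib.sha256(data).hexdigest() == digest


# Tiny stand-in payload and a pointer built for it, for illustration only.
payload = b"example weights"
pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
print(matches_pointer(payload, pointer))
```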